9.11: Contingency Tables
Suppose you wanted to evaluate how gender affects the type of movie chosen by movie-goers. How might you organize data on Male and Female watchers and on Action, Romance, Comedy, and Horror movie types so that it would be easy to compare different combinations?
See the end of the lesson where this question is reviewed.
Contingency Tables
Contingency tables are used to evaluate the interaction of statistics from two different categorical variables. They are often used to organize data from different random variables in preparation for a contingency test (which we will be discussing further in the next lesson).
Contingency tables are sometimes called two-way tables because they are organized with the outputs of one variable across the top, and another down the side. Consider the table below:
| | Male | Female |
|---|---|---|
| Chocolate Candy | 42 | 77 |
| Fruit Candy | 58 | 23 |
This is a contingency table comparing the variable ‘Gender’ with the variable ‘Candy Preference’. Across the top of the table are the two gender options for this particular study: ‘male students’ and ‘female students’. Down the left side are the two candy preference options: ‘chocolate’ and ‘fruit’. The data in the center of the table indicates the reported candy preferences of the 200 students polled during the study (100 male and 100 female).
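A table like this is usually the result of tallying individual responses. The sketch below shows one way that tally could be done in Python. The list `responses` is hypothetical: it is rebuilt from the counts in the table, since the study's raw data is not given.

```python
from collections import Counter

# Hypothetical raw responses, one (gender, candy) pair per student,
# reconstructed from the table's counts so the tally reproduces it.
responses = (
    [("Male", "Chocolate Candy")] * 42
    + [("Female", "Chocolate Candy")] * 77
    + [("Male", "Fruit Candy")] * 58
    + [("Female", "Fruit Candy")] * 23
)

# Count how often each (gender, candy) combination occurs.
cell_counts = Counter(responses)

# Lay the counts out with candy types as rows and genders as columns.
genders = ["Male", "Female"]
candies = ["Chocolate Candy", "Fruit Candy"]
for candy in candies:
    row = [cell_counts[(gender, candy)] for gender in genders]
    print(candy, row)
```

Each printed row corresponds to one row of the table, with the Male count first and the Female count second.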
Commonly, there will be one additional row and column for totals, like this:
| | Male | Female | TOTAL |
|---|---|---|---|
| Chocolate Candy | 42 | 77 | 119 |
| Fruit Candy | 58 | 23 | 81 |
| TOTAL | 100 | 100 | 200 |
Notice that you can run a quick check on the calculation of totals, since the “total of totals” should be the same from either direction: 119+81=200=100+100.
The benefits of a contingency table become more apparent the more you use it. As you begin to evaluate different bits of information, each combination of variable outputs is easily noted.
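The totals check described above can be sketched in a few lines of Python. The dictionary layout and the variable names are illustrative choices, not part of the lesson.

```python
# Candy-preference counts: rows are candy types, columns are genders.
table = {
    "Chocolate Candy": {"Male": 42, "Female": 77},
    "Fruit Candy": {"Male": 58, "Female": 23},
}
columns = ["Male", "Female"]

# Row totals: add across each row.
row_totals = {row: sum(cells.values()) for row, cells in table.items()}

# Column totals: add down each column.
col_totals = {col: sum(table[row][col] for row in table) for col in columns}

# The "total of totals" should agree from either direction.
grand_from_rows = sum(row_totals.values())
grand_from_cols = sum(col_totals.values())

print(row_totals)
print(col_totals)
print(grand_from_rows == grand_from_cols)
```

If the two grand totals disagree, at least one cell or total has been entered or added incorrectly.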
Constructing a Contingency Table
Construct a contingency table to display the following data: “250 mall shoppers were asked if they intended to eat at the in-mall food court or go elsewhere for lunch. Of the 117 male shoppers, 68 intended to stay, compared to only 62 of the 133 female shoppers”.
First, let’s identify our variables and set up the table with the appropriate row and column headers.
The variables are gender and lunch location choice:
| | Male | Female | TOTAL |
|---|---|---|---|
| Food Court | | | |
| Out of Mall | | | |
| TOTAL | | | |
Now we can fill in the values we have directly from the text:
| | Male | Female | TOTAL |
|---|---|---|---|
| Food Court | 68 | 62 | |
| Out of Mall | | | |
| TOTAL | 117 | 133 | 250 |
Now we can fill in the missing data with simple addition/subtraction:
| | Male | Female | TOTAL |
|---|---|---|---|
| Food Court | 68 | 62 | 130 |
| Out of Mall | 49 | 71 | 120 |
| TOTAL | 117 | 133 | 250 |
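The addition and subtraction used to complete the table can be written out step by step. This is only a sketch of the arithmetic; the variable names are made up for illustration.

```python
# Values stated directly in the problem text.
total_shoppers = 250
male_total, female_total = 117, 133
male_food_court, female_food_court = 68, 62

# "Out of Mall" cells: subtract the food-court count from each column total.
male_out = male_total - male_food_court
female_out = female_total - female_food_court

# Row totals: add across each row.
food_court_total = male_food_court + female_food_court
out_total = male_out + female_out

# Both sets of totals should add up to the number of shoppers polled.
assert male_total + female_total == total_shoppers
assert food_court_total + out_total == total_shoppers

print("Food Court ", male_food_court, female_food_court, food_court_total)
print("Out of Mall", male_out, female_out, out_total)
print("TOTAL      ", male_total, female_total, total_shoppers)
```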
Answering Questions
Referencing data from the previous example, answer the following:
a. What percentage of food-court eaters are female?
If we read across the row “Food Court”, we see that there were a total of 130 shoppers eating “in”, and that 62 of them were female. To calculate percentage, we simply divide: 62/130≈.477 or 47.7%.
b. What is the distribution of male lunch-eaters?
The male shoppers were distributed as 68 food court and 49 out of mall.
c. What is the marginal distribution of the variable "lunch location preference"?
The marginal distribution is the distribution of data “in the margin”, that is, in the TOTAL row or column. In this case, we are interested in the data on lunch location preference, which is found in the far right column: 130 food court and 120 out of mall.
d. What is the marginal distribution of the variable "Gender"?
The marginal distribution of gender can be found in the bottom row: 117 males and 133 females.
e. What percentage of females prefer to eat out?
Here we are interested in data from the females, so we will be dealing with the ‘female’ column. From the data in the column, we see that 71 of the 133 females preferred to eat out. This is a percentage of: 71/133≈.534 or 53.4%.
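The answers above depend on choosing the right denominator: a row total when the question is about a lunch location, a column total when it is about a gender. The following sketch recomputes parts a, c, d, and e from the completed table. The helper functions are illustrative, not a standard API.

```python
# Completed food-court table: rows are lunch locations, columns are genders.
counts = {
    "Food Court": {"Male": 68, "Female": 62},
    "Out of Mall": {"Male": 49, "Female": 71},
}
genders = ["Male", "Female"]


def row_total(location):
    """Total for one lunch location (the far-right TOTAL column)."""
    return sum(counts[location].values())


def column_total(gender):
    """Total for one gender (the bottom TOTAL row)."""
    return sum(counts[location][gender] for location in counts)


def percent(part, whole):
    """Share of `whole` made up by `part`, as a percentage to one decimal."""
    return round(100 * part / whole, 1)


# (a) Of the food-court eaters (a row), what percentage are female?
pct_food_court_female = percent(counts["Food Court"]["Female"], row_total("Food Court"))

# (e) Of the females (a column), what percentage prefer to eat out?
pct_female_out = percent(counts["Out of Mall"]["Female"], column_total("Female"))

# (c) and (d) The marginal distributions are simply the totals.
marginal_location = {location: row_total(location) for location in counts}
marginal_gender = {gender: column_total(gender) for gender in genders}

print(pct_food_court_female, pct_female_out)
print(marginal_location)
print(marginal_gender)
```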
Identifying Marginal Distributions and Making Observations
“Out of 213 polled amateur drag racers, 37 drove cars with turbo-chargers, 59 had superchargers, and the rest were normally aspirated. The racers themselves were split between 102 rookies and 111 veterans. The rookies evidently preferred turbos, since 29 of them had turbo-charged vehicles, and avoided superchargers, since only 12 of them had supercharged vehicles”.
a. Construct a contingency table:
Set up the table with the appropriate headers, and fill in the data we know. Note that this time we will need a 2×3 table (two rows for driver experience, three columns for engine aspiration) instead of a 2×2. It is still a two-way table, though, as there are only two variables: engine aspiration and driver experience.
| | Turbocharger | Supercharger | Normal Aspiration | TOTAL |
|---|---|---|---|---|
| Rookie | 29 | 12 | | 102 |
| Veteran | | | | 111 |
| TOTAL | 37 | 59 | 117 | 213 |
Now we can update the table with the missing data, calculated using addition or subtraction:
| | Turbocharger | Supercharger | Normal Aspiration | TOTAL |
|---|---|---|---|---|
| Rookie | 29 | 12 | 61 | 102 |
| Veteran | 8 | 47 | 56 | 111 |
| TOTAL | 37 | 59 | 117 | 213 |
b. Identify the marginal distributions
The marginal data refers to the overall data for each of the two variables:
- Aspiration type is distributed as follows: 37 Turbos, 59 Superchargers, and 117 normally aspirated.
- Driver experience distribution: 102 Rookies and 111 Veterans.
c. Identify 3 different percentage-based observations
Three percentage-based observations:
- 61/102=0.598 or 59.8% of Rookies drive normally aspirated cars.
- 47/59=0.7966 or 79.66% of the Superchargers were in cars driven by Veterans.
- 47/111=0.4234 or 42.34% of Veterans use Superchargers.
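The three observations can be reproduced from the completed table. In the sketch below, each percentage is named for the group that forms its denominator; the names and the `percent` helper are illustrative only.

```python
# Completed drag-racing table: rows are driver experience,
# columns are engine aspiration types.
table = {
    "Rookie": {"Turbocharger": 29, "Supercharger": 12, "Normal Aspiration": 61},
    "Veteran": {"Turbocharger": 8, "Supercharger": 47, "Normal Aspiration": 56},
}

# Totals used as denominators.
rookie_total = sum(table["Rookie"].values())
veteran_total = sum(table["Veteran"].values())
supercharger_total = sum(table[driver]["Supercharger"] for driver in table)


def percent(part, whole):
    """Share of `whole` made up by `part`, as a percentage to two decimals."""
    return round(100 * part / whole, 2)


# Of the Rookies (row total), how many drive normally aspirated cars?
rookies_normal = percent(table["Rookie"]["Normal Aspiration"], rookie_total)

# Of the Superchargers (column total), how many belong to Veterans?
superchargers_veteran = percent(table["Veteran"]["Supercharger"], supercharger_total)

# Of the Veterans (row total), how many use Superchargers?
veterans_supercharged = percent(table["Veteran"]["Supercharger"], veteran_total)

print(rookies_normal, superchargers_veteran, veterans_supercharged)
```

The last two observations share the same cell value, 47, but divide by different totals, which is why the percentages differ.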
Earlier Problem Revisited
Suppose you wanted to evaluate how gender affects the type of movie chosen by movie-goers. How might you organize data on Male and Female watchers and on Action, Romance, Comedy, and Horror movie types so that it would be easy to compare different combinations?
A contingency table would be excellent for this purpose. By listing gender categories in one direction and movie type in the other, it would be a simple matter to evaluate different combinations of variables.
Examples
Example 1
Complete the data in the contingency table:
| | A | B | TOTAL |
|---|---|---|---|
| X | 47 | | |
| Y | | 32 | 100 |
| TOTAL | 100 | | 200 |

| | A | B | TOTAL |
|---|---|---|---|
| X | 47 | 100−47=53 | 200−100=100 |
| Y | 100−47=53 | 32 | 100 |
| TOTAL | 100 | 200−100=100 | 200 |
Example 2
What is the marginal distribution of the variable consisting of categories A and B?
The variable consisting of categories A and B is distributed as A: 100 and B: 100.
Example 3
What percentage of B’s are Y’s?
There are 32 B's that are also Y's, out of the total 100 B's: 32/100 = 32%.
Example 4
What portion of A’s are X’s? Express your answer as a decimal.
47 of the 100 A’s are X’s, 47/100=0.47
Review
Questions 1-9 refer to the following table:
| | Sports Cars | Pickup Trucks | Luxury Cars | TOTAL |
|---|---|---|---|---|
| Male Drivers | 72 | 67 | 36 | 175 |
| Female Drivers | 36 | 71 | 68 | 175 |
| TOTAL | 108 | 138 | 104 | 350 |
1. What is the marginal distribution of vehicle types?
2. What is the marginal distribution of driver gender?
3. What decimal portion of male drivers have luxury cars?
4. What percentage of female drivers have pickups?
5. How many drivers were polled?
6. What is the overall most popular vehicle type, by percentage?
7. Which vehicle type has the single largest cell value, and what percentage does it represent of that gender category?
8. What percentage of pickup trucks are driven by females?
9. What percentage of females drive pickup trucks?
Questions 10-18 refer to the following data:
“One hundred eighty dogs were studied to determine if breed affected food preference. Of the 70 Huskies, 30 preferred beef flavor and 40 preferred chicken. Of the 50 Poodles, 27 preferred beef, the rest chicken. The rest of the dogs, English Mastiffs, were obviously beef-lovers, as only 19 preferred chicken over beef”.
10. Create a contingency table to display the data.
11. What is the marginal distribution of dog breeds?
12. What is the marginal distribution of food types?
13. What percentage of Mastiffs preferred beef?
14. What percentage of beef-lovers were Mastiffs?
15. What flavor/dog combination indicated the strongest preference? What percentage of the breed did it represent?
16. What is the distribution of chicken preference?
17. What is the distribution of beef preference?
18. Which breed shows the least defined preference, as a percentage?
Vocabulary
Term | Definition |
---|---|
contingency tables | A contingency table (two-way table) is used to organize data from multiple categories of two variables so that various assessments may be made. |
marginal distribution | The marginal distribution is the distribution of data “in the margin”, that is, in the TOTAL row or column. |
two-way tables | Contingency tables are sometimes called two-way tables because they are organized with the outputs of one variable across the top, and another down the side. |
Additional Resources
Practice: Contingency Tables
Real World: Votes for Women