# 3.2: Gathering Data

## Introduction to Sampling

Suppose you were chosen to help pick out a theme for your school prom. Out of all of the initial suggestions offered by your team, you have narrowed the options down to 3: Famous Couples through the Ages, Romance Under the Sea, and Stairway to Heaven.

Joe Shlabotnik - https://www.flickr.com/photos/joeshlabotnik/2207112346

Since this is the Senior Prom, you feel that the Senior Class should make the final call. Unfortunately, there are over three hundred seniors in your school, and your deadline for a decision is in one hour! How could you get a good idea of the preference of the class as a whole in such a limited time?

By the end of this lesson, you should have no problem suggesting a good solution!

There are many situations in life where we need to gather data on a population that is very large or difficult to study. Ideally, we would poll each and every member individually, but sometimes that just isn’t feasible.

In such cases, the solution is to use a sample or subset that is carefully picked to accurately represent the full population. An experiment conducted on a well-chosen sample should provide an accurate representation of the results you would get by performing the same experiment on the population from which the sample was created.

There are many different ways to choose a sample, and all have applications for which they are more or less appropriate.

A few examples of sampling methods:

• Random Sampling (choosing representatives purely by chance, for instance by rolling a die, so that every member is equally likely to be picked).
• Stratified Sampling (choosing a proportional number of representatives from each of a number of subgroups of the initial population). These divisions are chosen based on the belief that the subgroups differ significantly with respect to the variable that you are measuring. For example, you might stratify by age or by income.
• Cluster Sampling (dividing the population into naturally occurring groups, or clusters, based on a particular factor such as location, then randomly choosing entire clusters and studying the members of the chosen clusters).
• Multi-Stage Sampling (narrowing down a field of representatives by successively applying multiple different sampling methods). For example, you might stratify and then take a simple random sample from each stratum.
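The first three methods in the list can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the student records, the `grade` and `homeroom` fields, and the function names are all invented for the example, and the cluster function follows the usual textbook approach of picking whole groups at random.

```python
import random
from collections import defaultdict


def simple_random_sample(population, k, seed=None):
    """Pick k members so that every member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, k)


def stratified_sample(population, stratum_of, fraction, seed=None):
    """Pick the same fraction of members from every subgroup (stratum)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    chosen = []
    for members in strata.values():
        k = round(len(members) * fraction)
        chosen.extend(rng.sample(members, k))
    return chosen


def cluster_sample(population, cluster_of, n_clusters, seed=None):
    """Pick whole groups (clusters) at random and keep everyone in them."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for member in population:
        clusters[cluster_of(member)].append(member)
    picked = rng.sample(sorted(clusters), n_clusters)
    return [m for name in picked for m in clusters[name]]


# Hypothetical population: 120 students, each tagged with a grade and a homeroom.
students = [
    {"id": i, "grade": 9 + i % 4, "homeroom": i % 6}
    for i in range(120)
]

srs = simple_random_sample(students, 12, seed=1)                        # 12 students
strat = stratified_sample(students, lambda s: s["grade"], 0.1, seed=1)  # 3 per grade
clus = cluster_sample(students, lambda s: s["homeroom"], 2, seed=1)     # 2 whole homerooms
```

Notice that the stratified sample guarantees every grade is represented in proportion, while the cluster sample trades that guarantee for convenience: only two homerooms ever need to be visited.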

### Understanding When to Use Sample Groups

Would it be necessary to use a sample group to evaluate the effects of too much sugar on a group of 15 elementary-school children? What about a playground full of 300 children?

A group of 15 children certainly seems manageable to study, so choosing a sample to represent the whole group is probably not necessary from that standpoint. However, this is the type of study where a control group would be an important consideration. If you just gave an extra handful of candy to every child, you would not know how much of the later energy actually came from the sugar, and how much was just a result of age. By pulling aside a control group of perhaps 6 students who would not get the extra sugar, you could better evaluate the difference in energy actually due to diet rather than age.

With 300 children all running around a playground, collecting them all together and attempting to organize a study might prove a daunting task. If you just chose a sample of perhaps 30 of them, some a little older, some younger, some boys, some girls, you could get an estimate of what would happen if you applied the study to the entire group.
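One way to carry out both ideas without playing favorites is to let chance do the choosing. The sketch below is a minimal illustration, not a prescribed procedure: the child labels, the `split_control` helper, and the fixed seeds are assumptions made for the example.

```python
import random


def split_control(group, control_size, seed=None):
    """Randomly divide a group into a control group and a treatment group."""
    rng = random.Random(seed)
    shuffled = list(group)
    rng.shuffle(shuffled)
    return shuffled[:control_size], shuffled[control_size:]


# Hypothetical labels standing in for the 15 children.
children = [f"child_{i:02d}" for i in range(1, 16)]
control, treatment = split_control(children, 6, seed=0)  # 6 get no extra sugar

# For the playground of 300, draw 30 children to study instead of all of them.
playground = [f"kid_{i:03d}" for i in range(1, 301)]
study_sample = random.Random(0).sample(playground, 30)
```

Shuffling before splitting means that no one decides which children land in the control group, which helps keep the two groups comparable.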

### Choosing the Appropriate Sampling Method

Suppose you wanted to study the effect of rubbing marbles with candle wax before playing a classic game of marbles. After setting aside a control group, you are ready to choose a sample set of marbles to rub with the wax. Would a stratified sampling of the remaining marbles be a good choice in this situation?

Probably not. Marbles are generally created to be as alike as possible in every way other than appearance, and since appearance is unlikely to have an effect on the result of the wax experiment, it would not make sense to carefully attempt to represent each color or type of decoration. A random sample would be simpler and would very likely yield the same results.

### Recognizing Sampling Errors

The student council at your school has been given an assignment to find a good use for a grant that the school received to make school more enjoyable for the students. After a week or two of deliberation, the council announces that the studies they have conducted suggest that providing the cheerleading squad with new pom-poms is the #1 priority of 90% of the students in the school. Of course, the chess club members disagree and conduct their own study. If the chess team chooses a sample the same way the student council did, and their results suggest that 90% of respondents think that the money should go toward new chess clocks, what error do you think both groups committed in the choice of sample groups for study?

It would certainly appear that both groups were guilty of a process called ‘cherry-picking’, which means that they deliberately chose to question people who shared the same interests in order to get favorable results from their polls. Obviously neither group’s results are likely to be representative of the entire student body, but rather only represent the views of the chess team and the cheerleading squad!
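A small simulation can show how strongly a cherry-picked sample distorts a result. Everything in this sketch is invented for illustration: the school size of 500, the club sizes, and the assumption that each student's first choice follows from club membership.

```python
import random

# Made-up school of 500 students: club membership decides the preferred purchase.
students = (
    [{"club": "cheer", "wants": "pom-poms"} for _ in range(30)]
    + [{"club": "chess", "wants": "chess clocks"} for _ in range(20)]
    + [{"club": None, "wants": "something else"} for _ in range(450)]
)


def share(sample, item):
    """Fraction of a sample whose first choice is `item`."""
    return sum(s["wants"] == item for s in sample) / len(sample)


# Cherry-picked: 27 cheerleaders plus 3 other students.
cheer = [s for s in students if s["club"] == "cheer"]
others = [s for s in students if s["club"] is None]
cherry_picked = cheer[:27] + others[:3]

# Random: every student has the same chance of being asked.
random_sample = random.Random(42).sample(students, 30)

print(share(cherry_picked, "pom-poms"))  # 0.9
print(share(students, "pom-poms"))       # 0.06
print(share(random_sample, "pom-poms"))  # depends on the draw
```

In this made-up school only 6% of students want pom-poms, yet the cherry-picked poll reports 90%. A random sample of the same size will usually land much closer to the true figure.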

### Earlier Problem Revisited

Suppose you were chosen to help pick out a theme for your school prom. Out of all of the initial suggestions offered by your team, you have narrowed the options down to 3: Famous Couples through the Ages, Romance Under the Sea, and Stairway to Heaven.

Since this is the Senior Prom, you feel that the Senior Class should make the final call. Unfortunately, there are over three hundred seniors in your school, and your deadline for a decision is in one hour! How could you get a good idea of the preference of the class as a whole in such a limited time?

This is an excellent case of the need for a representative sample of a population. Without having the time to poll all of the members of the senior class, you could get an idea of what the most popular theme would be by choosing a smaller number of seniors to represent the entire class. Just be careful to minimize the chance that your chosen representatives have any sort of bias that might keep them from properly representing the class as a whole.
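As a rough sketch of how such a quick poll might be organized, the code below draws 30 names at random from a class roster and tallies their answers. The roster of exactly 300 names, the sample size of 30, the `poll_random_sample` helper, and the pretend answers are all assumptions for illustration; a real poll would ask each chosen senior in person.

```python
import random
from collections import Counter

THEMES = [
    "Famous Couples through the Ages",
    "Romance Under the Sea",
    "Stairway to Heaven",
]


def poll_random_sample(roster, ask, sample_size, seed=None):
    """Ask a random subset of the roster and tally their answers."""
    rng = random.Random(seed)
    respondents = rng.sample(roster, sample_size)
    return Counter(ask(person) for person in respondents)


# Hypothetical roster and answers, used only so the sketch can run.
roster = [f"senior_{i:03d}" for i in range(300)]
pretend_answers = {name: THEMES[i % 3] for i, name in enumerate(roster)}

tally = poll_random_sample(roster, pretend_answers.get, 30, seed=7)
winner, votes = tally.most_common(1)[0]
```

Drawing names from the full roster, rather than asking whoever happens to be nearby, is what keeps your own circle of friends from dominating the result.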

## Examples

### Example 1

What kind of sampling would you expect was used if the sample group was composed of 5 yellow, 3 green, 4 red, and 6 blue members, and the population included 48 blue, 32 red, 24 green, and 40 yellow members?

Since the sample group contains exactly 1/8 as many members of each color as the entire population, it is reasonable to suspect that a stratified sampling was used.
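The arithmetic behind that conclusion can be checked directly by dividing each sample count by the matching population count. The snippet below uses the numbers from the problem; the variable names are arbitrary.

```python
from fractions import Fraction

population = {"blue": 48, "red": 32, "green": 24, "yellow": 40}
sample = {"blue": 6, "red": 4, "green": 3, "yellow": 5}

# Sampling fraction for each color: sample count divided by population count.
sampling_fraction = {
    color: Fraction(sample[color], population[color]) for color in population
}

# A proportional stratified sample uses the same fraction in every stratum.
is_proportional = len(set(sampling_fraction.values())) == 1

for color, frac in sampling_fraction.items():
    print(color, frac)   # each line shows 1/8
print(is_proportional)   # True
```

A simple random sample of 18 members would rarely match the population proportions this exactly, which is why the matching fractions point toward stratified sampling.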

### Example 2

What type(s) of sampling method(s) might be most appropriate for approximating the number of cutthroat trout in a 25-mile section of river?

A 25-mile-long section of river is likely to include a number of different types of ecosystems, each of which would harbor a different density of fish. In order to get a good sample, a multi-stage sampling method composed of a stratified sample of the different ecosystems followed by a random sampling of fish in each ecosystem would probably be a good choice.
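A two-stage approach of this kind might be sketched as follows. This is a simplified illustration under several assumptions: the river is split into 25 one-mile segments, the habitat labels and fish counts are invented, and the second stage randomly picks segments to survey within each habitat rather than individual fish.

```python
import random
from collections import defaultdict


def estimate_total(segments, habitat_of, count_fish, per_habitat, seed=None):
    """Two-stage estimate: group segments by habitat, survey a few per group."""
    rng = random.Random(seed)

    # Stage 1: stratify the river segments by habitat type.
    strata = defaultdict(list)
    for seg in segments:
        strata[habitat_of(seg)].append(seg)

    total = 0.0
    for segs in strata.values():
        # Stage 2: randomly choose which segments in this habitat to survey.
        surveyed = rng.sample(segs, min(per_habitat, len(segs)))
        mean_count = sum(count_fish(s) for s in surveyed) / len(surveyed)
        # Scale the average up to every segment of this habitat.
        total += mean_count * len(segs)
    return total


# Hypothetical river: 25 one-mile segments with invented habitats and counts.
habitats = ["riffle"] * 10 + ["pool"] * 8 + ["run"] * 7
fish_per_mile = {"riffle": 12, "pool": 30, "run": 18}

segments = list(range(25))
estimate = estimate_total(
    segments,
    habitat_of=lambda s: habitats[s],
    count_fish=lambda s: fish_per_mile[habitats[s]],
    per_habitat=3,
    seed=0,
)
print(estimate)  # 486.0
```

Because the invented counts are identical within each habitat, the estimate here is exact. With real counts, the estimate would vary with which segments happened to be surveyed.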

### Example 3

Would you reasonably expect bias to have affected a sample composed of 75% Toyota vehicles in a study of the most common cars in large U.S. cities?

Although Toyota is a very popular vehicle manufacturer, 75% is an implausibly high share of the vehicles in a large city; no single manufacturer accounts for anywhere near that fraction of cars in the U.S. A figure that high would strongly suggest sample bias.

### Example 4

Would a random sampling of students be the most appropriate method of sampling for a study of the most enjoyable after-school club in a large public school?

Probably not, since a random sampling would likely include a large number of students who either have no opinion or have no experience with any after-school clubs. More accurate results would be obtained by a multi-stage sample that first identified club members, and then randomly selected representatives from among them.

### Example 5

What might you conjecture about a study that claims 100% of respondents preferred “Super Sweet and Crunchy” cereal over “Super Duper Sweet” cereal?

There are a number of reasonable conjectures we might make, most of them related to flawed sampling methodology. Perhaps the sample was chosen from employees of the “Super Sweet and Crunchy” cereal company. Perhaps respondents were offered a reward for choosing one option over the other. Perhaps the sample group had only a single member, the “study” didn’t include milk for the other cereal, or it didn’t offer samples of “Super Duper Sweet” to respondents at all.

## Review

1. Margo collected 12 carrots in a bag. She drew 5 carrots out of the bag. Is this a random sample of the carrots in the bag?

2. Chris put some assorted colored kerchiefs into a box. He looked into the box and pulled out the blue kerchiefs. Is this a random sample of the kerchiefs in the box?

3. Sue had red and white beans in a jar. She reached in and pulled out 10 beans, without looking in the jar. Is this a random sample of beans from the jar?

For questions 4-6, identify the population and the sample from each:

For example: In a class of 20 students, where each student is asked if they have gone to the movies in the past month, you would identify the population as 20 students, and the sample as 20 students.

4. People aboard a plane who have aisle seats are asked if they travel more than 5000 miles per year.

a. Population:

b. Sample:

5. A team of marketing specialists survey every sixth child entering a park to find out how many rides they plan to go on while playing in the park.

a. Population:

b. Sample:

6. Every 15th adult at the exit door of the grocery store is questioned to find out if the store should increase its hours of operation.

a. Population:

b. Sample:

7. Luke wants to find out where most high school students buy their food for lunch. He surveys every fourth student he sees in the high school parking lot and asks them where they get food for lunch. Which would have been an improvement in Luke’s experiment?

a. Survey all of the students in the school.

b. Survey all people in the parking lot.

c. Survey students in the lunch hall.

8. Sue is trying to determine the best location to sell snow cones. There are 4 locations in the city (on a side street, downtown, near a park, and at a school). Sue observed that many people visit the downtown area and the park. Sue decided to sell snow cones in the downtown area where she saw the most people gather. What changes to Sue’s sample would have given her a better understanding of where to sell snow cones?

9. Kerry collected shells from a visit to the ocean in a shoebox. She takes out a handful of shells from the box. Is this a random sample of the shells in the box?

10. There are four dentists in a city. Their offices are located in four different parts of the city. Jake wants to figure out which dentist has the most patients. He observed that the Downtown and West Street areas have larger populations. He concluded that the dentists in those areas must have more patients. After comparing those two areas, he decided that the West Street dentist had the most patients because the area had more traffic. What changes to Jake’s technique would have given him a better understanding of which dentist had the most patients?

11. Caroline wants to predict which restaurant will have less business during the Christmas season. There are three restaurants in the city. Two are on the outskirts of the city and one is in the city. She knows that two hotels situated on the outskirts are fully booked because one has a Christmas show and one has a huge indoor pool. From this information she inferred that the restaurant in the city will have less business during the Christmas season. What could Caroline do to improve her experiment?

a. Ask people at the hotels if they like fast food.

b. Survey all people to see which December holiday they celebrate.

c. Look at the past holiday performance of the restaurants.

The table gives information about the number of girls in each of four schools.

| School | A | B | C | D | Total |
|---|---|---|---|---|---|
| Number of Girls | 126 | 82 | 201 | 52 | 461 |

12. Jenny did a survey of these girls. She used a stratified sample of exactly 80 girls according to school. Calculate the number of girls from each school that were in her sample of 80. Complete the table.

| School | A | B | C | D | Total |
|---|---|---|---|---|---|
| Number of Girls | | | | | 80 |

## Vocabulary

| Term | Definition |
|---|---|
| bias | Bias is a systematic tendency for a study or sampling method to favor a particular result, whether or not it is deliberate, so that the sample misrepresents the population. |
| cluster sampling | Cluster sampling involves dividing the population into naturally occurring groups based on a particular factor such as location, then randomly choosing entire groups and studying their members. |
| control group | A control group is a set of members deliberately kept as separate as possible from a particular study so as to provide an example of how the members should appear if unchanged. |
| multi-stage sampling | Multi-stage sampling involves narrowing down a field of representatives by successively applying multiple different sampling methods. |
| poll | To poll the members of a group means to question them regarding a specific topic. |
| population | In statistics, the population is the entire group of interest from which the sample is drawn. |
| random sampling | Random sampling involves choosing representatives purely by chance (by rolling a die, for instance), so that every member is equally likely to be chosen. |
| stratified sampling | Stratified sampling involves choosing a proportional number of representatives from each of a number of subgroups of the initial population. |