K12 LibreTexts

1.5.3: Displaying Univariate Data

  • Graphs for Univariate Data 

    Univariate data consists of observations of a single numerical variable.

    Dot Plots

    A dot plot is one of the simplest ways to represent numerical data. After choosing an appropriate scale on the axes, each data point is plotted as a single dot. Multiple points at the same value are stacked on top of each other using equal spacing to help convey the shape and center.

    Constructing a Dot Plot

    The following is a data set representing the percentage of paper packaging manufactured from recycled materials for a select group of countries.

    Percentage of the paper packaging used in a country that is recycled. Source: National Geographic, January 2008. Volume 213 No.1, pg 86-87.
    Country % of Paper Packaging Recycled
    Estonia 34
    New Zealand 40
    Poland 40
    Cyprus 42
    Portugal 56
    United States 59
    Italy 62
    Spain 63
    Australia 66
    Greece 70
    Finland 70
    Ireland 70
    Netherlands 70
    Sweden 70
    France 76
    Germany 83
    Austria 83
    Belgium 83
    Japan 98

    The dot plot for this data would look like this:

    [Figure: dot plot of the paper packaging recycling percentages]

    Notice that this data set is centered at between 65 and 70 percent of paper packaging manufactured from recycled materials. It is spread from 34% to 98% and appears very roughly symmetric, perhaps even slightly skewed left. Dot plots have the advantage of showing all the data points and giving a quick snapshot of the shape, center, and spread. They are not much help when there is little repetition in the data. They can also be tedious to create by hand for large data sets, though computer software can make quick work of creating dot plots from such data sets.
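    The construction described above can be sketched in a few lines of Python. This is only an illustrative text rendering, not how the figure in this lesson was produced; the function name dot_plot_rows and the choice of one character per percentage point are assumptions made for the example.

```python
from collections import Counter

# Paper packaging recycling percentages from the table above.
paper = [34, 40, 40, 42, 56, 59, 62, 63, 66, 70, 70, 70, 70, 70,
         76, 83, 83, 83, 98]


def dot_plot_rows(values, tick=10):
    """Return the rows of a text dot plot, top row first.

    The horizontal axis is drawn to scale: one character per unit,
    starting at the multiple of `tick` at or below the minimum.
    """
    counts = Counter(values)
    lo = min(values) // tick * tick           # left end of the axis
    hi = -(-max(values) // tick) * tick       # right end, rounded up
    axis_values = range(lo, hi + 1)
    rows = []
    # Build the stacks from the tallest level down to the first dot.
    for level in range(max(counts.values()), 0, -1):
        row = "".join("*" if counts[v] >= level else " " for v in axis_values)
        rows.append(row.rstrip())
    # Axis line with a "+" at every tick, then the tick labels beneath it.
    rows.append("".join("+" if v % tick == 0 else "-" for v in axis_values))
    labels = "".join(str(v).ljust(tick) for v in range(lo, hi + 1, tick))
    rows.append(labels.rstrip())
    return rows


for row in dot_plot_rows(paper):
    print(row)
```

    The tallest stack sits above 70, where five countries share the same value, which matches the center described above.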

    Stem-and-Leaf Plots

    One of the shortcomings of dot plots is that they do not show the actual values of the data. You have to read or infer them from the graph. From the previous example, you might have been able to guess that the lowest value is 34%, but you would have to look in the data table itself to know for sure. A stem-and-leaf plot is a similar plot in which it is much easier to read the actual data values. In a stem-and-leaf plot, each data value is split into two parts: the stem and the leaf. In this example, it makes sense to use the tens digits for the stems and the ones digits for the leaves. The stems are on the left of a dividing line as follows:

    [Figure: the stems 3 through 9 listed to the left of a vertical dividing line]

    Once the stems are decided, the leaves representing the ones digits are listed in numerical order from left to right:

    [Figure: completed stem-and-leaf plot of the paper packaging data]

    It is important to explain the meaning of the data in the plot for someone who is viewing it without seeing the original data. For example, you could place the following sentence at the bottom of the chart:

    Note: 5|69 means 56% and 59% are the two values in the 50's.

    If you could rotate this plot on its side, you would see the similarities with the dot plot. The general shape and center of the plot are easily found, and we know exactly what each point represents. This plot also shows the slight skewing to the left that we suspected from the dot plot. Stem plots can be difficult to create, depending on the numerical qualities and the spread of the data. If the data values contain more than two digits, you will need to remove some of the information by rounding. A data set that has large gaps between values can also make the stem plot hard to create and less useful when interpreting the data.
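    As a rough sketch, the same plot can be built programmatically by splitting each value into its tens digit (stem) and ones digit (leaf). The function name stem_and_leaf is a hypothetical helper introduced for this example, and it assumes non-negative whole-number data.

```python
# Paper packaging recycling percentages from the table above.
paper = [34, 40, 40, 42, 56, 59, 62, 63, 66, 70, 70, 70, 70, 70,
         76, 83, 83, 83, 98]


def stem_and_leaf(values):
    """Map each tens-digit stem to its sorted list of ones-digit leaves.

    Every stem between the smallest and largest is kept, even when it
    has no leaves, so that gaps in the data stay visible.
    """
    stems = {s: [] for s in range(min(values) // 10, max(values) // 10 + 1)}
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return stems


plot = stem_and_leaf(paper)
for stem, leaves in plot.items():
    print(f"{stem}|{''.join(str(leaf) for leaf in leaves)}")
print("Note: 5|69 means 56% and 59% are the two values in the 50's.")
```

    Because the leaves are appended in sorted order, each row reads from smallest to largest, as the construction above requires.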

    Screen Shot 2020-04-26 at 6.16.47 PM.png

    Creating a Stem-and-Leaf Plot 

    Consider the following populations of counties in California.

    Butte - 220,748

    Calaveras - 45,987

    Del Norte - 29,547

    Fresno - 942,298

    Humboldt - 132,755

    Imperial - 179,254

    San Francisco - 845,999

    Santa Barbara - 431,312

    To construct a stem-and-leaf plot, we first need to make sure each piece of data has the same number of digits. In our data, we will add a 0 at the beginning of our five-digit data points so that all data points have six digits. Then, we can either round or truncate all data points to their two leading digits.

    Value    Rounded    Truncated
    149      15         14
    657      66         65
    188      19         18

    2|2 represents 220,000−229,999 when data has been truncated.

    2|2 represents 215,000−224,999 when data has been rounded.
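    The difference between the two approaches can be checked with integer arithmetic. The sketch below is an illustration only: the helper names truncate and round_half_up are made up for this example, the data are assumed to be non-negative whole numbers, and ties are assumed to round up.

```python
def truncate(value, place):
    """Drop every digit below `place` (place=10 drops the ones digit)."""
    return value // place


def round_half_up(value, place):
    """Round to the nearest multiple of `place`, with halves rounding up."""
    return (value + place // 2) // place


# The three values from the table above, reduced to two digits.
for value in (149, 657, 188):
    print(value, round_half_up(value, 10), truncate(value, 10))

# Six-digit populations keep their two leading digits when place=10_000.
# Both ends of each key range land on stem 2, leaf 2 (that is, 22).
print(truncate(220_000, 10_000), truncate(229_999, 10_000))
print(round_half_up(215_000, 10_000), round_half_up(224_999, 10_000))
```

    Python's built-in round is avoided here because it sends exact ties to the nearest even number, which is not the usual classroom convention.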

    If we decide to round the above data, we have:

    Butte - 220,000

    Calaveras - 050,000

    Del Norte - 030,000

    Fresno - 940,000

    Humboldt - 130,000

    Imperial - 180,000

    San Francisco - 850,000

    Santa Barbara - 430,000

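    And the stem-and-leaf plot will be as follows, with the hundred-thousands digit as the stem and the ten-thousands digit as the leaf:

    0|35
    1|38
    2|2
    3|
    4|3
    5|
    6|
    7|
    8|5
    9|4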

    where:

    2|2 represents 215,000−224,999.

    Source: California State Association of Counties 

    Back-to-Back Stem Plots

    Stem plots can also be a useful tool for comparing two distributions when placed next to each other. These are commonly called back-to-back stem plots.

    Constructing a Back-To-Back Stem Plot

    In a previous example, we looked at recycling in paper packaging. Here are the same countries and their percentages of recycled material used to manufacture glass packaging:

    Percentage of the glass packaging used in a country that is recycled. Source: National Geographic, January 2008. Volume 213 No.1, pg 86-87.
    Country % of Glass Packaging Recycled
    Cyprus 4
    United States 21
    Poland 27
    Greece 34
    Portugal 39
    Spain 41
    Australia 44
    Ireland 56
    Italy 56
    Finland 56
    France 59
    Estonia 64
    New Zealand 72
    Netherlands 76
    Germany 81
    Austria 86
    Japan 96
    Belgium 98
    Sweden 100

    In a back-to-back stem plot, one of the distributions simply works off the left side of the stems. In this case, the spread of the glass distribution is wider, so we will have to add a few extra stems. Even if there are no data values in a stem, you must include it to preserve the spacing, or you will not get an accurate picture of the shape and spread.

    [Figure: back-to-back stem plot of the paper and glass packaging percentages]

    We have already mentioned that the spread was larger in the glass distribution, and it is easy to see this in the comparison plot. You can also see that the glass distribution is more symmetric and is centered lower (around the mid-50's), which seems to indicate that overall, these countries manufacture a smaller percentage of glass from recycled material than they do paper. It is interesting to note in this data set that Sweden actually imports glass from other countries for recycling, so its effective percentage is actually more than 100.
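    A back-to-back plot can be sketched in code by grouping each data set by stem and printing the two sets of leaves on either side of a shared stem column. Which distribution goes on the left is an assumption here (paper on the left, glass on the right), and the helper names leaves_by_stem and back_to_back are hypothetical.

```python
paper = [34, 40, 40, 42, 56, 59, 62, 63, 66, 70, 70, 70, 70, 70,
         76, 83, 83, 83, 98]
glass = [4, 21, 27, 34, 39, 41, 44, 56, 56, 56, 59, 64, 72, 76,
         81, 86, 96, 98, 100]


def leaves_by_stem(values):
    """Group the ones digits of `values` under their stems (value // 10)."""
    groups = {}
    for v in sorted(values):
        groups.setdefault(v // 10, []).append(str(v % 10))
    return groups


def back_to_back(left, right):
    """Return the lines of a back-to-back stem plot with shared stems."""
    left_groups, right_groups = leaves_by_stem(left), leaves_by_stem(right)
    combined = list(left) + list(right)
    # Include every stem in the combined range, even the empty ones.
    stems = range(min(combined) // 10, max(combined) // 10 + 1)
    width = max(len(group) for group in left_groups.values())
    lines = []
    for stem in stems:
        # Left-hand leaves are reversed so they grow outward from the stem.
        left_leaves = "".join(reversed(left_groups.get(stem, [])))
        right_leaves = "".join(right_groups.get(stem, []))
        lines.append(f"{left_leaves:>{width}} |{stem:>2}| {right_leaves}".rstrip())
    return lines


for line in back_to_back(paper, glass):
    print(line)
```

    The stems run from 0 to 10 because the glass data include both 4 and 100, and the empty stems are printed to preserve the spacing discussed above.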


    Examples

    The following examples use the data set below.

    Here are the ages, arranged in order, for the CEOs of the 60 top-ranked small companies in America in 1993:

    32, 33, 36, 37, 38, 40, 41, 43, 43, 44, 44, 45, 45, 45, 45, 46, 46, 47, 47, 47, 48, 48, 48, 48, 49, 50, 50, 50, 50, 50, 50, 51, 51, 52, 53, 53, 53, 55, 55, 55, 56, 56, 56, 56, 57, 57, 58, 58, 59, 60, 61, 61, 61, 62, 62, 63, 69, 69, 70, 74

    Example 1

    Create a stem-and-leaf plot for these ages.

    Here is the stem-and-leaf plot:

    [Figure: stem-and-leaf plot of the CEO ages]

    Example 2

    Create a dot plot for these ages.

    Here is the dot plot:

    [Figure: dot plot of the CEO ages]

    Example 3

    Describe the shape of this data set. 

    The data set is approximately symmetric, with ages clustered in the forties and fifties. The fifties are the most common decade, with 24 of the 60 CEOs.

    Example 4 

    Are there any outliers in this data set? 

    There do not appear to be any outliers.
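    The answers to Examples 3 and 4 can be checked numerically. The lesson judges outliers by eye; the sketch below instead assumes the common 1.5 × IQR rule and the default quartile method of Python's statistics module, neither of which is prescribed by the text.

```python
import statistics
from collections import Counter

ages = [32, 33, 36, 37, 38, 40, 41, 43, 43, 44, 44, 45, 45, 45, 45,
        46, 46, 47, 47, 47, 48, 48, 48, 48, 49, 50, 50, 50, 50, 50,
        50, 51, 51, 52, 53, 53, 53, 55, 55, 55, 56, 56, 56, 56, 57,
        57, 58, 58, 59, 60, 61, 61, 61, 62, 62, 63, 69, 69, 70, 74]

# Shape: count how many CEOs fall in each decade of age.
by_decade = Counter(age // 10 * 10 for age in ages)

# Outliers: assumed 1.5 * IQR fences around the first and third quartiles.
q1, median, q3 = statistics.quantiles(ages, n=4)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [age for age in ages if age < low_fence or age > high_fence]

print(dict(sorted(by_decade.items())))
print(f"Q1={q1}, median={median}, Q3={q3}")
print(f"fences=({low_fence}, {high_fence}), outliers={outliers}")
```

    Under these assumptions no age falls outside the fences, which agrees with Example 4. The oldest CEO, at 74, lies close to the upper fence, so a different quartile convention could change that verdict.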


    Review 

    For 1-4, the following table gives the percentages of municipal waste recycled by state in the United States, including the District of Columbia, in 1998. Data was not available for Idaho or Texas.

    State Percentage
    Alabama 23
    Alaska 7
    Arizona 18
    Arkansas 36
    California 30
    Colorado 18
    Connecticut 23
    Delaware 31
    District of Columbia 8
    Florida 40
    Georgia 33
    Hawaii 25
    Illinois 28
    Indiana 23
    Iowa 32
    Kansas 11
    Kentucky 28
    Louisiana 14
    Maine 41
    Maryland 29
    Massachusetts 33
    Michigan 25
    Minnesota 42
    Mississippi 13
    Missouri 33
    Montana 5
    Nebraska 27
    Nevada 15
    New Hampshire 25
    New Jersey 45
    New Mexico 12
    New York 39
    North Carolina 26
    North Dakota 21
    Ohio 19
    Oklahoma 12
    Oregon 28
    Pennsylvania 26
    Rhode Island 23
    South Carolina 34
    South Dakota 42
    Tennessee 40
    Utah 19
    Vermont 30
    Virginia 35
    Washington 48
    West Virginia 20
    Wisconsin 36
    Wyoming 5

    Source: Zero Waste America 

    1. Create a dot plot for this data.
    2. Discuss the shape, center, and spread of this distribution.
    3. Create a stem-and-leaf plot for the data.
    4. Use your stem-and-leaf plot to find the median percentage for this data.

    For 5-8, identify the important features of the shape of the distribution.

    [Figure: distributions for exercises 5-8]

    For 9-12, refer to the following dot plots:

    [Figure: dot plots for exercises 9-12]

    9. Identify the overall shape of each distribution.
    10. How would you characterize the center(s) of these distributions?
    11. Which of these distributions has the smallest standard deviation?
    12. Which of these distributions has the largest standard deviation?
    13. What characteristics of a data set make it easier or harder to represent using dot plots, stem-and-leaf plots, or histograms?
    14. Here are the ages, arranged in order, for the CEOs of the 60 top-ranked small companies in America in 1993 (http://lib.stat.cmu.edu/DASL/Datafiles/ceodat.html): 32, 33, 36, 37, 38, 40, 41, 43, 43, 44, 44, 45, 45, 45, 45, 46, 46, 47, 47, 47, 48, 48, 48, 48, 49, 50, 50, 50, 50, 50, 50, 51, 51, 52, 53, 53, 53, 55, 55, 55, 56, 56, 56, 56, 57, 57, 58, 58, 59, 60, 61, 61, 61, 62, 62, 63, 69, 69, 70, 74
      1. Create a stem-and-leaf plot for these ages.
      2. Create a dot plot for these ages.
      3. Describe the shape of this dataset.
      4. Are there any outliers in this dataset?
    15. Give an example in which the same measurement taken on the same individual would be considered to be an outlier in one dataset but not in another dataset.
    16. Does a stem-and-leaf plot provide enough information to determine if there are any outliers in the dataset? Explain.
    17. Does a five-number summary provide enough information to determine if there are any outliers in the data set? Explain.
    18. A set of 17 exam scores is 67, 94, 88, 76, 85, 93, 55, 87, 80, 81, 80, 61, 90, 84, 75, 93, 75
      1. Draw a stem-and-leaf plot of the scores.
      2. Draw a dot plot of the scores.
    19. Make a stem-and-leaf plot of the mean high temperature in December (Fahrenheit) in 15 cities in California. The “stem” gives the first digit of a temperature, while the “leaf” gives the second digit. You can find the data at: http://countrystudies.us/united-states/weather/California/beverly-hills.htm
      1. Describe the shape of the dataset. Is it skewed or is it symmetric?
      2. What is the highest temperature in the dataset?
      3. What is the lowest temperature in the dataset?
      4. What percent of the 15 cities have a mean high December temperature in the 60s?

    Review (Answers) 

    To view the Review answers, open this PDF file and look for section 2.3. 


    Additional Resources

    PLIX: play, learn, interact, and eXplore for Ordering Leaves

    Practice for Displaying Univariate Data