Skip to main content
K12 LibreTexts

2.8.2: Stem-and-Leaf Plots and Histograms

  • Page ID
    5775
  • Stem-and-Leaf Plots and Histograms 

    Imagine asking a class of 20 algebra students how many brothers and sisters they had. You would probably get a range of answers from zero on up. Some students would have no siblings, but most would have at least one. The results might look like this:

    1, 4, 2, 1, 0, 2, 1, 0, 1, 2, 1, 0, 0, 2, 2, 3, 1, 1, 3, 6

    We could organize this information in many ways. The first way might just be to create an ordered list, relisting all the numbers in order, starting with the smallest:

    0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 6

    Another way to list the results is in a table:

    Number of siblings Number of matching students
    0 4
    1 7
    2 5
    3 2
    4 1
    5 0
    6 1

    We could also make a visual representation of the data by making categories for the number of siblings on the x−axis, and stacking representations of each student above the category marker. We could use crosses, stick-men or even photographs of the students to show how many students are in each category.

    Screen Shot 2020-05-13 at 10.16.00 PM.png

    Make and Interpret Stem-and-Leaf Plots

    Another useful way to display data is with a stem-and-leaf plot. Stem-and-leaf plots are especially useful because they give a visual representation of how the data is clustered, but preserve all of the numerical information. A stem-and-leaf plot consists of a vertical “stem” containing the first digit of each number, with the rest of each number written to the right of the stem like a “leaf.” In the stem and leaf plot below, the first number represented is 21. It is the only number with a stem of 2, so that makes it the only number in the 20’s. The next two numbers have a common stem of 3. They are 33 and 36. The next numbers are 40, 46 and 47.

    Screen Shot 2020-05-13 at 10.16.39 PM.png

    Stem-and-leaf plots have a number of advantages over simply listing the data in a single line.

    • They show how data is distributed, and whether it is symmetric around the center.
    • They can be used as the data is being collected.
    • They make it easy to determine the median and mode.

    Stem-and-leaf plots are not ideal for all situations; in particular they are not practical when the data is too tightly clustered. For example, with the data above about students’ siblings, all the data points would occupy the same stem (zero). In that case, no additional information could be gained from a stem-and-leaf plot.

    Creating a Stem-and-Leaf Plot

    While traveling on a long train journey, Rowena collected the ages of all the passengers traveling in her carriage. The ages for the passengers are shown below. Arrange the data into a stem-and-leaf plot, and use the plot to find the median and mode ages.

    35, 42, 38, 57, 2, 24, 27, 36, 45, 60, 38, 40, 40, 44, 1, 44, 48, 84, 38, 20, 4, 2, 48, 58, 3, 20, 6, 40, 22, 26, 17, 18, 40, 51, 62, 31, 27, 48, 35, 27, 37, 58, 21

    The first step is to determine a sensible stem. Since all the values fall between 1 and 84, the stem should represent the tens column, and run from 0 to 8 so that the numbers represented can range from 00 (which we would represent by placing a leaf of 0 next to the 0 on the stem) to 89 (a leaf of 9 next to the 8 on the stem). We then go through the data and fill out our plot:

    Screen Shot 2020-05-13 at 10.17.29 PM.png

    You can see immediately that the interval with the most number of passengers is the 40-49 group. In order to correctly determine the median and the mode, it is helpful to construct a second, ordered stem and leaf plot, placing the leaves on each branch in ascending order

    Screen Shot 2020-05-13 at 10.17.56 PM.png

    The mode is now apparent—there are 4 zeros in a row on the 4-branch, so the mode is 40. The median is the middle value; since there are 43 data points, the median is the 22nd value. (Using our formula from earlier, (43+1)/2=22.) So the median is 37.

    Make and Interpret Histograms

    Look again at the example of the algebra students and their siblings. The data was collected in the following list.

    1, 4, 2, 1, 0, 2, 1, 0, 1, 2, 1, 0, 0, 2, 2, 3, 1, 1, 3, 6

    We were able to organize the data into a table. Here is the table again, but this time we will use the word frequency as a header to indicate the number of times each value occurs in the list.

    Number of siblings Frequency
    0 4
    1 7
    2 5
    3 2
    4 1
    5 0
    6 1

    Now we could use this table as an (x,y) coordinate list to plot a line diagram like this one:

    Screen Shot 2020-05-13 at 10.19.08 PM.png

    While this diagram does indeed show the data, it is somewhat misleading. For example, the continuous line joining the number of students with one and two siblings makes it look like we know something about how many students have 1.5 siblings (which of course, is impossible). In this case, where the data points are all integers, it’s wrong to suggest that the function is continuous between the points!

    When the data we are representing falls into well defined categories (such as the integers 1, 2, 3, 4, 5 & 6) it is more appropriate to use a histogram to display that data. A histogram for this data is shown below.

    Screen Shot 2020-05-13 at 10.19.38 PM.png

    Each number on the x−axis has an associated column, whose height shows how many students have that number of siblings. For example, the column at x=2 is 5 units high, indicating that there are 5 students with 2 siblings.

    The categories on the x−axis are called bins. Histograms differ from bar charts in that they don’t necessarily have fixed widths for the bins. They are also useful for displaying continuous data (data that varies continuously rather than in integer amounts). To illustrate this, here are some examples.

    Displaying Data in a Histogram

    Monthly rainfall (in millimeters) for Beaver Creek Oregon was collected over a five year period, and the data is shown below. Display the data in a histogram.

    41.1, 254.7, 91.6, 60.9, 75.6, 36.0, 16.5, 10.6, 62.2, 89.4, 124.9, 176.7, 121.6, 135.6, 141.6, 77.0, 82.8, 28.9, 6.7, 22.1, 29.9, 110.0, 179.3, 97.6, 176.8, 143.5, 129.8, 94.9, 77.0, 60.8, 60.0, 32.5, 61.7, 117.2, 194.5, 208.6, 176.8, 143.5, 129.8, 94.9, 77.0, 60.8, 20.0, 32.5, 61.7, 117.2, 194.5, 208.6, 133.1, 105.2, 92.0, 60.7, 52.8, 37.8, 14.8, 23.1, 41.3, 75.7, 134.6, 148.8

    Notice the similarity between histograms and stem-and-leaf plots. A stem-and-leaf plot resembles a histogram on its side. We could start by making a stem-and-leaf plot of our data.

    For our data above our stem would be the tens, and run from 1 to 25. Instead of rounding the decimals in the data, we truncate them, meaning we simply remove the decimal. For example, 165.7 would have a stem of 16 and a leaf of 5, and we would just leave out the seven tenths.

    Screen Shot 2020-05-13 at 10.20.33 PM.png

    By outlining the numbers on the stem and leaf plot, we can see what a histogram with a bin-width of 10 would look like. You can see that with so many bins, the histogram looks random, with no clear pattern visible. In a situation like this we need to reduce the number of bins. We will increase the bin width to 25 and collect the data in a table:

    Rainfall (mm) Frequency
    0≤x<25 7
    25≤x<50 8
    50≤x<75 9
    75≤x<100 12
    100≤x<125 6
    125≤x<150 9
    150≤x<175 0
    175≤x<200 6
    200≤x<225 2
    225≤x<250 0
    250≤x<275 1

    The histogram associated with this bin width is below.

    Screen Shot 2020-05-13 at 10.21.14 PM.png

    The pattern in the distribution is far more apparent with fewer bins. So let's look at what the histogram would look like with even fewer bins. We will combine bins by pairs to give 6 bins with a bin-width of 50. Our table and histogram now looks like this.

    Rainfall (mm) Frequency
    0≤x<50 15
    50≤x<100 21
    100≤x<150 15
    150≤x<200 6
    200≤x<250 2
    250≤x<300 1

    Screen Shot 2020-05-13 at 10.21.52 PM.png

    The pattern is much clearer now. The normal monthly rainfall is around 75 mm, but sometimes it will be a very wet month and be higher (even much higher).

    You can see that although it may be counter-intuitive, sometimes you can see more information by reducing the number of intervals (or bins) in a histogram. It’s a bit like zooming out on a picture; you can’t see as many of the details, but the overall shape of what you are looking at may become clearer.

    Make Histograms Using a Graphing Calculator

    Look again at the data from the first example. We’ve seen how to manipulate raw data to give a stem-and-leaf plot and a histogram. Now let’s take some of the tedious sorting work out of the process by using a graphing calculator to automatically sort our data into bins.

    The following unordered data represents the ages of passengers on a train carriage.

    35, 42, 38, 57, 2, 24, 27, 36, 45, 60, 38, 40, 40, 44, 1, 44, 48, 84, 38, 20, 4, 2, 48, 58, 3, 20, 6, 40, 22, 26, 17, 18, 40, 51, 62, 31, 27, 48, 35, 27, 37, 58, 21.

    Use a graphing calculator to display the data as a histogram with bin-widths of 10, 5 and 20.

    Input the data in your calculator:

    Press [START] and choose the [EDIT] option.

    Input all 43 data points into the table in column L1.

    Screen Shot 2020-05-13 at 10.22.47 PM.png

    CC BY-NC-SA

    Select plot type:

    Bring up the [STATPLOT] option by pressing [2nd][Y=].

    Highlight 1:Plot1 and press [ENTER]. This will bring up the plot options screen. Highlight the histogram and press [ENTER]. Make sure the Xlist is the list that contains your data.

    Screen Shot 2020-05-13 at 10.23.24 PM.png

    CC BY-NC-SA

    Select bin widths and plot:

    Press [WINDOW] and ensure that Xmin and Xmax allow for all data points to be shown. The Xscl value determines the bin width.

    Press [GRAPH] to display the histogram.

    You can change bin widths and see how the histogram changes, by varying Xscl. Below are histograms with bin widths of 10, 5 and 20. (In this example Xmin=0 and Xmax=100 will work whatever bin width we choose, but notice that to display the histogram correctly we need to use a different Ymax value for each.)

    Screen Shot 2020-05-13 at 10.24.05 PM.png

    CC BY-NC-SA


    Example

    Example 1

    Rowena made a survey of the ages of passengers in a train carriage, and collected the results in a frequency table. Display the results as a histogram.

    Age range Frequency
    0 – 9 6
    10 – 19 2
    20 – 29 9
    30 – 39 8
    40 – 49 11
    50 – 59 4
    60 – 69 2
    70 – 79 0
    80 – 89 1

    Solution

    Since the data is already collected into intervals we will use these as our bins for the histogram. Even though the top end of the first interval is 9, the bin on our histogram will extend to 10. This is because, as we move to continuous data, we have a range of numbers that goes right up to the lower end of the following bin, even if it doesn’t include that number. The range of values for the first bin would therefore be 0≤x<10, and all the other bins would have similarly described ranges.

    Screen Shot 2020-05-13 at 10.24.58 PM.png


    Review 

    1. Create a stem-and-leaf plot for the following data. Use the first digit (hundreds) as the stem, and the second (tens) as the leaf. Truncate any units and decimals. Order the plot to find the median and the mode. data: 607.4, 886.0, 822.2, 755.7, 900.6, 770.9, 780.8, 760.1, 936.9, 962.9, 859.9, 848.3, 898.7, 670.9, 946.7, 817.8, 868.1, 887.1, 881.3, 744.6, 984.9, 941.5, 851.8, 905.4, 810.6, 765.3, 881.9, 851.6, 815.7, 989.7, 723.4, 869.3, 951.0, 794.7, 807.6, 841.3, 741.5, 822.2, 966.2, 950.1.
    2. Make a frequency table for the data in Question 1. Use a bin width of 50.
    3. Plot the data from Question 1 as a histogram with a bin width of
      1. 50
      2. 100

    For 4-6, use the following stem-and-leaf plot which shows data collected for the speed of 40 cars in a 35 mph limit zone in Culver City, California.

    Screen Shot 2020-05-13 at 10.25.36 PM.png

    1. Find the mean, median and mode speed.
    2. Create a frequency table, starting at 25 mph with a bin width of 5 mph.
    3. Use the table to construct a histogram with the intervals from your frequency table.

    For 7-11 use the histogram shown below. The data is the result of a survey of each subject's number of siblings.

    Screen Shot 2020-05-13 at 10.26.08 PM.png

    1. The median of the data.
    2. The mean of the data.
    3. The mode of the data.
    4. The number of people who have an odd number of siblings.
    5. The percentage of the people surveyed who have 4 or more siblings.

    Review (Answers)

    To view the Review answers, open this PDF file and look for section 13.11. 


    Vocabulary

    Term Definition
    bins Bins are groups of data plotted on the x-axis.
    Continuous Continuity for a point exists when the left and right sided limits match the function evaluated at that point. For a function to be continuous, the function must be continuous at every single point in an unbroken domain.
    outliers An outlier is an observation that lies an abnormal distance from other values in a random sample from a population.
    Range The range of a data set is the difference between the smallest value and the greatest value in the data set.
    rank The rank of an observation is the number of observations that are less than or equal to the value of that observation.
    standard deviation The square root of the variance is the standard deviation. Standard deviation is one way to measure the spread of a set of data.