Skip to main content
K12 LibreTexts

1.5.4: Displaying Bivariate Data

  • Page ID
    5749
  • Using Scatterplots and Line Plots 

    Scatterplots and Line Plots

    Bivariate simply means two variables. All our previous work was with univariate, or single-variable data. The goal of examining bivariate data is usually to show some sort of relationship or association between the two variables.

    We have looked at recycling rates for paper packaging and glass. It would be interesting to see if there is a predictable relationship between the percentages of each material that a country recycles. Following is a data table that includes both percentages.

    Country % of Paper Packaging Recycled % of Glass Packaging Recycled
    Estonia 34 64
    New Zealand 40 72
    Poland 40 27
    Cyprus 42 4
    Portugal 56 39
    United States 59 21
    Italy 62 56
    Spain 63 41
    Australia 66 44
    Greece 70 34
    Finland 70 56
    Ireland 70 55
    Netherlands 70 76
    Sweden 70 100
    France 76 59
    Germany 83 81
    Austria 83 44
    Belgium 83 98
    Japan 98 96

    Figure: Paper and Glass Packaging Recycling Rates for 19 countries

    Scatterplots

    We will place the paper recycling rates on the horizontal axis and those for glass on the vertical axis. Next, we will plot a point that shows each country's rate of recycling for the two materials. This series of disconnected points is referred to as a scatterplot.

    Screen Shot 2020-04-26 at 6.47.59 PM.png

    Recall that one of the things you saw from the stem-and-leaf plot is that, in general, a country's recycling rate for glass is lower than its paper recycling rate. On the next graph, we have plotted a line that represents the paper and glass recycling rates being equal. If all the countries had the same paper and glass recycling rates, each point in the scatterplot would be on the line. Because most of the points are actually below this line, you can see that the glass rate is lower than would be expected if they were similar.

    Screen Shot 2020-04-26 at 6.48.24 PM.png

    With univariate data, we initially characterize a data set by describing its shape, center, and spread. For bivariate data, we will also discuss three important characteristics: shape, direction, and strength. These characteristics will inform us about the association between the two variables. The easiest way to describe these traits for this scatterplot is to think of the data as a cloud. If you draw an ellipse around the data, the general trend is that the ellipse is rising from left to right.

    Screen Shot 2020-04-26 at 6.49.00 PM.png

    Data that are oriented in this manner are said to have a positive linear association. That is, as one variable increases, the other variable also increases. In this example, it is mostly true that countries with higher paper recycling rates have higher glass recycling rates. Lines that rise in this direction have a positive slope, and lines that trend downward from left to right have a negative slope. If the ellipse cloud were trending down in this manner, we would say the data had a negative linear association. For example, we might expect this type of relationship if we graphed a country's glass recycling rate with the percentage of glass that ends up in a landfill. As the recycling rate increases, the landfill percentage would have to decrease.

    The ellipse cloud also gives us some information about the strength of the linear association. If there were a strong linear relationship between the glass and paper recycling rates, the cloud of data would be much longer than it is wide. Long and narrow ellipses mean a strong linear association, while shorter and wider ones show a weaker linear relationship. In this example, there are some countries for which the glass and paper recycling rates do not seem to be related.

    Screen Shot 2020-04-26 at 6.49.29 PM.png

    New Zealand, Estonia, and Sweden (circled in yellow) have much lower paper recycling rates than their glass recycling rates, and Austria (circled in green) is an example of a country with a much lower glass recycling rate than its paper recycling rate. These data points are spread away from the rest of the data enough to make the ellipse much wider, weakening the association between the variables.

    Line Plots

    The following data set shows the change in the total amount of municipal waste generated in the United States during the 1990's:

    Year Municipal Waste Generated (Millions of Tons)
    1990 269
    1991 294
    1992 281
    1993 292
    1994 307
    1995 323
    1996 327
    1997 327
    1998 340

    Figure: Total Municipal Waste Generated in the US by Year in Millions of Tons. 

    In this example, the time in years is considered the explanatory variable, or independent variable, and the amount of municipal waste is the response variable, or dependent variable. It is not only the passage of time that causes our waste to increase. Other factors, such as population growth, economic conditions, and societal habits and attitudes also contribute as causes. However, it would not make sense to view the relationship between time and municipal waste in the opposite direction.

    When one of the variables is time, it will almost always be the explanatory variable. Because time is a continuous variable, and we are very often interested in the change a variable exhibits over a period of time, there is some meaning to the connection between the points in a plot involving time as an explanatory variable. In this case, we use a line plot. A line plot is simply a scatterplot in which we connect successive chronological observations with a line segment to give more information about how the data values are changing over a period of time. Here is the line plot for the US Municipal Waste data:

    Screen Shot 2020-04-26 at 6.50.53 PM.png

    Interpreting Graphs for Bivariate Data

    It is easy to see general trends from scatter plots or line plots. For Example B, we can spot the year in which the most dramatic increase occurred (1990) by looking at the steepest line. We can also spot the years in which the waste output decreased and/or remained about the same (1991 and 1996). It would be interesting to investigate some possible reasons for the behaviors of these individual years.

    Interpreting the Trend

    Following is a scatterplot of the number of lives births per 10,000 23-year-old women in the United States between 1917 and 1975. Comment on the pattern this shows of birthrate over time.

    Screen Shot 2020-04-26 at 6.55.23 PM.png

    Birthrate, over time, appears to be cyclic. There was a dip in birthrate in 1932, then a gradual increase to a high in 1956. After that there was a drop in the birthrate.


    Technology Notes

    Scatterplots on the TI-83/84 Graphing Calculator

    Press [STAT][ENTER], and enter the following data, with the explanatory variable in L1 and the response variable in L2. (Note that this data set contains 18 points- not all are visible on the screen at once). Next, press [2ND][STAT-PLOT] to enter the STAT-PLOTS menu, and choose the first plot.

    Screen Shot 2020-04-26 at 7.03.02 PM.png

    Change the settings to match the following screenshot:

    Screen Shot 2020-04-26 at 7.06.01 PM.png

    This selects a scatterplot with the explanatory variable in L1 and the response variable in L2. In order to see the points better, you should choose either the square or the plus sign for the mark. The square has been chosen in the screenshot. Finally, set the window as shown below to match the data. In this case, we looked at our lowest and highest data values in each variable and added a bit of room to create a pleasant window. Press [GRAPH] to see the result, shown below.

    Screen Shot 2020-04-26 at 7.09.29 PM.png

    Line Plots on the TI-83/84 Graphing Calculator

    Your graphing calculator will also draw a line plot, and the process is almost identical to that for creating a scatterplot. Enter the data into your lists, and choose a line plot in the Plot1 menu, as in the following screenshot.

    Screen Shot 2020-04-26 at 7.11.57 PM.png

    Next, set an appropriate window (not necessarily the one shown below), and graph the resulting plot.

    Screen Shot 2020-04-26 at 7.12.49 PM.png


    Example

    The following example using the information below: 

    Data from a British government survey of household spending may be used to examine the relationship between household spending on tobacco products and alcoholic beverages. The data gathered is included in the following table.

    Region Alcohol Tobacco
    North 6.47 4.03
    Yorkshire 6.13 3.76
    Northeast 6.19 3.77
    East Midlands 4.89 3.34
    West Midlands 5.63 3.47
    East Anglia 4.52 2.92
    Southeast 5.89 3.20
    Southwest 4.79 2.71
    Wales 5.27 3.53
    Scotland 6.08 4.51
    No. Ireland 4.02 4.56

    Source: Carnegie Mellon University 

    Example 1

    Use the Technology Notes at the end of the section to make a scatter plot of this data. Comment of the relationship between household spending on alcohol and tobacco products

    Here is what the image on your graphing calculator should look like for your scatter plot:

    Screen Shot 2020-04-26 at 7.21.27 PM.png

    It appears that household spending on alcohol productions and household spending on tobacco products are directly related. That is, as one goes up, the other goes up.


    Review 

    For 1-4, remember a previous practice problem where you looked at the percentage of waste recycled in each state. Do you think there is a relationship between the percentage recycled and the total amount of waste that a state generates? Here are the data, including both variables.

    State Percentage Total Amount of Municipal Waste in Thousands of Tons
    Alabama 23 5549
    Alaska 7 560
    Arizona 18 5700
    Arkansas 36 4287
    California 30 45000
    Colorado 18 3084
    Connecticut 23 2950
    Delaware 31 1189
    District of Columbia 8 246
    Florida 40 23617
    Georgia 33 14645
    Hawaii 25 2125
    Illinois 28 13386
    Indiana 23 7171
    Iowa 32 3462
    Kansas 11 4250
    Kentucky 28 4418
    Louisiana 14 3894
    Maine 41 1339
    Maryland 29 5329
    Massachusetts 33 7160
    Michigan 25 13500
    Minnesota 42 4780
    Mississippi 13 2360
    Missouri 33 7896
    Montana 5 1039
    Nebraska 27 2000
    Nevada 15 3955
    New Hampshire 25 1200
    New Jersey 45 8200
    New Mexico 12 1400
    New York 39 28800
    North Carolina 26 9843
    North Dakota 21 510
    Ohio 19 12339
    Oklahoma 12 2500
    Oregon 28 3836
    Pennsylvania 26 9440
    Rhode Island 23 477
    South Carolina 34 8361
    South Dakota 42 510
    Tennessee 40 9496
    Utah 19 3760
    Vermont 30 600
    Virginia 35 9000
    Washington 48 6527
    West Virginia 20 2000
    Wisconsin 36 3622
    Wyoming 5 530
    1. Identify the variables in this example, and specify which one is the explanatory variable and which one is the response variable.
    2. How much municipal waste was created in Illinois?
    3. Draw a scatterplot for this data.
    4. Describe the direction and strength of the association between the two variables.

    For 5-8, the following line graph shows the recycling rates of two different types of plastic bottles in the US from 1995 to

    2001.

    Screen Shot 2020-04-26 at 7.26.38 PM.png

    1. Explain the general trends for both types of plastics over these years.
    2. What was the total change in PET bottle recycling from 1995 to 2001?
    3. Can you think of a reason to explain this change?
    4. During what years was this change the most rapid?

    References

    National Geographic, January 2008. Volume 213 No.1

    1. Which plots are most useful to interpret the ideas of shape, center, and spread?
    2. What effects does the shape of a data set have on the statistical measures of center and spread

    Review (Answers)

    To view the Review answers, open this PDF file and look for section 2.4.


    Additional Resources

    Practice for Displaying Bivariate Data