Fractions

Objective

In this section, we’ll look at a variety of ways to represent data. You’ll see that some data can be represented in more than one way and will study how these different methods are put to use.

In statistics, there are many ways to analyze and display data. One of the most important features of data display is the assurance that the audience will understand the data and their implications. To review some of the methods we use to display data in figures, we will start with some of the basics.

Line Graphs

Line graphs are used to display two-variable data that change continuously over time. Line graphs are good for showing trends in data. That is, they clearly show how one variable is affected as the other increases or decreases. They’re also good for making predictions of values outside the data set.

Year	Population
2000	455,000
2001	470,000
2002	500,000
2003	570,000
2004	655,000

Bar Graphs

Bar graphs are used to compare discrete (not continuous) quantities in different categories. The height of bars in a vertical bar graph and the length of bars in a horizontal bar graph are proportional to the numbers they represent. Bar graphs are quite useful for displaying results to a survey, in particular.

Number of Students	Way to School
150	walk
50	bicycle
200	car/carpool
650	bus
25	other
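
To make the idea of proportional bars concrete, here is a minimal text-only sketch of a horizontal bar graph for the survey above. The scale of one `#` per 25 students and the function name `bar_lengths` are choices made for this example.

```python
# Survey results from the table above: way to school -> number of students.
survey = {"walk": 150, "bicycle": 50, "car/carpool": 200, "bus": 650, "other": 25}

def bar_lengths(counts, students_per_mark=25):
    """Bar length for each category, proportional to its count.

    The scale (25 students per mark) is an assumption chosen so that
    every count in this table divides evenly.
    """
    return {label: count // students_per_mark for label, count in counts.items()}

lengths = bar_lengths(survey)
for label, length in lengths.items():
    print(f"{label:<12}{'#' * length} {survey[label]}")
```

The bus bar is 13 times as long as the bicycle bar, matching the ratio of 650 to 50 students.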

Stem-and-Leaf Plots

A stem-and-leaf plot organizes data to show its shape and distribution.

Frequency Tables and Histograms

Frequency tables and histograms are presented together because, as with line and bar graphs, a histogram is easier to create if a table of the data is constructed first. In addition, a stem-and-leaf plot looks a bit like a histogram turned on its side. You’ll see.

A frequency table is a chart that shows the number of times that values within each interval of a data set occur. A histogram is a bar-graph representation of the data in a frequency table that shows the proportion of data that fall into categories. The categories are nonoverlapping intervals of the data distribution.

These representations are best demonstrated with an example. Let’s look at the ages of the members of a small golf club.

Here’s the ordered set:

{42, 43, 43, 44, 46, 47, 47, 47, 48, 48, 48, 52, 52, 53, 54, 54, 54, 55, 56, 57, 57, 57, 57, 59, 60, 61, 62, 63, 64, 66, 66, 68, 69, 70, 71}

Step by Step:

Decide on the number of classes, or bins. The width of the intervals depends on this number. The table below shows 6 classes.
Now find the width of the intervals. The ages run from 42 to 71, so 6 classes that are each five years wide cover the whole set, as shown in the table below.

Class	Frequency
42 – 46	5
47 – 51	6
52 – 56	8
57 – 61	7
62 – 66	5
67 – 71	4

This table shows that the most common age group in this club is 52 to 56 years old, with 8 of the 35 members.
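
The counting behind a frequency table can be sketched in a few lines. The function name `frequency_table` and its parameters are hypothetical; the ages, the starting value of 42, the width of 5, and the 6 classes come from the example above.

```python
# Ordered ages of the golf club members, from the text.
ages = [42, 43, 43, 44, 46, 47, 47, 47, 48, 48, 48, 52, 52, 53, 54, 54, 54,
        55, 56, 57, 57, 57, 57, 59, 60, 61, 62, 63, 64, 66, 66, 68, 69, 70, 71]

def frequency_table(values, start, width, num_classes):
    """Count how many values fall in each nonoverlapping class."""
    table = []
    for i in range(num_classes):
        low = start + i * width
        high = low + width - 1  # whole-number ages, so 42-46 holds five ages
        count = sum(low <= v <= high for v in values)
        table.append((low, high, count))
    return table

table = frequency_table(ages, start=42, width=5, num_classes=6)
for low, high, count in table:
    # The row of marks is a rough histogram turned on its side.
    print(f"{low} - {high}\t{count}\t{'#' * count}")
```

Each row of marks has a length equal to the class frequency, so the printout doubles as a sideways histogram of the same table.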

Here’s a histogram of the data in the frequency table:

Question

In a stem-and-leaf plot of the data that is displayed in the histogram above, how many leaves would a stem of 5 have?

A. 8
B. 11
C. 21
D. Cannot be determined

Answer

Choice D is the correct answer. This one is a bit tricky. A stem of 5 covers the ages from 50 to 59, and three intervals in the histogram include some of those ages: the one for members between 47 and 51 years old, the one for members between 52 and 56 years old, and the one for members between 57 and 61 years old. The first of these drifts below 50 and the last stretches above 59, so the number of members between 50 and 59 years old cannot be determined from the histogram alone.
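
With the raw ages in hand, rather than only the histogram, a stem-and-leaf plot can be built directly. The sketch below is one simple way to do it; the function name `stem_and_leaf` is hypothetical, and it assumes two-digit whole numbers so that the tens digit is the stem and the ones digit is the leaf.

```python
from collections import defaultdict

# Ordered ages of the golf club members, from the text.
ages = [42, 43, 43, 44, 46, 47, 47, 47, 48, 48, 48, 52, 52, 53, 54, 54, 54,
        55, 56, 57, 57, 57, 57, 59, 60, 61, 62, 63, 64, 66, 66, 68, 69, 70, 71]

def stem_and_leaf(values):
    """Group each value's ones digit (leaf) under its tens digit (stem)."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)

plot = stem_and_leaf(ages)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")

print("Leaves on stem 5:", len(plot[5]))
```

For this data set the stem of 5 has 13 leaves, a count that the five-year classes of the histogram hide.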

Normal Distribution

Normal distributions are idealized data distributions. Their graphs are symmetric, bell-shaped curves, and a histogram of normally distributed data has the same shape, like this one:

The peak of the curve is where the mean, median, and mode all lie. Remember, this is a histogram, so it shows that more values in the set are concentrated in the middle than in the tails.

Pick any point on the curve. The farther it is from the mean (and the median and mode), the less likely it is to appear in the set. The closer it is to the mean, the more likely it is to appear in the set.

For example, suppose this curve is actually a histogram of exam scores in your class. Then most of the scores are close to the mean, and the fewest are farthest from the mean. In other words, suppose the mean of the test is 86. Then it is more likely that a student received a grade of 80 than a grade of 50.
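
The comparison between a grade of 80 and a grade of 50 can be checked numerically with a small sketch. The mean of 86 comes from the example above; the standard deviation of 10 is an assumption made only for illustration, since the text does not give one.

```python
from statistics import NormalDist

# Mean from the text; sigma=10 is an assumed spread, not given in the lesson.
scores = NormalDist(mu=86, sigma=10)

# Height of the bell curve at each grade (larger means more likely nearby).
density_80 = scores.pdf(80)
density_50 = scores.pdf(50)

# Share of scores within 10 points of the mean under this assumed model.
share_near_mean = scores.cdf(96) - scores.cdf(76)

print(f"Curve height at 80: {density_80:.5f}")
print(f"Curve height at 50: {density_50:.7f}")
print(f"Share of scores from 76 to 96: {share_near_mean:.1%}")
```

Under these assumptions the curve is far taller at 80 than at 50, which is the numerical version of saying that scores near the mean are more likely than scores in the tails.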

We’ll look back at the normal distribution when we explore probability.

Scatterplots

A scatterplot is a graph, or collection of points, of two-variable numerical data. Like line graphs, scatterplots are quite useful for showing trends in data.

Question

For which of the following data sets would a line graph be more appropriate than a scatterplot?

A. A set of data that gives the altitude of a plane as it descends to a runway
B. A set of data that gives the ages of all of the principals in a school district
C. A set of data that gives the height and arm span of a classroom of students
D. A set of data that gives the annual sales for ten different restaurants for one year

Answer

Choice A is the correct answer. A line graph is used to display values that change over time. A line graph could show the altitude of the airplane at several points throughout its descent. Choice B is incorrect, because data of this kind is best represented in a histogram. Choice C is incorrect, because data of this kind is best represented in a scatterplot. Choice D is incorrect, because data of this kind compares separate categories in a single year and is best represented in a bar graph.

Box-and-Whisker Plots

The last type of representation of data that we’ll explore in this section is a box-and-whisker plot. A box-and-whisker plot reveals a five-number summary of a data set. These five numbers are the median, the upper and lower quartiles, and the maximum and minimum values in the set.

That said, we need some more definitions.

Recall that the median of a data set is the number in the middle. Half of the values lie above it and half of the values below it.

A quartile is one of the values that divide an ordered data set into four parts, each containing one fourth of the data values.

The upper quartile is the median of the upper half of the set. One fourth of the data values lie above the upper quartile.

The lower quartile is the median of the lower half of the set. One fourth of the data values lie below the lower quartile.
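
These definitions can be applied to the golf club ages from earlier in this section. The sketch below follows the "median of each half" description above; leaving the overall median out of both halves when the set has an odd number of values is one common convention and is an assumption here, as is the function name `five_number_summary`.

```python
from statistics import median

# Ordered ages of the golf club members, from the text.
ages = [42, 43, 43, 44, 46, 47, 47, 47, 48, 48, 48, 52, 52, 53, 54, 54, 54,
        55, 56, 57, 57, 57, 57, 59, 60, 61, 62, 63, 64, 66, 66, 68, 69, 70, 71]

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumed convention: with an odd count, skip the middle value itself.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

summary = five_number_summary(ages)
print("min, lower quartile, median, upper quartile, max:", summary)
```

For the 35 ages this gives 42, 48, 55, 62, and 71. In a box-and-whisker plot the box would run from 48 to 62 with a line at 55, and the whiskers would reach out to 42 and 71.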

Review

Line graphs are used to display two-variable data that change continuously over time.
Bar graphs are used to compare discrete (not continuous) quantities in different categories.
A stem-and-leaf plot is a display that organizes data to show its shape and distribution.
A frequency table is a chart that shows the number of times that values within each interval of a data set occur. A histogram is a bar-graph-like representation of the data in a frequency table.
A scatterplot is a graph, or collection of points, of two-variable numerical data.
A box-and-whisker plot reveals the median, the upper and lower quartiles, and the maximum and minimum values in the set.
