
Summarizing Data

Objective

In this lesson you will learn to represent data in multiple ways and to compute measures that summarize that data. You will also explore potential sources of bias.

Previously Covered

  • We spent quite a bit of time reviewing volume, surface area, and areas of different three-dimensional figures. We’ve also refreshed your memory of some important formulas and provided some extra practice using them.

All About Data

In this section, we’ll explore various aspects of data, from collection to representation and analysis. We won’t do it in that order, though, because it’s more meaningful to look first at data and how it’s represented. Then you will be able to look at data more critically. That is, you can ask yourself how the data was collected and whether it truly represents what its author says it does.

A distribution is an arrangement of values that gives information about the frequency with which they occur. A histogram is one representation of a distribution. A frequency table is another.
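A frequency table can be built simply by counting how often each value occurs. The minimal Python sketch below assumes the data are given as a list of numbers; `frequency_table` is a hypothetical helper name, and the sample data are the set used in the mode example later in this lesson. Each row also prints a crude text bar (one `*` per occurrence), which is a rough stand-in for a histogram.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, frequency) pairs, sorted from least to greatest value."""
    counts = Counter(values)  # maps each distinct value to how many times it occurs
    return sorted(counts.items())

data = [4, 5, 3, 5, 4, 2, 1, 2, 4]

# Print each value, then one star per occurrence as a simple text histogram.
for value, count in frequency_table(data):
    print(f"{value} | {'*' * count}")
```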

The center of a distribution of values is simply the middle of that distribution. A measure of central tendency is a way of defining the center of a distribution. Perhaps this sounds more complicated than it really is. Just think of a measure of central tendency as one way to summarize a set of data values.

In this section we’ll define and compute the three most common measures of central tendency: the mean, the median, and the mode.

The range is another measure that gives information about a set of data, but it is not a measure of central tendency, because it doesn’t locate the center of the distribution. Instead, it is called a measure of dispersion, because it describes how spread out the data are.

Measures of Central Tendency

The Mean

The mean of a set of data is the arithmetic average, the kind of average you’re used to computing. In statistics it isn’t simply called the average, because there are different kinds of averages.

To find the mean, simply add up the values in a set of data and divide by the number of values in the set.


For example, the mean of the set {2, 4, 6, 8, 15} is the sum of its values divided by the number of values in the set: 35 divided by 5, or seven. Interestingly, the sets {7, 7, 7, 7, 7} and {0, 0, 0, 0, 35} also have a mean of seven.

You can see that the mean gives some information about a data set, but not all of the information you might want.
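The rule "add up the values and divide by how many there are" translates directly into code. This is a minimal Python sketch, assuming the data are a non-empty list of numbers; `mean` is a hypothetical helper name. Running it on the three example sets shows that very different data can share the same mean.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty set is undefined")
    return sum(values) / len(values)

# The three example sets from the text all have a mean of seven.
for data in ([2, 4, 6, 8, 15], [7, 7, 7, 7, 7], [0, 0, 0, 0, 35]):
    print(data, mean(data))
```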

The Median

The median of a set of data is the value in the middle of the set. To find the median, order the values in the set from least to greatest (or from greatest to least) and then pick the value in the middle.


For example, the median of the set {2, 4, 6, 8, 15} is six. And the median of the set {1, 6, 4, 10, 7} is also six.

What if there is an even number of values in the set? What strategy for finding the median of a set with an even number of values would you suggest? Perhaps it’s intuitive.

By convention, the median of a set with an even number of values is the mean, or average, of the two middle values.

For example, the median of the set {0, 2, 4, 6, 8, 10} is the mean of four and six, or five. The median of the set {1, 1, 1, 9, 9, 9} is the mean of one and nine, which is also five.
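The two median rules, one for an odd number of values and one for an even number, can be combined into a single procedure: sort, then either take the middle value or average the two middle values. This is a minimal Python sketch, assuming a non-empty list of numbers; `median` is a hypothetical helper name, and the test sets are the examples from this section. (Python’s standard `statistics.median` follows the same even-count convention.)

```python
def median(values):
    """Middle value of the ordered data, or the mean of the two middle values."""
    if not values:
        raise ValueError("the median of an empty set is undefined")
    ordered = sorted(values)  # order from least to greatest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: exactly one value sits in the middle.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

for data in ([2, 4, 6, 8, 15], [1, 6, 4, 10, 7], [0, 2, 4, 6, 8, 10], [1, 1, 1, 9, 9, 9]):
    print(data, median(data))
```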

The Mode

The mode of a set of data is the value that occurs most often. The mode of the set {4, 5, 3, 5, 4, 2, 1, 2, 4} is four, because four occurs more often than any other number in the set.

Unfortunately, it’s not as simple as it sounds. Sometimes a set may have more than one mode or no mode at all.

So how many modes are in the set {5, 4, 2, 4, 5, 2}? The answer is three. Every value in the set is also a mode of the set. Take some time to convince yourself that this is true.
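Because a set can have more than one mode, it is convenient to return a list of modes rather than a single value. This minimal Python sketch counts each value and returns every value tied for the highest count, in order of first appearance; `modes` is a hypothetical helper name. It treats every tied value as a mode, matching the three-mode answer above, and it does not implement the convention that a set in which no value repeats has no mode. Python’s standard `statistics.multimode` (Python 3.8 and later) behaves the same way.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs the greatest number of times."""
    if not values:
        return []
    counts = Counter(values)         # value -> number of occurrences
    highest = max(counts.values())   # the greatest frequency in the set
    return [value for value, count in counts.items() if count == highest]

print(modes([4, 5, 3, 5, 4, 2, 1, 2, 4]))  # a single mode
print(modes([5, 4, 2, 4, 5, 2]))           # every value ties
```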

A Measure of Dispersion: The Range

The range of a set of data is the difference between the greatest and least values in the set, so the range is never negative; it is zero when every value in the set is the same. To find the range of a set of data, subtract the least value in the set from the greatest value.


For example, the range of the set {1, 4, 6, 4, 2, 6, 3, 5, 9} is nine minus one, or eight. The range of the set {0, 0, 0, 0, 0, 0, 0, 0, 0, 8} is eight minus zero, which is also eight.
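Computing the range needs only the greatest and least values. This minimal Python sketch assumes a non-empty list of numbers; the helper is given the hypothetical name `data_range` so that it does not shadow Python’s built-in `range`. The last example shows that a set of identical values has a range of zero.

```python
def data_range(values):
    """Greatest value minus least value; never negative."""
    if not values:
        raise ValueError("the range of an empty set is undefined")
    return max(values) - min(values)

print(data_range([1, 4, 6, 4, 2, 6, 3, 5, 9]))
print(data_range([0, 0, 0, 0, 0, 0, 0, 0, 0, 8]))
print(data_range([7, 7, 7, 7, 7]))
```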

You can see that the range gives a pretty poor summary of a set of data, because it relies on only two values in the set. What measure would be more meaningful for a set of salaries that included yours and Bill Gates’s?

Question

For the set below, which measure is least?

{1, 8, 7, 3, 7, 6, 5, 6, 2}

  1. Mean
  2. Median
  3. Mode
  4. Range

Answer

Choice 1, the mean, is the correct answer. The mean of this set is the sum of its values, forty-five, divided by the number of values, nine, so the mean is five. The median is the middle value when the set is ordered. The ordered set is {1, 2, 3, 5, 6, 6, 7, 7, 8}, so the median is six. The mode is the value that occurs most often; this set has two modes, six and seven. The range is the difference between the greatest and least values: eight minus one, or seven. The least of these measures is the mean, which is five.
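The hand computation above can be double-checked with Python’s standard `statistics` module. This is a minimal sketch; the variable names are illustrative only. Because the set has two modes, the comparison uses the smaller mode, which does not change the result here.

```python
import statistics

data = [1, 8, 7, 3, 7, 6, 5, 6, 2]

mean = statistics.mean(data)         # sum of values / number of values
median = statistics.median(data)     # middle value of the ordered data
modes = statistics.multimode(data)   # every value tied for most frequent
value_range = max(data) - min(data)  # greatest value minus least value

# Compare the four measures; use the smallest mode since there are two.
measures = {"mean": mean, "median": median, "mode": min(modes), "range": value_range}
least = min(measures, key=measures.get)

print(measures, "modes:", modes)
print("least measure:", least)
```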

Review

  • The mean of a set of data is the arithmetic average.
  • The median of a set of data is the value in the middle of the set when the values are in order.
  • The mode of a set of data is the value that occurs most often.
  • The range of a set of data is the difference between the greatest and least values in the set.
