Often the mean, or average, is used to summarize a numerical set of data with a single value. But the mean by itself can be misleading unless you also know how spread out the data is. The spread of data is described by “measures of variation”. The three measures of variation explained in this article are range, standard deviation and variance.
The range of a data set is the difference between the largest and smallest values in the data set. Consider the following two sets:
Set A: 10, 13, 18, 23, 23, 29
Set B: 4, 10, 17, 18, 32, 36
Notice the mean value for Set A is (10 + 13 + 18 + 23 + 23 + 29)/6 = 19.33. The mean for Set B is (4 + 10 + 17 + 18 + 32 + 36)/6 = 19.5, which is almost the same. But the range for Set A is (29 – 10) = 19, and the range for Set B is (36 – 4) = 32. In some cases, the larger range for Set B or the smaller range for Set A may have greater significance than the fact that the two sets have nearly the same mean value.
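The arithmetic for the two sets can be checked with a short Python sketch. The helper names `mean` and `value_range` are chosen here for illustration and are not part of the original article.

```python
# Data sets taken from the text above.
set_a = [10, 13, 18, 23, 23, 29]
set_b = [4, 10, 17, 18, 32, 36]


def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


def value_range(values):
    """Range: the largest value minus the smallest value."""
    return max(values) - min(values)


# Print the mean (to two decimals) and the range of each set.
for name, data in (("Set A", set_a), ("Set B", set_b)):
    print(f"{name}: mean = {mean(data):.2f}, range = {value_range(data)}")
```

Running it prints the mean and range of each set side by side, which makes the difference in spread easy to see.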
Variance and Standard Deviation
While the range is effective for showing the total amount of spread in the data, the variance and standard deviation are important measures of the spread of the data around the mean. How are the variance and standard deviation calculated?
If x is a variable representing a data value, x-bar represents the sample mean, and n represents the sample size, then the sample variance is s² = Σ(x – x-bar)² / (n – 1), that is, the sum of the (x – x-bar)² values divided by (n – 1). The sample variance s² can be thought of as roughly the average of the (x – x-bar)² values. While we compute the average of a sample by dividing by the sample size n, here we divide by (n – 1) because that gives a better estimate of the variance of the population the sample was drawn from.
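As a rough illustration, the variance calculation described above could be written in Python as follows. The function name `sample_variance` is a choice made for this sketch, Set A from earlier in the article is reused as the input, and the standard library's `statistics.variance` is called only as a cross-check.

```python
import statistics


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n  # sample mean
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


# Set A from earlier in the article.
set_a = [10, 13, 18, 23, 23, 29]

print("by formula:         ", sample_variance(set_a))
print("statistics.variance:", statistics.variance(set_a))
```

The two printed values should agree up to floating-point rounding, since `statistics.variance` also divides by (n – 1).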
The sample standard deviation s is the square root of the variance. Why take the square root of the variance? Suppose the original units were a monetary unit; then s² would be in that monetary unit squared, for instance “cents squared”. What does that mean? It’s confusing. Taking the square root returns the measure to the original units of the data. Basically, a larger value for the standard deviation means larger variability.
Suppose you want to have a flower bed with all one type of flower, but you want each plant to be relatively consistent in size. You plant 6 flowers of species A and 6 flowers of species B and record their heights in inches. The data is as follows:
species A: 12, 13, 13, 16, 18, 18
species B: 8, 13, 13, 17, 17, 22
Both species have a mean height of 15 inches. But species A has a sample standard deviation of about 2.68 inches, while species B has a sample standard deviation of about 4.77 inches. From this information you’d be better off planting species A to have a more consistent height for each plant. Species B would be good to choose if you want to have some smaller and some larger plants.
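The standard deviations for the two species can be recomputed with a short Python sketch. It assumes the sample formula (dividing by n – 1) described earlier, and the function name `sample_std_dev` is chosen here for illustration.

```python
import math

# Plant heights in inches, taken from the text above.
species_a = [12, 13, 13, 16, 18, 18]
species_b = [8, 13, 13, 17, 17, 22]


def sample_std_dev(values):
    """Square root of the sample variance (sum of squared deviations / (n - 1))."""
    n = len(values)
    x_bar = sum(values) / n  # sample mean
    variance = sum((x - x_bar) ** 2 for x in values) / (n - 1)
    return math.sqrt(variance)


# Print the mean and sample standard deviation for each species.
for name, heights in (("species A", species_a), ("species B", species_b)):
    x_bar = sum(heights) / len(heights)
    print(f"{name}: mean = {x_bar:.2f} in, s = {sample_std_dev(heights):.2f} in")
```

The species with the smaller standard deviation is the one whose plants are more consistent in height.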
This guide should help students who are having difficulty understanding the basic concepts behind measures of variation.