An Introduction to Statistics - Lesson 5

Measures of Dispersion

Another important characteristic of a data set is how spread out it is, or how far each element lies from some measure of central tendency (average). There are several ways to measure the variability of the data. The most common and most important is the standard deviation, which provides a kind of average distance of each element from the mean, but several others are also important and are discussed here.

Range

Range is the difference between the highest and lowest data element.

Symbolically, range is computed as $x_{\max} - x_{\min}$. Although this is very similar to the formula for midrange, please do not make the common mistake of confusing the two. The range is not a reliable measure of dispersion, since it uses only two values from the data set. Thus, extreme values can make the range very large even when most of the elements are actually very close together. For example, the range for the data set 1, 1, 2, 4, 7 introduced earlier would be 7-1=6.

Recently it has come to my attention that a few books define statistical range the same way as its more mathematical usage. I've seen this in both grade school and college textbooks. Thus, instead of being a single number, the range is the interval over which the data occur. Such books would state the range as $[x_{\min}, x_{\max}]$ or $x_{\min}$ to $x_{\max}$. Thus for the example above, the range would be from 1 to 7 or [1,7]. Be sure you do not write 1-7, since this could be interpreted as -6.
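As a small illustration, both conventions for range can be computed for the data set 1, 1, 2, 4, 7. This is only a sketch; the variable names are chosen here for illustration and do not come from the lesson.

```python
data = [1, 1, 2, 4, 7]

# Range as a single number: highest element minus lowest element.
data_range = max(data) - min(data)

# Range as an interval, the alternative definition used by some textbooks.
data_interval = (min(data), max(data))

print(data_range)     # 6
print(data_interval)  # (1, 7)
```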

Standard Deviation

The standard deviation is another way to measure dispersion. It is the most common and useful measure because it is, roughly speaking, the average distance of each score from the mean. The formula for the sample standard deviation is as follows.

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$$    sample standard deviation

Notice the difference between the sample and population standard deviations. The sample standard deviation uses n-1 in the denominator, and hence is slightly larger than the population standard deviation, which uses N (often written as n).

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$    population standard deviation
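The two formulas can be compared side by side in a short Python sketch. It applies both to the data set 1, 1, 2, 4, 7 from the range example; treating the same five numbers first as a sample and then as a whole population is an assumption made purely for illustration.

```python
import math
import statistics

data = [1, 1, 2, 4, 7]
n = len(data)
mean = sum(data) / n

# Sum of the squared deviations from the mean (the numerator in both formulas).
sum_sq_dev = sum((x - mean) ** 2 for x in data)

s = math.sqrt(sum_sq_dev / (n - 1))  # sample: divide by n - 1
sigma = math.sqrt(sum_sq_dev / n)    # population: divide by N

print(round(s, 4))      # 2.5495
print(round(sigma, 4))  # 2.2804

# Python's standard library provides both versions directly.
print(math.isclose(s, statistics.stdev(data)))       # True
print(math.isclose(sigma, statistics.pstdev(data)))  # True
```

As expected, the sample value is slightly larger than the population value.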

It is much easier to remember and apply these formulae if you understand what all the parts are for. We have already discussed the use of Roman vs. Greek letters for sample statistics vs. population parameters. This is why s is used for the sample standard deviation and $\sigma$ (sigma) is used for the population standard deviation. However, another sigma, the capital one ($\Sigma$), appears inside the formula. It indicates that we are adding things up. What is added up are the deviations from the mean: $x_i - \bar{x}$. But the average deviation from the mean is always zero, by definition of the mean! Occasionally the mean deviation is used instead; it averages the distances from the mean, written with the absolute value symbols as $|x_i - \bar{x}|$. However, a better measure of variation comes from squaring each deviation, summing those squares, dividing by one less than the number of data elements, and then taking the square root. If you compare this with the formula for quadratic mean you will realize we are doing the same thing, except for what we are dividing by. That n-1 can be understood in terms of degrees of freedom, a topic which goes beyond this introduction.
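The claim that the deviations from the mean always add up to zero, and the alternative of averaging their absolute values, can be checked with a few lines of Python. This is a sketch using the data set 1, 1, 2, 4, 7; the variable names are illustrative only.

```python
data = [1, 1, 2, 4, 7]
mean = sum(data) / len(data)

# Signed deviations from the mean cancel each other out.
deviations = [x - mean for x in data]
sum_of_deviations = sum(deviations)

# The mean deviation averages the distances (absolute values) instead.
mean_abs_dev = sum(abs(d) for d in deviations) / len(data)

print(deviations)         # [-2.0, -2.0, -1.0, 1.0, 4.0]
print(sum_of_deviations)  # 0.0
print(mean_abs_dev)       # 2.0
```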

Another formula for standard deviation is also commonly encountered. It is as follows.

$$s = \sqrt{\frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n(n-1)}}$$

   Shortcut formula for standard deviation

This formula can be derived algebraically from the former and has two primary applications. First, calculators and computer programs often employ it because fewer intermediate results are necessary and it can be calculated in one pass through the data set. That is, you don't have to calculate the mean first and then find the deviations. Second, it is closely related to a formula which may be used to calculate the standard deviation for a frequency table. For this course, we will rely on our graphing calculators; the procedure is described in today's activity.
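A minimal sketch of the one-pass idea follows. The function name `shortcut_stdev` is hypothetical, and this is not claimed to be how any particular calculator implements it; it simply accumulates n, the sum of the data, and the sum of the squares in a single loop and then applies the shortcut formula.

```python
import math
import statistics

def shortcut_stdev(values):
    """Sample standard deviation from running totals, in one pass."""
    n = 0
    total = 0.0     # running sum of the data elements
    total_sq = 0.0  # running sum of the squared data elements
    for x in values:
        n += 1
        total += x
        total_sq += x * x
    return math.sqrt((n * total_sq - total ** 2) / (n * (n - 1)))

data = [1, 1, 2, 4, 7]
result = shortcut_stdev(data)
print(round(result, 4))                              # 2.5495
print(math.isclose(result, statistics.stdev(data)))  # True
```

One caveat: in floating-point arithmetic the subtraction in the numerator can lose precision when the mean is large compared with the spread of the data, so the two-pass definition is often the safer choice for such data.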

Variance

Variance is the third method of measuring dispersion. Compare the two variance formulae with their corresponding standard deviation formulae, and you will see that variance is just the square of the standard deviation. Statisticians tend to consider variance a primary measure and use it extensively (ANOVA, etc.), whereas scientists are very happy to use standard deviation exclusively. For official information on uncertainty, please refer to the National Institute of Standards and Technology (NIST) web pages. Uncertainty is another way to discuss variance, and Heisenberg's Uncertainty Principle is at the very root of quantum mechanics.

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} \qquad\qquad \sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$
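The relationship between variance and standard deviation can be confirmed with Python's standard `statistics` module. This sketch again uses the data set 1, 1, 2, 4, 7.

```python
import math
import statistics

data = [1, 1, 2, 4, 7]

sample_var = statistics.variance(data)       # divides by n - 1
population_var = statistics.pvariance(data)  # divides by N

print(sample_var)      # 6.5
print(population_var)  # 5.2

# Each variance is the square of the matching standard deviation.
print(math.isclose(sample_var, statistics.stdev(data) ** 2))       # True
print(math.isclose(population_var, statistics.pstdev(data) ** 2))  # True
```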

Occasionally, the abbreviations SD for standard deviation and Var for variance will be seen.

Range Rule of Thumb

It can take some time to start to understand how these measures of variation may be useful. One of the reasons we provide mean and standard deviation information regarding tests in this course is to help develop this understanding. Often your test scores will be adjusted via one of two different methods. Consider the following scenarios. First, if a straight five points are added to everyone's score, the mean would increase five points, say from 70.8 to 75.8, but this would have no effect on the standard deviation. It remains, say, at 10.9. Second, if each test score is multiplied by 0.89 and then 21 points are added, not only does this move the mean from, say, 55.4 to 70.3, but it also reduces the standard deviation from, say, 15.0 to about 13.4 (0.89 times 15.0). This can be useful if the original test scores were very variable and could easily have resulted in more D's and F's than your efforts justified. You might consider a third common way to adjust test scores, that of lowering the number of points possible. Technically this doesn't change either the mean or the standard deviation, but it does effectively raise everyone's percentage. This doesn't help the lower scoring students nearly as much as it helps the top students.
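The effect of the first two adjustments can be seen in a short Python sketch. The five raw scores below are hypothetical, invented only for illustration; the multiplier 0.89 and the added 21 points are the ones from the second scenario.

```python
import statistics

# Hypothetical raw test scores (not actual course data).
raw = [41, 48, 55, 62, 70]

# Method 1: add a straight five points to every score.
shifted = [x + 5 for x in raw]

# Method 2: multiply every score by 0.89, then add 21 points.
rescaled = [0.89 * x + 21 for x in raw]

for label, scores in [("raw", raw), ("shifted", shifted), ("rescaled", rescaled)]:
    print(label, round(statistics.mean(scores), 2), round(statistics.stdev(scores), 2))
```

Adding a constant moves the mean and leaves the standard deviation alone, while multiplying by 0.89 shrinks the standard deviation by that same factor.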

A commonly given rule of thumb is that the range of a data set is approximately 4 standard deviations (4s). Thus the maximum data element will be about 2 standard deviations above the mean and the minimum data element about 2 standard deviations below the mean. We will explore this further in tomorrow's lesson.
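As a rough check of this rule of thumb, the sketch below applies it to the 100 test scores used in the calculator activity later in this lesson. Expanding the frequency table into a flat list is just one convenient way to do the calculation, and the variable names are illustrative.

```python
import statistics

# Test scores and their frequencies from the calculator activity in this lesson.
scores = [55, 60, 70, 75, 80, 90, 95]
freqs = [5, 15, 20, 25, 20, 12, 3]

# Expand the frequency table into one entry per test.
all_scores = [x for x, f in zip(scores, freqs) for _ in range(f)]

# Rule of thumb: the range is about four standard deviations.
rule_of_thumb_s = (max(all_scores) - min(all_scores)) / 4
actual_s = statistics.stdev(all_scores)

print(rule_of_thumb_s)     # 10.0
print(round(actual_s, 2))  # 10.15
```

For this particular data set the estimate happens to be close; the rule is only an approximation and will not always work this well.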

More Round-off Information

The standard deviation of a data set is often used in science as a measure of the precision to which an experiment has been done. It can also indicate the reproducibility of the result. Propagation of error will not be fully discussed here, except to note that intermediate values in your calculations should not be rounded. At least twice as many digits as will be used in the final answer should be retained.

It is rather meaningless to calculate the standard deviation for a data set of two elements.

Three is considered the smallest sample size where standard deviation is meaningful.

It is not uncommon for an experiment to involve millions of events and associated data. The standard deviation of the individual measurements does not shrink as n grows, but the uncertainty of their mean (the standard error, s divided by the square root of n) depends inversely on the square root of n. With a million events we could thus expect to reduce the uncertainty of our answer by perhaps a thousandfold. It is the goal of many experiments to obtain very precise values, so great care is exercised to reduce systematic errors and also to reduce the effect of random errors by increasing the repetitions.

Example: Consider a simple example of counting pennies where the outcomes 99, 100 and 101 are obtained. Find the mean and standard deviation.
Solution: We can easily calculate the mean as 100 and the standard deviation as 1.0.

Example: Consider further if this exercise were repeated 1000 times and 100 was obtained 991 times, 99 5 times and 101 4 times. Again, calculate the mean and standard deviation.
Solution: The mean is now 99.999 and the standard deviation is now 0.095. Here the additional precision is justified and the mean and standard deviation are given to the same 3 decimal place precision. It would be a mistake to report these results to only one more digit than the original data set, as in 100.0 and 0.1.
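Both penny-counting examples can be reproduced with Python's `statistics` module. This is a sketch; the list names are chosen here for illustration.

```python
import statistics

# First example: three counts.
first = [99, 100, 101]

# Second example: 1000 counts, with 100 obtained 991 times,
# 99 obtained 5 times and 101 obtained 4 times.
repeated = [100] * 991 + [99] * 5 + [101] * 4

print(statistics.mean(first), statistics.stdev(first))  # 100 1.0
print(round(statistics.mean(repeated), 3))              # 99.999
print(round(statistics.stdev(repeated), 3))             # 0.095
```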

DO NOT USE a rounded s to obtain s². Variance is the primary statistic; s is a derived quantity.
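The size of the error introduced by squaring a rounded s can be seen with the data set 1, 1, 2, 4, 7. Rounding s to one decimal place here is an illustrative choice, not a rule from the lesson.

```python
import statistics

data = [1, 1, 2, 4, 7]

variance = statistics.variance(data)  # computed directly from the data
s = statistics.stdev(data)            # about 2.5495

rounded_s = round(s, 1)                   # 2.5
variance_from_rounded_s = rounded_s ** 2  # squaring the rounded value

print(variance)                 # 6.5
print(variance_from_rounded_s)  # 6.25
```

The rounded route is off by 0.25, an error of almost 4 percent in the variance.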

Standard deviation should be reported to at least one more decimal place than the data, or three significant digits.

Frequency Means/Standard Deviation

Please use your TI-84+ type calculator for the following activities.
Press the STAT key and ENTER to select EDIT.
In L1, enter in these test scores: 55, 60, 70, 75, 80, 90, and 95.
In L2, enter in these test frequencies: 5, 15, 20, 25, 20, 12, and 3.
These last values are how many tests there were for each of these scores.

Press 2nd MODE (QUIT), or just go directly to STAT, arrow over to CALC, and press ENTER to select 1-Var Stats. Now enter 2nd 1 (L1), a comma (,), and 2nd 2 (L2) followed by ENTER. Your screen should now appear as at right.

When doing a frequency mean, the order of the lists is important. You need to place the score list first and then the frequency list. Thus you had 1-Var Stats L1,L2 on your screen and not 1-Var Stats L2,L1. If you did it the wrong way, you can easily tell if there is an error by looking at the n value. The wrong way gave n=525 instead of the correct value of n=100.
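For readers who want to check the calculator's results, the same frequency calculation can be sketched in Python. This weights each score by its frequency; it illustrates the arithmetic and is not a description of how the calculator works internally. The names `sx` and `sigma_x` are chosen here to mirror the calculator's labels.

```python
import math

scores = [55, 60, 70, 75, 80, 90, 95]  # L1
freqs = [5, 15, 20, 25, 20, 12, 3]     # L2

# n is the total of the frequencies, not the number of distinct scores.
n = sum(freqs)
mean = sum(x * f for x, f in zip(scores, freqs)) / n

# Each squared deviation is counted once per test with that score.
sum_sq_dev = sum(f * (x - mean) ** 2 for x, f in zip(scores, freqs))
sx = math.sqrt(sum_sq_dev / (n - 1))  # sample standard deviation
sigma_x = math.sqrt(sum_sq_dev / n)   # population standard deviation

print(n)                 # 100
print(mean)              # 74.15
print(f"{sx:.2f}")       # 10.15
print(f"{sigma_x:.2f}")  # 10.10

# Swapping the lists would treat the scores as frequencies, giving n = 525.
print(sum(scores))       # 525
```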

Under 1-Var Stats, the arithmetic mean, $\bar{x}$, is listed. Be sure to always round this to the proper significance. Below that is the sample standard deviation, denoted Sx. Notice that both the sample standard deviation and the population standard deviation, $\sigma_x$, are given. The differences between the two are discussed in the Standard Deviation section of this lesson. Watch carefully for which one applies to a given data set. (Remember, standard deviation is a measure of the "average" distance each score is away from the mean.)

One last note: press the VARS key followed by 5 (Statistics) to recall Sx, which can then be squared directly to obtain the variance. This avoids squaring a rounded value and so improves the quality of the variance obtained.