| Range is the difference between the highest and lowest data element. |
Symbolically, range is computed as xmax - xmin. Although this is very similar to the formula for midrange, (xmax + xmin)/2, please do not make the common mistake of reversing the two. The range is not a reliable measure of dispersion, since it uses only two values from the data set. Thus, extreme values can distort the range to be very large while most of the elements may actually be very close together. For example, the range for the data set 1, 1, 2, 4, 7 introduced earlier would be 7 - 1 = 6.
Recently it has come to my attention that a few books define statistical range in the same way as its more mathematical usage. I've seen this in both grade school and college textbooks. Thus, instead of being a single number, it is the interval over which the data occur. Such books would state the range as [xmin,xmax] or xmin to xmax. Thus for the example above, the range would be from 1 to 7 or [1,7]. Be sure you do not say 1-7, since this could be interpreted as -6.
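The two usages of range described above can be sketched in a few lines of Python. The function names here are made up for illustration and are not part of the lesson.

```python
def numeric_range(data):
    """Range as a single number: highest value minus lowest value."""
    return max(data) - min(data)


def interval_range(data):
    """Range as an interval (lowest, highest), the alternative textbook usage."""
    return (min(data), max(data))


data = [1, 1, 2, 4, 7]           # data set introduced earlier in the lesson
print(numeric_range(data))       # 6
print(interval_range(data))      # (1, 7)
```

Reporting the interval as a pair sidesteps the "1-7" ambiguity mentioned above.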
| sample standard deviation |
Notice the difference between the sample and population standard deviations. The sample standard deviation uses n-1 in the denominator and hence is slightly larger than the population standard deviation, which uses N (which is often written as n).
| population standard deviation |
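For reference, the two definitional formulas being compared can be written out as below. This is a sketch in common textbook notation, assuming x̄ for the sample mean, μ for the population mean, n for the sample size, and N for the population size; the symbols used in the original figures may differ slightly.

```latex
s = \sqrt{\frac{\sum (\bar{x} - x_i)^2}{n - 1}}
\qquad\qquad
\sigma = \sqrt{\frac{\sum (\mu - x_i)^2}{N}}
```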
It is much easier to remember and apply these formulae
if you understand what all the parts are for.
We have already discussed the use of Roman vs. Greek letters
for sample statistics vs. population parameters.
This is why s is used for the sample standard deviation
and σ (sigma) is used for the population standard deviation.
However, another sigma, the capital one (Σ),
appears inside the formula. It serves to indicate that we are
adding things up.
What is added up are the deviations from the mean: x̄ - xi.
But the average deviation from the mean is actually
zero, by definition of the mean!
Occasionally the mean deviation, which averages the distances
(absolute values) |x̄ - xi|, is used.
However, a better measure of variation comes from squaring each deviation,
summing those squares, then taking the square root after dividing by
one less than the number of data elements.
If you compare this with the formula for the quadratic mean,
you will realize we are doing the same thing, except for what we are
dividing by. That n-1 can be understood in terms of
degrees of freedom, a topic which goes beyond this introduction.
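The steps just described can be traced numerically. The following is a minimal Python sketch using the data set 1, 1, 2, 4, 7 from earlier in the lesson; the variable names are chosen for illustration only.

```python
import math

data = [1, 1, 2, 4, 7]          # data set used earlier in the lesson
n = len(data)
mean = sum(data) / n            # x-bar

# Deviations from the mean always sum to zero.
deviations = [mean - x for x in data]
sum_of_deviations = sum(deviations)

# Mean (absolute) deviation: average distance from the mean.
mean_abs_deviation = sum(abs(d) for d in deviations) / n

# Sample standard deviation: square, sum, divide by n - 1, take the root.
sum_of_squares = sum(d ** 2 for d in deviations)
sample_sd = math.sqrt(sum_of_squares / (n - 1))

# Population standard deviation divides by n instead.
population_sd = math.sqrt(sum_of_squares / n)

print(sum_of_deviations)            # 0.0
print(mean_abs_deviation)           # 2.0
print(round(sample_sd, 4))          # 2.5495
print(round(population_sd, 4))      # 2.2804
```

Note that the sample value comes out slightly larger than the population value, as expected from dividing by n-1 rather than n.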
Another formula for standard deviation is also commonly encountered. It is as follows.
s = sqrt( [ nΣx² - (Σx)² ] / [ n(n - 1) ] )
| Shortcut formula for standard deviation |
This formula can be algebraically derived from the former and has two primary applications. First, calculators and computer programs often employ it because fewer intermediate results are necessary and it can be calculated in one pass through the data set. That is, you don't have to calculate the mean first and then find the deviations. Second, it is closely related to a formula which may be used to calculate the standard deviation for a frequency table. For this course, we will rely on our graphing calculators, and the appropriate procedure is discussed in today's activity.
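To illustrate the one-pass idea, here is a minimal Python sketch. It assumes the shortcut formula has the common textbook form s = sqrt([nΣx² - (Σx)²] / [n(n - 1)]); the function name `shortcut_sd` is made up for this example.

```python
import math


def shortcut_sd(data):
    """Sample standard deviation from running sums, in one pass over the data."""
    n = 0
    sum_x = 0.0     # running total of x
    sum_x2 = 0.0    # running total of x squared
    for x in data:
        n += 1
        sum_x += x
        sum_x2 += x * x
    # Assumed shortcut form: sqrt((n*sum(x^2) - (sum x)^2) / (n*(n - 1)))
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


print(round(shortcut_sd([1, 1, 2, 4, 7]), 4))  # 2.5495
```

Only three running totals are kept, which is why the mean never has to be computed first. One caution: when the data values are very large compared with their spread, subtracting the two large sums can lose floating-point precision, so the definitional formula may be safer in such cases.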
Occasionally, the abbreviations SD for standard deviation and Var for variance will be seen.
A commonly given rule of thumb is that the range of a data set is approximately 4 standard deviations (4s). Thus the maximum data element will be about 2 standard deviations above the mean and the minimum data element about 2 standard deviations below the mean. We will explore this further in tomorrow's lesson.
It is rather meaningless to calculate the standard deviation for a data set of two elements.
| Three is considered the smallest sample size where standard deviation is meaningful. |
It is not uncommon for an experiment to involve millions of events and associated data. If you examine the standard deviation formula above, you will note that it depends inversely on the square root of n. With a million events, the square root of n is 1000, so we could expect to reduce the standard deviation of our answer by perhaps a thousandfold. It is the goal of many experiments to obtain very precise values, so great care is exercised to reduce systematic errors and also to reduce the effect of random errors by increasing the repetitions.
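One way to read the square-root claim is sketched below. It assumes that "the standard deviation of our answer" refers to the uncertainty in the mean of n measurements (often called the standard error, a term this lesson does not use), taken as the spread of a single measurement divided by the square root of n. The function name is hypothetical.

```python
import math


def standard_error(sd, n):
    """Uncertainty of the mean of n measurements whose individual spread is sd."""
    return sd / math.sqrt(n)


# With a spread of 1.0 per measurement, watch the uncertainty shrink with n.
for n in (1, 100, 10_000, 1_000_000):
    print(n, standard_error(1.0, n))
```

Going from one measurement to a million divides the uncertainty by 1000, which is the thousandfold reduction mentioned above.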
Example:
Consider a simple example of counting pennies where the outcomes
99, 100 and 101 are obtained. Find the mean and standard deviation.
Solution: We can easily calculate the mean as 100
and the standard deviation as 1.0.
Example:
Consider further if this exercise were repeated 1000 times and 100
was obtained 991 times, 99 was obtained 5 times, and 101 was obtained 4 times.
Again, calculate the mean and standard deviation.
Solution:
The mean is now 99.999 and the standard deviation is now 0.095.
Here the additional precision is justified and the mean and
standard deviation are given to the same 3 decimal place precision.
It would be a mistake to report these results to only one more digit
than the original data set, as in 100.0 and 0.1.
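Both penny-counting examples can be checked with Python's standard `statistics` module. This is only a verification sketch; the list names are chosen for illustration.

```python
import statistics

# First example: three counts.
small = [99, 100, 101]
print(statistics.mean(small), statistics.stdev(small))

# Second example: 1000 counts, expanded from the stated frequencies.
large = [100] * 991 + [99] * 5 + [101] * 4
print(round(statistics.mean(large), 3), round(statistics.stdev(large), 3))
```

Here `statistics.stdev` is the sample standard deviation (dividing by n-1), so the results should agree with the values quoted in the solutions above.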
| DO NOT USE a rounded s to obtain s². Variance is the primary statistic; s is a derived quantity. |
Standard deviation should be reported to at least one more decimal place than the data, or three significant digits.
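The warning about squaring a rounded s can be seen in a short Python sketch, again using the data set 1, 1, 2, 4, 7 from earlier in the lesson. Rounding s to one decimal place is an arbitrary choice made here for illustration.

```python
import statistics

data = [1, 1, 2, 4, 7]

variance = statistics.variance(data)   # sample variance computed directly
s = statistics.stdev(data)             # unrounded s, about 2.5495

from_unrounded = s ** 2                # agrees with the variance up to float error
from_rounded = round(s, 1) ** 2        # 2.5 squared, noticeably off

print(variance, from_unrounded, from_rounded)
```

The directly computed variance is 6.5, while squaring the rounded value 2.5 gives 6.25.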
Please use your TI-84+ type calculator for the following activities.
Press 2nd MODE (QUIT), or just go directly to
STAT, arrow over to CALC, and press ENTER to select
1-Var Stats. Now enter 2nd 1 (L1), a comma (,),
and 2nd 2 (L2),
followed by ENTER.
Your screen should now appear as at right.
When doing a frequency mean, the order of the lists is important. You need to place the score list first and then the frequency list. Thus you had 1-Var Stats L1,L2 on your screen and not 1-Var Stats L2,L1. If you did it the wrong way, you can easily tell if there is an error by looking at the n value. The wrong way gave n=525 instead of the correct value of n=100.
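The effect of list order can be imitated in Python. The sketch below is not the calculator's actual algorithm; `one_var_stats` is a hypothetical helper, and the data are borrowed from the penny-counting example rather than from the activity's lists.

```python
import math


def one_var_stats(scores, freqs):
    """Return (n, mean, sample SD) for a frequency table: scores first, then frequencies."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(scores, freqs)) / n
    ss = sum(f * (x - mean) ** 2 for x, f in zip(scores, freqs))
    return n, mean, math.sqrt(ss / (n - 1))


scores = [99, 100, 101]     # plays the role of L1
freqs = [5, 991, 4]         # plays the role of L2

n, mean, sx = one_var_stats(scores, freqs)
print(n, round(mean, 3), round(sx, 3))

# Swapping the lists treats the scores as frequencies, so n becomes the sum of the scores.
wrong_n, _, _ = one_var_stats(freqs, scores)
print(wrong_n)              # 300
```

As on the calculator, an implausible n is the quickest sign that the lists were entered in the wrong order.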
Under the 1-Var Stats, the arithmetic mean, x̄, is listed.
Be sure to always round this to the proper significance.
Below that is also included the sample standard deviation,
denoted by Sx.
Notice that both the sample and population standard deviations,
Sx and σx, are given.
In lesson five,
the differences between the two will be discussed.
Watch out carefully for which one applies to a given data set.
(Remember, standard deviation is a measure of the
"average" distance each score is away from the mean.)
One last note: the VARS key followed by 5 (Statistics) retrieves Sx, which makes it easier to square the unrounded standard deviation to obtain the variance. This avoids rounding error and improves the quality of the variance obtained.