
Statistical Probabilities and Distributions: Lesson 15

Hypothesis Testing, χ², ANOVA

Lesson Overview

Hypothesis Testing

Once descriptive statistics, combinatorics, and distributions are well understood, we can move on to the vast area of inferential statistics. The basic concept is called hypothesis testing, or sometimes the test of a statistical hypothesis. Here we have two conflicting theories about the value of a population parameter. It is very important that the hypotheses be conflicting (contradictory): if one is true, the other must be false, and vice versa. Another way to say this is that they are mutually exclusive and exhaustive, that is, there is no overlap and no other values are possible. Simple hypotheses test against only one value of the population parameter (p = ½, for instance), whereas composite hypotheses test a range of values (p > ½).

Our two hypotheses have special names: the null hypothesis and the alternative hypothesis. Historically, the null (invalid, void, amounting to nothing) hypothesis was what the researcher hoped to reject. However, these days it is common practice not to attach any special meaning to which hypothesis is which. The null hypothesis is represented by H0 and the alternative hypothesis by Ha. Although two simple hypotheses would be easiest to test, it is much more common to have one of each type or for both to be composite. If the values specified by Ha are all on one side of the value specified by H0, then we have a one-sided (one-tailed) test, whereas if the Ha values lie on both sides of H0, then we have a two-sided (two-tailed) test.

The outcome of our test regarding the population parameter will be that we either reject the null hypothesis or fail to reject the null hypothesis. It is now considered poor form to "accept" the null hypothesis, although if we fail to reject it, that is in fact essentially what we are doing.

Type I and Type II Errors

Two types of errors can occur and there are three naming schemes for them. These errors cannot both occur at once. Perhaps a table will make it clearer.

Decision \ Truth    H0 True                           Ha True
Reject Ha           no error                          False negative, Type II,
                                                      beta=P(Reject Ha|Ha true)
Reject H0           False positive, Type I,           no error
                    alpha=P(Reject H0|H0 true)

The term false positive for Type I errors comes from, say, a blood test where the null hypothesis is that the person does not have the condition: the test result came back positive (H0 was rejected), but it is not the case (false) that the person has whatever was being tested for. The term false negative for Type II errors then means that the person does indeed have whatever was being tested for, but the test didn't find it. When testing for pregnancy, AIDS, or other medical conditions, both types of errors can be a very serious matter. Formally, alpha=P(Accept Ha|H0 true), meaning the probability that we "accepted" Ha when in fact H0 was true. This meaning for alpha is very similar to that encountered earlier and is often called the level of significance. For a fixed sample size, alpha and beta usually cannot both be minimized; there is a trade-off between the two. Historically, a fixed level of significance was selected (alpha=0.05 for the social sciences and alpha=0.01 or alpha=0.001 for the natural sciences, for instance). This was because the null hypothesis was considered the "current theory," so the size of Type I errors was treated as much more important than that of Type II errors. Now both are usually considered together when determining an adequately sized sample. Instead of testing against a fixed level of alpha, a P-value is now often reported.
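The trade-off between alpha and beta can be made concrete with a small sketch. Everything specific here is an assumption chosen for illustration, not part of the lesson: a binomial experiment with n = 20 trials, a simple null H0: p = 0.5, a simple alternative Ha: p = 0.7, and a rule of the form "reject H0 when the number of successes is at least some cutoff."

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def error_rates(cutoff, n, p_null, p_alt):
    """Error rates for the rule 'reject H0 when successes >= cutoff'."""
    # alpha = P(reject H0 | H0 true): upper tail under the null value
    alpha = sum(binom_pmf(k, n, p_null) for k in range(cutoff, n + 1))
    # beta = P(reject Ha | Ha true): lower tail under the alternative value
    beta = sum(binom_pmf(k, n, p_alt) for k in range(0, cutoff))
    return alpha, beta

# Illustrative values only (assumed, not from the lesson)
n, p_null, p_alt = 20, 0.5, 0.7
rates = {c: error_rates(c, n, p_null, p_alt) for c in range(12, 18)}
for c, (alpha, beta) in rates.items():
    print(f"reject H0 if X >= {c}: alpha = {alpha:.4f}, beta = {beta:.4f}")
```

Raising the cutoff makes the test more conservative: alpha shrinks while beta grows, which is the trade-off described above.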

The P-value of a test is the probability that the test statistic would take a value as extreme or more extreme than that actually observed, assuming H0 is true.

The smaller the P-value, the stronger the evidence the data provide against H0 (higher significance: the result would remain significant at a smaller alpha).
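As a hedged illustration of the P-value definition, the sketch below computes exact binomial P-values for the simple null H0: p = ½. The data (15 successes in 20 trials) and the function names are assumptions made up for this example. The two-sided version sums the probabilities of all outcomes that are no more likely than the observed one, which is one common convention.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def p_value_upper(k_obs, n, p0):
    """One-sided P-value for Ha: p > p0 (results at least as large as observed)."""
    return sum(binom_pmf(k, n, p0) for k in range(k_obs, n + 1))

def p_value_two_sided(k_obs, n, p0):
    """Two-sided P-value: outcomes no more probable under H0 than the observed one."""
    p_obs = binom_pmf(k_obs, n, p0)
    return sum(
        binom_pmf(k, n, p0)
        for k in range(n + 1)
        if binom_pmf(k, n, p0) <= p_obs * (1 + 1e-9)  # small tolerance for ties
    )

# Hypothetical data: 15 successes in 20 trials, testing H0: p = 1/2
n, k_obs, p0 = 20, 15, 0.5
one_sided = p_value_upper(k_obs, n, p0)
two_sided = p_value_two_sided(k_obs, n, p0)
print(f"one-sided P-value: {one_sided:.4f}")
print(f"two-sided P-value: {two_sided:.4f}")
```

Because the null distribution is symmetric when p0 = ½, the two-sided P-value here is simply twice the one-sided value.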

Power of a Test

The power of a test against the associated correct value is 1-beta. It is the probability that a Type II error is not committed. There is a different value of beta for each possible correct value of the population parameter. Beta also depends on the sample size (n): increasing the sample size increases the power. Power is thus important in planning and interpreting tests of significance.
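The dependence of power on sample size can be sketched with a one-sided z test for a mean. This particular test, the known standard deviation, and the numbers (mu0 = 100, true mean 105, sigma = 15) are all assumptions chosen for illustration. Under these assumptions, power = 1 - Φ(z_crit - (mu1 - mu0)·√n / sigma), where Φ is the standard normal cumulative distribution function.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(n, mu0, mu1, sigma, alpha=0.05):
    """Power of the one-sided z test of H0: mu = mu0 vs Ha: mu > mu0
    when the true mean is mu1 and sigma is known (an assumption)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # reject H0 when z > z_crit
    shift = (mu1 - mu0) * sqrt(n) / sigma    # mean of z when mu = mu1
    return 1 - std_normal.cdf(z_crit - shift)

# Illustrative values only: mu0 = 100, true mean 105, sigma = 15
powers = {n: power_one_sided_z(n, mu0=100, mu1=105, sigma=15)
          for n in (10, 25, 50, 100)}
for n, pw in powers.items():
    print(f"n = {n:3d}: power = {pw:.3f}, beta = {1 - pw:.3f}")
```

When the true mean equals mu0, the same formula returns alpha, which is a useful sanity check: the "power" against the null value itself is just the probability of a Type I error.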

It is easy to confuse power (1-beta) with the P-value (which is compared against alpha).

Chi Square Distributions and Tests

The chi-square distribution is a continuous distribution related to the normal distribution. Specifically, it involves the sum of squares of independent standard normal random variables. Chi is a Greek letter (χ). The χ² distribution is important in several contexts, most commonly involving variance.

The χ² distribution is characterized by one parameter called the degrees of freedom, which is often denoted by ν (the Greek letter nu) and used as a subscript: χ²ν.

  1. The χ² distribution is continuous.
  2. The χ² distribution is unimodal.
  3. The χ² distribution is never negative.
  4. The χ² distribution mean = ν.
  5. The χ² distribution variance = 2ν.
  6. For small ν (ν < 10), the distribution is highly skewed to the right.
  7. As ν increases, the χ² distribution becomes more symmetrical about ν.

Gosset first described the distribution of s². It is related to the χ² by the simple factor (n-1)/σ²: the quantity (n-1)s²/σ² follows a χ² distribution with n-1 degrees of freedom. Although he wasn't able to prove this mathematically, he demonstrated it by dividing a prison population of 3000 into 750 random samples of size four and using their heights.
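In the spirit of Gosset's demonstration, the relationship between s² and the χ² distribution can be checked by simulation. This is only a sketch: the normal population (mean 170, standard deviation 2.5), the number of samples, and the random seed are assumptions, not Gosset's data. With samples of size four, the scaled variances (n-1)s²/σ² should have mean near ν = 3 and variance near 2ν = 6.

```python
import random
from statistics import mean, variance

random.seed(15)        # fixed seed so the run is repeatable
sigma = 2.5            # assumed population standard deviation
n = 4                  # sample size, as in Gosset's samples of four
num_samples = 20000    # far more samples than Gosset's 750, to reduce noise

scaled = []
for _ in range(num_samples):
    sample = [random.gauss(170.0, sigma) for _ in range(n)]
    s2 = variance(sample)                   # sample variance, divisor n - 1
    scaled.append((n - 1) * s2 / sigma**2)  # should behave like chi-square, 3 d.f.

sim_mean = mean(scaled)
sim_var = variance(scaled)
print(f"simulated mean     = {sim_mean:.3f} (theory: {n - 1})")
print(f"simulated variance = {sim_var:.3f} (theory: {2 * (n - 1)})")
```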

A common application of the chi-square statistic is in a test for goodness of fit, as described in the homework. It is also used for tests of independence. Chi-square contingency tables are often formed, and a contingency coefficient may also be used, especially when working with nonparametric measurements.
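A goodness-of-fit test compares observed counts with the counts expected under H0 using the statistic χ² = Σ (observed - expected)² / expected, with (number of categories - 1) degrees of freedom when no parameters are estimated from the data. The die-roll counts below are invented for illustration; the critical value 11.07 is the usual table entry for 5 degrees of freedom at alpha = 0.05.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: 60 rolls of a die, H0: the die is fair
observed = [8, 9, 19, 6, 8, 10]
expected = [sum(observed) / 6] * 6      # 10 per face under H0

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1                  # 5 degrees of freedom
critical_value = 11.07                  # chi-square table, df = 5, alpha = 0.05
reject_h0 = chi2 > critical_value

print(f"chi-square = {chi2:.2f} with {df} degrees of freedom")
print("reject H0" if reject_h0 else "fail to reject H0")
```

Here the statistic falls just short of the critical value, so at the 0.05 level the data do not give enough evidence to reject the hypothesis of a fair die.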

The F-Distribution and ANOVA

In prior sections we considered tests of inference about the means of various distributions. One can use the t procedure for inferences about the population means for normal populations and often for nonnormal populations as well. Similarly, proportions can easily be tested. One might then be tempted to consider tests of inferences about the standard deviation of a population, but the expert advice is: don't do it without expert advice! The F Statistic is not robust against nonnormality. Also, the F distribution and ANOVA are historically not tested on the AP Statistics Exam.

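The test built on the F statistic is called analysis of variance, or more commonly by its acronym, ANOVA. Despite the name, its purpose is to compare means: it does so by comparing the variation among the group means with the variation within the groups. The ANOVA F allows us to compare several means, not just two as was done earlier with the t statistic.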
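A minimal sketch of the one-way ANOVA F statistic follows. The three small groups are made-up data, and the function name is an assumption for this example. The statistic is the mean square between groups divided by the mean square within groups.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])

    # Variation of the group means about the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations about their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical data: three groups of three observations each
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_stat, df1, df2 = one_way_anova_f(groups)
print(f"F = {f_stat:.2f} with ({df1}, {df2}) degrees of freedom")
```

A large F means the group means differ by more than the within-group scatter would suggest; the value is then compared with an F table using both degrees of freedom.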

Since we have used the term F several times, it now behooves us to look at the underlying F distribution. The F distribution is named in honor of R. A. Fisher, who first studied it in 1924. (As you can see by this date and Gauss's work, statistics really only recently developed.) Specifically, the F distribution is used to compare the variances of two normal populations. If σ1² = σ2², then we expect s1² - s2² to be distributed about zero or, equivalently, the ratio s1²/s2² to be close to 1.0. However, this will depend on both sample sizes, or more precisely, on the degrees of freedom.

The ratio of the variances of two independent random samples taken from normal parent populations
with equal variances has an F-distribution characterized by the degrees of freedom: ν1=n1-1 and ν2=n2-1.
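The variance ratio can be computed directly from two samples. The sketch below uses invented data and a hypothetical helper name. It places the larger sample variance in the numerator, a common convention when F tables list only values above 1.0, and reports the degrees of freedom in the matching order.

```python
from statistics import variance

def variance_ratio(sample_a, sample_b):
    """Return (F, df_numerator, df_denominator), larger variance on top."""
    var_a, var_b = variance(sample_a), variance(sample_b)  # divisor n - 1
    df_a, df_b = len(sample_a) - 1, len(sample_b) - 1
    if var_a >= var_b:
        return var_a / var_b, df_a, df_b
    return var_b / var_a, df_b, df_a

# Hypothetical samples, assumed to come from normal populations
sample_1 = [10, 12, 14, 16, 18]   # sample variance 10
sample_2 = [11, 12, 13, 14]       # sample variance 5/3
f_ratio, df_num, df_den = variance_ratio(sample_1, sample_2)
print(f"F = {f_ratio:.2f} with ({df_num}, {df_den}) degrees of freedom")
```

Keep in mind the warning above: this comparison is sensitive to nonnormality, so the result should be interpreted with care.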

  1. The F distribution is always positive or zero and positively skewed (right).
  2. The F distribution is characterized by two parameters, the degrees of freedom of the two samples.
  3. The F distribution is the ratio of two independent χ² variables, each divided by its degrees of freedom.
  4. The mean and variance for the F distribution depend on the two degrees of freedom.
  5. Extensive tables exist, but only for F > 1.0, so use the larger variance as numerator.
  6. The t, χ², and F are all related to the gamma distribution.

This concludes our overview of probability and distributions. Please check your booklets for completeness and prepare them for the completeness activity (quiz) and subsequent stapling.
