An Introduction to Statistics - Lesson 1

Definitions, Uses, Data Types, and Levels of Measurement

Lesson Overview

The term statistics has two basic meanings. First, statistics is a subject or field of study closely related to mathematics. This two-week, ten-lesson unit serves as a short introduction: it briefly covers the area known as descriptive statistics and introduces two inferential statistical tests.

Descriptive statistics generally characterizes or describes a set of data elements by graphically displaying the information or describing its central tendencies and how it is distributed.

Center sophomores generally spend several weeks reviewing this material and extending their study of statistics with 15 web-based lectures on probability and distributions with emphasis on the normal distribution. Those who wish to go further can study inferential statistics, and thus prepare for the AP Statistics Test. Our intent is for juniors to complete over half that curriculum.

Inferential statistics tries to infer information about a population
by using information gathered by sampling.

Statistics: The collection of methods used in planning an experiment
and analyzing data in order to draw accurate conclusions.

General Terms Used Throughout Statistics

Population: The complete set of data elements is termed the population.

The term population will vary widely with its application. Examples could be any of the following proper subsets: animals; primates; human beings; Homo sapiens; U.S. citizens; who are high school students; attending the Math & Science Center; living in Berrien County; as freshmen (class of 2009); females; home school of Niles, with one younger sister.

Sample: A sample is a portion of a population selected for further analysis.

How samples are obtained (the types of sampling) will be studied in lesson 2. Almost any of the examples above for population could serve as a sample for the next higher level data set.

Parameter: A parameter is a characteristic of the whole population.

Statistic: A statistic is a characteristic of a sample, presumably measurable.

The plural of statistic just above is the second basic meaning of statistics.

Assume there are 30 students in a particular statistics class, with 7 going to Niles High School. Since 7 is about 23% of 30, we can say 23% go to Niles. The 23% represents a parameter (not a statistic) of the class because it is based on the entire population. If we assume this class is representative of all classes, and we treat these 30 students as a sample drawn from a larger population, then the 23% becomes a statistic.
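As a minimal sketch of the arithmetic above, the proportion can be computed in Python. The variable names are illustrative assumptions, not part of the lesson. Note that the code cannot tell a parameter from a statistic: the number is the same either way, and the label depends only on whether the 30 students are regarded as the whole population or as a sample of a larger one.

```python
# Counts taken from the class example above; names are hypothetical.
class_size = 30        # students in the statistics class
niles_students = 7     # students who attend Niles High School

# The same arithmetic yields a parameter (class = population)
# or a statistic (class = sample of all classes).
proportion = niles_students / class_size

print(f"{proportion:.1%}")  # 23.3%
```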

Remember: Parameter is to Population as Statistic is to Sample.

Accuracy vs. Precision

The distinction between accuracy and precision, reviewed earlier in Numbers lesson 9, is very important.

Uses and Abuses of Statistics

Most of the time, samples are used to infer something (draw conclusions) about the population. If an experiment or study was done cautiously and results were interpreted without bias, then the conclusions would be accurate. However, occasionally the conclusions are inaccurate or inaccurately portrayed for the following reasons:
Statistics are often abused. Many examples could be added (entire books have been written on the subject), but it will be more instructive and fun to find them on your own.

Types of Data

A dictionary defines data as facts or figures from which conclusions may be drawn. Thus, technically, it is a collective, or plural, noun. Some recent dictionaries acknowledge popular usage of the word data with a singular verb. However, we intend to adhere to the traditional "English" teacher mentality in our grammar usage; sorry if "data are" just doesn't sound quite right! (My mother and step-mother were both English teachers, so clearly no offense is intended above.) Datum is the singular form of the noun data. Data can be classified as either numeric or nonnumeric. Specific terms are used as follows:
  1. Qualitative data are nonnumeric.

    {Poor, Fair, Good, Better, Best}, colors (ignoring any physical causes), and types of material {straw, sticks, bricks} are examples of qualitative data.

    Qualitative data are often termed categorical data. Some books use the terms individual and variable to reference the objects and characteristics described by a set of data. They also stress the importance of exact definitions of these variables, including what units they are recorded in. The reason the data were collected is also important.

  2. Quantitative data are numeric.

    Quantitative data are further classified as either discrete or continuous.

Discrete data take only separate, countable values, such as the number of students in a class. The real numbers are continuous, with no gaps or interruptions. Physically measurable quantities of length, volume, time, mass, etc. are generally considered continuous. At the physical level (microscopically), especially for mass, this may not be true, but for everyday situations it is a valid assumption.

The structure and nature of data will greatly affect our choice of analysis method. By structure we are referring to the fact that, for example, the data might be pairs of measurements. Consider the legend of Galileo dropping weights from the Leaning Tower of Pisa. The times for each item would be paired with the mass (and surface area) of the item. Something Galileo clearly did do was measure the time it took a pendulum to swing through various amplitudes. Galileo Galilei is considered a founder of the experimental method. More on his life and adventures can be read in Numbers lesson 5.

Levels of Measurement

The experimental (scientific) method depends on physically measuring things. The concept of measurement has been developed in conjunction with the concepts of numbers and units of measurement. Statisticians categorize measurements according to levels. Each level corresponds to how this measurement can be treated mathematically.

  1. Nominal: Nominal data have no order and thus only give names or labels to various categories.

  2. Ordinal: Ordinal data have order, but the interval between measurements is not meaningful.

  3. Interval: Interval data have meaningful intervals between measurements, but there is no true starting point (zero).

  4. Ratio: Ratio data are at the highest level of measurement. Ratios between measurements as well as intervals are meaningful because there is a starting point (zero).
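The four levels build on one another: each keeps what the level below it allows and adds one more meaningful operation. The sketch below is one hypothetical way to encode that idea in Python. The names LEVELS, CAPABILITIES, and allowed_operations are assumptions made for illustration, and the operation labels are informal summaries of the definitions above rather than standard terminology.

```python
# Levels of measurement, listed from lowest to highest.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# The one capability each level adds to those of the levels below it.
CAPABILITIES = {
    "nominal": "name or count categories",
    "ordinal": "rank values in order",
    "interval": "compare differences between values",
    "ratio": "compare ratios of values",
}


def allowed_operations(level):
    """Return every operation that is meaningful at the given level."""
    position = LEVELS.index(level)
    return [CAPABILITIES[name] for name in LEVELS[: position + 1]]


for level in LEVELS:
    print(f"{level}: {allowed_operations(level)}")
```

For instance, allowed_operations("ordinal") returns the naming and ranking operations only, which reflects the statement that intervals between ordinal measurements are not meaningful.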

Nominal comes from the Latin root nomen meaning name. Nomenclature, nominative, and nominee are related words. Gender is nominal. (Gender is something you are born with, whereas sex is something you should get a license for.)

Example 1: Colors
To most people, the colors: black, brown, red, orange, yellow, green, blue, violet, gray, and white are just names of colors.

To an electronics student familiar with color-coded resistors, these data are in ascending order and thus represent at least ordinal data.

To a physicist, the colors: red, orange, yellow, green, blue, and violet correspond to specific wavelengths of light and would be an example of ratio data.
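The difference between the first two viewpoints can be sketched in Python. The list below follows the standard resistor color code, in which black through white stand for the digits 0 through 9. The helper name color_digit is an assumption made for illustration.

```python
# Standard resistor color code: a color's position in the list is its digit.
RESISTOR_COLORS = ["black", "brown", "red", "orange", "yellow",
                   "green", "blue", "violet", "gray", "white"]


def color_digit(color):
    """Return the digit (0-9) that a resistor band color stands for."""
    return RESISTOR_COLORS.index(color.lower())


# Treated as nominal data, colors can only be tested for equality.
print("red" == "orange")                            # False

# Treated as ordinal data, they can also be ranked.
print(color_digit("red") < color_digit("orange"))   # True
print(sorted(["white", "red", "black"], key=color_digit))
# ['black', 'red', 'white']
```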

Example 2: Temperatures
What level of measurement a temperature is depends on which temperature scale is used.
Specific values:
    0°C = 32°F = 273.15 K = 491.67°R
    100°C = 212°F = 373.15 K = 671.67°R
    -17.8°C = 0°F = 255.4 K = 459.67°R
where C refers to Celsius (or Centigrade before 1948); F refers to Fahrenheit; K refers to Kelvin; R refers to Rankine.

Only Kelvin and Rankine have a true zero (starting point), so ratios are meaningful only on those scales. Celsius and Fahrenheit are interval data; certainly order is important and intervals are meaningful. However, a 180° dashboard is not twice as hot as the 90° outside temperature (Fahrenheit assumed)! Rankine has the same size degree as Fahrenheit but is rarely used. To interconvert Fahrenheit and Celsius, see Numbers lesson 12. (Note that since 1967, the use of the degree symbol on Kelvin temperatures is no longer proper.)
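The dashboard comparison can be checked numerically. The sketch below assumes the usual conversion K = (F - 32) × 5/9 + 273.15; the function name fahrenheit_to_kelvin is chosen for illustration. Dividing the Fahrenheit readings gives 2, but the same two temperatures on the Kelvin scale, which has a true zero, differ by a factor of only about 1.16.

```python
def fahrenheit_to_kelvin(temp_f):
    """Convert a Fahrenheit temperature to kelvins."""
    return (temp_f - 32) * 5 / 9 + 273.15


dashboard_f = 180.0   # dashboard temperature from the example above
outside_f = 90.0      # outside temperature from the example above

# Ratio of interval-level readings: arithmetically valid, physically meaningless.
naive_ratio = dashboard_f / outside_f

# Ratio of ratio-level readings: meaningful, because 0 K is a true zero.
kelvin_ratio = fahrenheit_to_kelvin(dashboard_f) / fahrenheit_to_kelvin(outside_f)

print(f"Fahrenheit ratio: {naive_ratio:.2f}")   # 2.00
print(f"Kelvin ratio:     {kelvin_ratio:.2f}")  # 1.16
```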

Although ordinal data should not be used for calculations, it is not uncommon to find averages formed from collected data that represented Strongly Disagree, ..., Strongly Agree! Also, averages of nominal data (zip codes, social security numbers) are rather meaningless!
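Summaries that rely only on order, such as the median and the mode, remain meaningful for ordinal data. The sketch below uses Python's standard statistics module. The five-point scale and the responses are hypothetical, invented purely for illustration; the lesson names only the two endpoints of the scale.

```python
from statistics import median_low, mode

# Hypothetical five-point scale; only the endpoints come from the lesson.
SCALE = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]

# Hypothetical survey responses, made up for this example.
responses = ["Agree", "Strongly Agree", "Disagree", "Agree",
             "Neutral", "Strongly Disagree", "Agree"]

# Convert each response to its rank; the ranks carry order, not distance.
ranks = [SCALE.index(response) for response in responses]

# median_low always returns a value that occurs in the data,
# so it maps back onto a real category.
median_response = SCALE[median_low(ranks)]
most_common = SCALE[mode(ranks)]

print(median_response)  # Agree
print(most_common)      # Agree
```

Averaging the ranks instead would implicitly assume that the step from Disagree to Neutral is the same size as the step from Agree to Strongly Agree, which ordinal data do not guarantee.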
