Statistics

I

Introduction

Statistics, branch of mathematics that deals with the collection, organization, and analysis of numerical data and with such problems as experiment design and decision-making.

II

History

Simple forms of statistics have been used since the beginning of civilization, when pictorial representations or other symbols were used to record numbers of people, animals, and inanimate objects on skins, slabs, sticks of wood, or the walls of caves. Before 3000 BC the Babylonians used small clay tablets to record tabulations of agricultural yields and of commodities bartered or sold. The Egyptians analysed the population and material wealth of their country before beginning to build the pyramids in the 31st century BC. The biblical books of Numbers and 1 Chronicles are, in small parts, statistical works, the former containing two separate censuses of the Israelites and the latter describing the material wealth of various Jewish tribes. Similar numerical records existed in China before 2000 BC. The ancient Greeks held censuses to be used as bases for taxation as early as 594 BC.

The Roman Empire was the first government to gather extensive data about the population, area, and wealth of the territories that it controlled. During the Middle Ages in Europe few comprehensive censuses were made. The Carolingian kings Pepin the Short and Charlemagne ordered surveys of ecclesiastical holdings: Pepin in 758 and Charlemagne in 762. Following the Norman Conquest of England in 1066, William I, King of England, ordered a census to be taken; the information gathered in this census, conducted in 1086, was recorded in the Domesday Book. Registration of deaths and births was begun in England in the early 16th century, and in 1662 the first noteworthy statistical study of population, Observations on the London Bills of Mortality, was written. A similar study of mortality made in Breslau, Germany, in 1691, was used by the English astronomer Edmond Halley as a basis for the earliest mortality table. In the 19th century, with the application of the scientific method to all phenomena in the natural and social sciences, investigators recognized the need to reduce information to numerical values to avoid the ambiguity of verbal description.

At present, statistics is a reliable means of describing accurately the values of economic, political, social, psychological, biological, and physical data and serves as a tool to correlate and analyse such data. The work of the statistician is no longer confined to gathering and tabulating data, but is chiefly a process of interpreting the information. The development of the theory of probability increased the scope of statistical applications. Much data can be approximated accurately by certain probability distributions, and the results of probability distributions can be used in analysing statistical data. Probability can be used to test the reliability of statistical inferences and to indicate the kind and amount of data required for a particular problem.

III

Statistical Methods

The raw materials of statistics are sets of numbers obtained from counting or measuring things. In collecting statistical data, adequate precautions must be taken to secure complete and accurate information.

The first problem of the statistician is to determine what and how much data to collect. In practice, the census-taker seeking an accurate and complete count of the population, like the physicist who wishes to count the number of molecular collisions per second in a gas, must first decide the precise nature of the items to be counted. The statistician faces a complex problem when, for example, he or she wishes to take a sample poll or straw vote. It is no simple matter to select a sample that will yield accurate predictions concerning the total population.

To establish a physical, biological, or social law, the statistician may start with one set of data and gradually modify it in the light of experience. For example, in early studies of the growth of populations, change in size of population was predicted by calculating the excess of births over deaths in any given period. Population statisticians soon recognized that rate of increase ultimately depends on the number of births, regardless of the number of deaths, so they began to calculate population growth on the basis of the number of births each year per 1,000 population. When predictions based on this method yielded inaccurate results, statisticians realized that other limiting factors exist in population growth. Because the number of births possible depends on the number of women rather than the total population, and because women bear children during only part of their total lifetime, the basic datum used to calculate population size is now the number of live births per 1,000 females of child-bearing age. The predictive value of this basic datum can be further refined by combining it with other data on the percentage of women who remain childless. The excess of births over deaths, therefore, is meaningful only as an indication of gross population growth over a definite period in the past; the number of births per 1,000 population is meaningful only as an expression of the proportion of increase during a similar period; and the number of live births per 1,000 women of child-bearing age is meaningful for predicting the future size of populations.
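
The three measures contrasted above can be sketched in a few lines of Python. This is only an illustration: the function names (the second and third follow the usual demographic labels "crude birth rate" and "general fertility rate", which the article itself does not use) and all the input figures are hypothetical, invented to show the arithmetic rather than taken from any real population.

```python
def natural_increase(births, deaths):
    """Excess of births over deaths in a period (gross growth in the past)."""
    return births - deaths


def crude_birth_rate(births, population):
    """Births per 1,000 of the total population."""
    return 1000 * births / population


def general_fertility_rate(live_births, women_of_childbearing_age):
    """Live births per 1,000 women of child-bearing age."""
    return 1000 * live_births / women_of_childbearing_age


# Hypothetical figures for one year, invented for illustration only.
births, deaths = 14_000, 9_000
population = 1_000_000
women_of_childbearing_age = 250_000

print(natural_increase(births, deaths))                            # 5000
print(crude_birth_rate(births, population))                        # 14.0
print(general_fertility_rate(births, women_of_childbearing_age))   # 56.0
```

The same number of births gives a very different rate depending on the denominator, which is why the choice of basic datum matters for prediction.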

IV

Tabulation and Presentation of Data

The collected data must be arranged, tabulated, and presented to permit ready and meaningful analysis and interpretation. To study and interpret the examination-grade distribution in a class of 30 pupils, for instance, the grades are initially arranged in ascending order: 30, 35, 43, 52, 61, 65, 65, 65, 68, 70, 72, 72, 73, 75, 75, 76, 77, 78, 78, 80, 83, 85, 88, 88, 90, 91, 96, 97, 100, 100. This progression shows at a glance that the maximum is 100, the minimum 30, and the range, or difference, between the maximum and minimum is 70.
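
As a minimal sketch, the ordering and the three summary values quoted above can be reproduced in Python. The grades are those listed in the text; the variable names are illustrative choices.

```python
# The 30 examination grades listed in the article.
grades = [30, 35, 43, 52, 61, 65, 65, 65, 68, 70,
          72, 72, 73, 75, 75, 76, 77, 78, 78, 80,
          83, 85, 88, 88, 90, 91, 96, 97, 100, 100]

ordered = sorted(grades)            # ascending order, as in the text
minimum = ordered[0]
maximum = ordered[-1]
value_range = maximum - minimum     # difference between maximum and minimum

print(len(ordered), minimum, maximum, value_range)  # 30 30 100 70
```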

In a cumulative-frequency graph, such as figure 1, the grades are marked on the horizontal axis, and the vertical axis is marked with the cumulative number of grades on the left and the corresponding percentage of the total number on the right. Each dot represents the accumulated number of students who have attained a particular grade or less. For example, the dot A corresponds to the second 72; reading on the vertical axis, it is evident that there are 12, or 40 per cent, of the grades equal to or less than 72.
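
The reading taken at dot A can be checked with a short Python sketch. It assumes the 30 grades given earlier in this section, already in ascending order; the helper name `cumulative_count` is hypothetical.

```python
from bisect import bisect_right

# The 30 grades from the article, in ascending order.
grades = [30, 35, 43, 52, 61, 65, 65, 65, 68, 70,
          72, 72, 73, 75, 75, 76, 77, 78, 78, 80,
          83, 85, 88, 88, 90, 91, 96, 97, 100, 100]


def cumulative_count(sorted_grades, grade):
    """Number of grades less than or equal to `grade` (list must be sorted)."""
    return bisect_right(sorted_grades, grade)


count = cumulative_count(grades, 72)
percent = 100 * count / len(grades)

print(count, percent)  # 12 40.0
```

Evaluating `cumulative_count` at every distinct grade gives the heights of the dots on the left-hand axis; dividing by the number of grades gives the right-hand percentage axis.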

In analysing the grades received by ten classes of 30 pupils each on four examinations, a total of 1,200 grades, the amount of data is too large to be exhibited conveniently as in figure 1. The statistician therefore separates the data into suitably chosen groups, or intervals. For example, ten intervals might be used to tabulate the 1,200 grades, as in column (a) of the accompanying frequency-distribution table; the actual number of grades in an interval, called the frequency of the interval, is entered in column (c). The numbers that define the interval range are called the interval boundaries. It is convenient to choose the interval boundaries so that the interval ranges are equal to each other and the interval midpoints are simple numbers. A grade such as 87 will be tallied in the 80-90 interval; a boundary grade such as 90 may be tallied in either the lower or the upper interval, provided the same rule is applied uniformly to every boundary. The relative frequency, column (d), is the ratio of the frequency of an interval to the total count. The cumulative frequency, column (e), represents the number of grades equal to or less than the upper boundary of each successive interval; thus, the number of grades of 30 or less is obtained by adding the frequencies in column (c) for the first three intervals, which total 53. The cumulative relative frequency, column (f), is the ratio of the cumulative frequency to the total number of grades.
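
The construction of such a table can be sketched in Python. The 1,200 grades are not reproduced in the article, so this example uses the 30 grades from the single class as a stand-in. It also assumes one particular boundary rule (a boundary grade goes into the lower interval); the function name `frequency_table` and its parameters are hypothetical.

```python
# Stand-in data: the 30 grades of the single class listed earlier.
grades = [30, 35, 43, 52, 61, 65, 65, 65, 68, 70,
          72, 72, 73, 75, 75, 76, 77, 78, 78, 80,
          83, 85, 88, 88, 90, 91, 96, 97, 100, 100]


def frequency_table(values, width=10, lowest=0, highest=100):
    """Return rows of (lower, upper, frequency, relative frequency,
    cumulative frequency, cumulative relative frequency).

    Assumed rule: a value equal to a boundary is tallied in the lower
    interval, so each interval covers lower < value <= upper. The first
    interval also accepts the value `lowest` itself.
    """
    total = len(values)
    rows = []
    cumulative = 0
    for lower in range(lowest, highest, width):
        upper = lower + width
        freq = sum(
            1 for v in values
            if lower < v <= upper or (lower == lowest and v == lowest)
        )
        cumulative += freq
        rows.append((lower, upper, freq, freq / total,
                     cumulative, cumulative / total))
    return rows


table = frequency_table(grades)
for lower, upper, freq, rel, cum, cum_rel in table:
    print(f"{lower:3d}-{upper:<3d} {freq:3d} {rel:6.3f} {cum:3d} {cum_rel:6.3f}")
```

Tallying boundary grades in the lower interval keeps the cumulative column consistent with the phrase "equal to or less than": the running total at each row counts every grade up to and including that row's upper boundary.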

The data of a frequency-distribution table can be presented graphically in a frequency histogram, as in figure 2, or a cumulative-frequency polygon, as in figure 3. The histogram is a series of rectangles with bases equal to the interval ranges and areas proportional to the frequencies. The polygon in figure 3 is drawn by connecting the interval midpoints of a cumulative frequency histogram with straight lines.

Newspapers and other printed media frequently present statistical data pictorially by using different lengths or sizes of various symbols to indicate different values.

© 2008 Microsoft