Lab 4.6: The Central Limit Theorem, Population, and Sample


To use sampling from a known population to illustrate the Central Limit Theorem.

Lab Procedure

I. The Population

NOTE: The graph required in Part I.B. of this lab is the same as the one in Part I.C. of Lab 4.5. If you are doing both labs and have already produced that graph for Lab 4.5, you do not have to reproduce it for this lab. In your lab report, indicate where it is found in the Lab 4.5 lab report (e.g., "Figure 1 of the Lab 4.5 lab report").

The SAS data set SASDATA.STATOPOLIS contains information on 100,000 households. For the purposes of this lab, these 100,000 households will constitute the population.

Open SASDATA.STATOPOLIS in SAS/INSIGHT now. (Recall that to get into SAS/INSIGHT, you choose Solutions: Analysis: Interactive Data Analysis from any of the main SAS windows.) You will see that there are four variables in the data set:

HHSIZE: household size.
VALUEH: the value of the house.
H_INCOME: household income.
H_GENDER: gender of the head of the household (0=female, 1=male).

Do a distribution analysis on H_INCOME (by choosing Analyze: Distribution ( Y )).

Notice that the density histogram has many bars. Because H_INCOME takes so many different values, it is easier to model its distribution using a density curve. To see what such a curve might look like, select Curves: Kernel Density, then click OK. Print or save this histogram with the density curve for your lab report.
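For readers curious about what a kernel density curve computes, the following is a minimal Python sketch, not the SAS/INSIGHT implementation. It assumes a simulated gamma sample as a stand-in for H_INCOME (shape 2.3 and scale 25000 are illustrative values chosen to give a mean near 57500), Gaussian kernels, and a normal-reference rule-of-thumb bandwidth. SAS/INSIGHT may choose its kernel and bandwidth differently.

```python
import math
import random
import statistics

def gaussian_kde(data, bandwidth):
    """Return a function x -> kernel density estimate built from Gaussian kernels."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Each observation contributes one small normal "bump" centred on itself.
        return norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)

    return density

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical stand-in for H_INCOME: 2000 draws from a gamma distribution.
incomes = [rng.gammavariate(2.3, 25000.0) for _ in range(2000)]

# Assumed bandwidth choice: the normal-reference rule of thumb.
h = 1.06 * statistics.stdev(incomes) * len(incomes) ** (-0.2)
kde = gaussian_kde(incomes, h)

for x in (10_000, 30_000, 60_000, 120_000, 200_000):
    print(f"income {x:7d}: estimated density = {kde(x):.3e}")
```

The estimated density should be highest in the region where incomes are most common and should tail off slowly to the right, which is the skewed shape the histogram suggests.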

II. Selecting Samples and Obtaining Data

The Central Limit Theorem states that under very broad assumptions (the existence of a finite population variance), the distribution of the mean of a random sample from a population will be asymptotically normal. The implication of this, which has been borne out in practice, is that the more observations used to compute a mean, the closer to normal the distribution of that mean will be.
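The claim can be explored by simulation. The sketch below is an illustration under stated assumptions, not part of the lab procedure: it assumes the population is a gamma distribution whose mean and standard deviation match the values quoted later in this lab (57500 and 37914.38), and it uses sample skewness as a rough measure of departure from normality (a normal distribution has skewness 0). The names `skewness` and `simulate_means` are made up for this example.

```python
import random
import statistics

# Assumption: a gamma population matching the lab's quoted mean and
# standard deviation, with parameters obtained by the method of moments.
MU = 57500.0
SIGMA = 37914.38
SHAPE = (MU / SIGMA) ** 2
SCALE = SIGMA ** 2 / MU

def skewness(values):
    """Sample skewness: 0 for a symmetric distribution such as the normal."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def simulate_means(n, reps, rng):
    """Return `reps` sample means, each computed from `n` gamma draws."""
    return [
        statistics.fmean(rng.gammavariate(SHAPE, SCALE) for _ in range(n))
        for _ in range(reps)
    ]

rng = random.Random(46)  # fixed seed so the run is repeatable
skew_by_n = {n: skewness(simulate_means(n, 5000, rng)) for n in (1, 5, 10, 50)}
for n, sk in skew_by_n.items():
    print(f"n = {n:2d}: skewness of the sample means = {sk:.3f}")
```

If the theorem is at work, the skewness should shrink toward 0 as n grows, with n = 1 reproducing the skewness of the population itself.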

In this lab, you will take several random samples of different sizes. You will compute the mean of the data in each sample, and then you will pool your results with those of others in the class. The result will be a data set consisting of means from at least 100 random samples.

Select the samples using the SAS macro LAB4_6. The macro will select one simple random sample each of sizes 5, 10, and 50. These will be placed in the SAS data sets WORK.SAMP5A, WORK.SAMP10A, and WORK.SAMP50A, respectively.

Open each of the three samples in SAS/INSIGHT. For each, compute the mean of H_INCOME. The easiest way to do this is to choose Analyze: Multivariate ( Y X ) and input H_INCOME as the Y variable. The mean will appear in the Univariate Statistics box in the output window. Record two copies of the values of the three means: one for your lab report and the other to give to the TA. Next term, the values will be input to the SAS data set SASDATA.LAB4_6. Since this is a new lab, we have already done the work of creating the data set. Go to Part III to analyze it.
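The sampling step can be mimicked outside SAS. The Python sketch below is not the LAB4_6 macro, whose internals are not shown in this handout. It assumes a simulated population of 100,000 gamma-distributed incomes (shape 2.3 and scale 25000 are illustrative values giving a mean near 57500) in place of SASDATA.STATOPOLIS, and the helper `sample_mean` is a name made up for this example.

```python
import random
import statistics

rng = random.Random(2001)  # fixed seed so the run is repeatable

# Hypothetical stand-in for the H_INCOME column of SASDATA.STATOPOLIS.
population = [rng.gammavariate(2.3, 25000.0) for _ in range(100_000)]

def sample_mean(pop, n, rng):
    """Mean of one simple random sample of size n, drawn without replacement."""
    sample = rng.sample(pop, n)
    return statistics.fmean(sample)

# One sample of each size, as in the lab.
means = {n: sample_mean(population, n, rng) for n in (5, 10, 50)}
for n, m in means.items():
    print(f"sample of size {n:2d}: mean H_INCOME = {m:.2f}")
```

Running the sketch with different seeds plays the role of different students drawing their own samples: each run yields one value each for the sizes 5, 10, and 50.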

III. The Central Limit Theorem

To see the Central Limit Theorem at work, look at the distributions of the means in the data set SASDATA.LAB4_6 and check whether they become closer to normal as more data values go into each mean. Specifically, for the variable H_INCOME, the distribution of means for samples of size 50 should be closer to normal than that for samples of size 10, which should be closer to normal than that for samples of size 5, which in turn should be closer to normal than the original population distribution.

Open SASDATA.LAB4_6 in SAS/INSIGHT. This data set contains the means of H_INCOME submitted by the entire class. The values for the means of samples of sizes 5, 10 and 50 are found under the variable names MEAN5, MEAN10 and MEAN50. There is also a random sample of the original H_INCOME values for comparison. Obtain a distribution analysis on each of these variables (by choosing Analyze: Distribution ( Y )).

You are now going to check the normality of each. To do so, you need the following facts:

If a random sample of size $n$ is drawn from a population having mean $\mu$ and standard deviation $\sigma$, and if $\overline{Y}$ is the mean of the random sample, then $\overline{Y}$ has population mean $\mu$ and population standard deviation $\sigma/\sqrt{n}$.

The mean of the population of H_INCOME values is $\mu=57500$, and the standard deviation is $\sigma=37914.38$, both computed from the gamma density. Therefore, the population mean of MEAN5 is 57500 and the population standard deviation is $37914.38/\sqrt{5}=16955.82$.
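The arithmetic can be checked with a few lines of Python. This is only a sketch of the formula $\sigma/\sqrt{n}$ stated above; the function name `sd_of_mean` is made up for this example. Because the quoted $\sigma$ is itself rounded, the printed result can differ in the last decimal place from the value given in the text.

```python
import math

MU = 57500.0       # population mean of H_INCOME, from the lab text
SIGMA = 37914.38   # population standard deviation of H_INCOME, from the lab text

def sd_of_mean(sigma, n):
    """Population standard deviation of the mean of a random sample of size n."""
    return sigma / math.sqrt(n)

n = 5
print(f"MEAN{n}: population mean = {MU:.0f}, "
      f"population standard deviation = {sd_of_mean(SIGMA, n):.2f}")
```

The population mean does not depend on the sample size, so only the standard deviation changes when `n` is changed.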

Obtain the population mean and standard deviation for both MEAN10 and MEAN50.

Now superimpose a $N(57500,16955.82^2)$ density curve on the density histogram of MEAN5. To do this, choose Curves: Parametric Density. From the resulting window,

Make sure the normal distribution is selected.

Set the Method: to Specification.

Set Mean/Theta: to 57500.

Set Sigma: to 16955.82.

Print or save the density histogram with the superimposed normal density curve for your lab report.
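To make explicit what the superimposed curve is, the following Python sketch evaluates the $N(57500,16955.82^2)$ density at a handful of points. It illustrates the curve only and makes no claim about how SAS/INSIGHT draws it; the helper `curve_points` is a name made up for this example.

```python
from statistics import NormalDist

MU = 57500.0        # population mean of MEAN5, from the lab text
SIGMA = 16955.82    # population standard deviation of MEAN5, from the lab text

curve = NormalDist(mu=MU, sigma=SIGMA)

def curve_points(lo, hi, steps):
    """Evenly spaced (x, density) pairs tracing the normal curve from lo to hi."""
    width = (hi - lo) / steps
    return [(lo + i * width, curve.pdf(lo + i * width)) for i in range(steps + 1)]

# Trace the curve over three standard deviations on either side of the mean.
points = curve_points(MU - 3 * SIGMA, MU + 3 * SIGMA, 12)
for x, density in points:
    print(f"x = {x:10.2f}  density = {density:.3e}")
```

The densities rise to a peak at the mean and fall off symmetrically. If the histogram of MEAN5 is close to normal, its bar heights should track these values.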

Repeat step C. (superimposing the normal density curve) for H_INCOME, MEAN10 and MEAN50. Be sure to use the correct value of Sigma: for each.

As a second check on normality, produce a normal quantile plot for H_INCOME, MEAN5, MEAN10 and MEAN50. To do this for MEAN5, proceed as follows:

Choose Curves: QQ Ref Line.

From the resulting window, make sure Method: is set to Specification, Intercept is set to 57500, and Slope is set to 16955.82.

Print or save the normal quantile plot for your lab report.
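The intercept and slope come from the fact that, if the data were exactly $N(\mu,\sigma^2)$, the ordered values would fall near the line $\mu+\sigma z$, where $z$ is the corresponding standard normal quantile. The Python sketch below illustrates this for MEAN5. It assumes Blom plotting positions $(i-0.375)/(n+0.25)$, which may not be what SAS/INSIGHT uses, and it uses simulated means of five gamma draws (shape 2.3, scale 25000, both illustrative) as a stand-in for the class data.

```python
import random
from statistics import NormalDist, fmean

MU, SIGMA = 57500.0, 16955.82   # intercept and slope of the reference line

def normal_scores(n):
    """Standard normal quantiles at Blom plotting positions (an assumed choice)."""
    std = NormalDist()
    return [std.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def reference_line(z):
    """Height of the reference line at normal score z."""
    return MU + SIGMA * z

# Hypothetical stand-in for 100 class values of MEAN5.
rng = random.Random(5)
mean5 = sorted(
    fmean(rng.gammavariate(2.3, 25000.0) for _ in range(5)) for _ in range(100)
)

scores = normal_scores(len(mean5))
for z, y in list(zip(scores, mean5))[::20]:
    print(f"z = {z:6.2f}  observed = {y:10.2f}  line = {reference_line(z):10.2f}")
```

Points that stay close to the line support normality. Systematic curvature away from it, especially in the upper tail, indicates remaining skewness.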

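Repeat step E. (the normal quantile plot with its reference line) for H_INCOME, MEAN10, and MEAN50. Be sure to use the correct value of Slope for each.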

Lab Report Checklist

In your lab report, be sure to include the following:

The graph from I.B. (if you are not already including it for Lab 4.5).

The values of the three sample means from samples of size 5, 10 and 50 (see Part II.B.).

The values of $\mu$ and $\sigma/\sqrt{n}$ for H_INCOME, MEAN5, MEAN10, and MEAN50 (see Part III.B.).

Density histograms with superimposed normal density curves for H_INCOME, MEAN5, MEAN10, and MEAN50 (see Part III.C.).

Normal quantile plots with superimposed reference lines for H_INCOME, MEAN5, MEAN10, and MEAN50 (see Part III.E.).

Analysis of how the results you have obtained demonstrate the Central Limit Theorem.

The translation was initiated by Joseph D Petruccelli on 2001-12-04

Joseph D Petruccelli 2001-12-04