The SAS data set SASDATA.STATOPOLIS contains information on 100,000 households.1 For the purposes of this lab, these 100,000 households will constitute the population.
The density histogram that results is the population histogram: there is one bar for each value of HHSIZE in the population, and the area of each bar is the proportion of the population that takes that value. Clicking on a bar displays its height, which here equals its area because each interval is 1 unit wide. (For instance, the height/area of the bar for HHSIZE=5 is 0.0971.) If you hold down the Control key and click on each bar in turn, the histogram will display the heights of all the bars. Do this now, then print or save the histogram for later use in this lab and for your lab report.2
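The bar heights described above can also be computed directly from the data. The sketch below is a minimal Python illustration, assuming a hypothetical toy list of household sizes (not the STATOPOLIS data) and a hypothetical helper `density_heights`. Each bar's area is the proportion of observations taking that value, so its height is that proportion divided by the bar width; with width 1, height and area coincide.

```python
from collections import Counter


def density_heights(values, bin_width=1):
    """Return {value: bar height} for a density histogram with one bar per value.

    Each bar's area (height * bin_width) equals the proportion of
    observations taking that value, so height = proportion / bin_width.
    """
    n = len(values)
    counts = Counter(values)
    return {v: counts[v] / n / bin_width for v in sorted(counts)}


# Hypothetical toy population of household sizes (NOT the STATOPOLIS data).
hhsize = [1, 2, 2, 3, 3, 3, 4, 4, 5, 2]
heights = density_heights(hhsize)
print(heights)

# With width-1 bars, the areas (= heights) sum to 1, because every
# observation falls in exactly one bar.
total_area = sum(h * 1 for h in heights.values())
print(total_area)
```

With a bar width other than 1, the heights would no longer equal the proportions, which is why the lab notes that height and area agree only because the intervals are 1 unit wide.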
Leave this distribution analysis window open (you will return to it later) and go to Part I.C.
It may seem surprising, but even experts do not all agree on a single meaning of probability. We will focus on the two meanings used most often: probability as a population proportion and probability as a long-run proportion.
We will consider the discrete and continuous cases separately.
Consider again the STATOPOLIS population. How can we define the probability that a household has five members? One natural way is to define it as the proportion of households in the population that have five members. In the STATOPOLIS population, 9,711 of the 100,000 households have five members, so the probability that a household has five members is 9,711/100,000 = 0.09711.
In Part I.B you found the height/area of each bar in the HHSIZE density histogram. These numbers are the population proportions of households of each size. Check the histogram you produced: the height/area of the bar for HHSIZE=5 should be 0.09711.
Together with the probability for five members, the probabilities that a household has 1, 2, 3, 4, 6, 7, or 8 members summarize the pattern of variation of household size in the population. This summary is called the distribution model of HHSIZE. In your lab report, explain how these probabilities are interpreted as population proportions.
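A distribution model of this kind can be thought of as a table mapping each value to its population proportion. The sketch below is a minimal Python illustration, assuming hypothetical counts for a toy population of 20 households (these are not the STATOPOLIS counts). It builds the model with exact fractions, checks that the probabilities sum to 1, and finds the probability of an event by adding the proportions of the values it contains.

```python
from fractions import Fraction

# Hypothetical household-size counts for a toy population of 20 households
# (illustrative only; these are NOT the STATOPOLIS counts).
counts = {1: 4, 2: 6, 3: 4, 4: 3, 5: 2, 6: 1}
N = sum(counts.values())

# Distribution model: each value's probability is its population proportion.
model = {size: Fraction(c, N) for size, c in counts.items()}

# The probabilities of all possible values sum to 1.
assert sum(model.values()) == 1

# The probability of an event is the total proportion of households in it,
# e.g. P(HHSIZE >= 5) = P(5) + P(6).
p_large = sum(p for size, p in model.items() if size >= 5)
print(p_large, float(p_large))  # 3/20 0.15
```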
Consider again the H_INCOME population. Its distribution model is called a gamma distribution model with parameters and . The density curve for this gamma distribution is
The gamma distribution is common in probability and statistics, and probabilities involving it can be computed with the SAS macro NPROBS. Access this macro now and use it to compute the proportion of household incomes that, according to the model, lie between $15,000 and $50,000.3 In your lab report, explain how this probability is interpreted as a population proportion.
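A probability such as the one NPROBS computes is the area under the density curve between $15,000 and $50,000. The sketch below is a minimal Python illustration of that idea, not the NPROBS macro itself. It assumes the shape/scale parametrization f(x) = x^(k-1) e^(-x/θ) / (Γ(k) θ^k) and hypothetical parameter values (shape 2, scale 15,000), since the lab's own parameters are not reproduced here; substitute the lab's values. The area is approximated with composite Simpson's rule.

```python
import math


def gamma_pdf(x, shape, scale):
    """Gamma density with the given shape and scale; zero for x <= 0."""
    if x <= 0:
        return 0.0
    # Work on the log scale to avoid overflow in x**(shape - 1) and Gamma(shape).
    log_f = ((shape - 1) * math.log(x) - x / scale
             - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_f)


def gamma_prob(a, b, shape, scale, n=2000):
    """Approximate P(a < X < b) by composite Simpson's rule (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = gamma_pdf(a, shape, scale) + gamma_pdf(b, shape, scale)
    for i in range(1, n):
        weight = 4 if i % 2 else 2  # Simpson weights 1, 4, 2, 4, ..., 2, 4, 1
        total += weight * gamma_pdf(a + i * h, shape, scale)
    return total * h / 3


# Hypothetical parameters for illustration only; substitute the lab's values.
SHAPE, SCALE = 2.0, 15000.0
p = gamma_prob(15000, 50000, SHAPE, SCALE)
print(round(p, 4))
```

For shape 2 the CDF has the closed form 1 - e^(-x/θ)(1 + x/θ), which gives an independent check on the numerical integral. If SciPy is available, `scipy.stats.gamma.cdf(50000, a=SHAPE, scale=SCALE) - scipy.stats.gamma.cdf(15000, a=SHAPE, scale=SCALE)` should give the same value.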
Probability can also be interpreted as the limit of the proportions observed in random samples from the population as the sample size grows. To see how this works for the STATOPOLIS data, take random samples of sizes 50, 500, and 5000 by running the SAS macro LAB4_5. The samples will be written to the SAS data sets SAMP50, SAMP500, and SAMP5000 in the WORK library. Open each one now in SAS/INSIGHT.
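Drawing the three samples can be sketched outside SAS as well. The Python sketch below assumes simple random sampling without replacement (how LAB4_5 actually samples is not stated here), a hypothetical helper `draw_samples`, and a hypothetical stand-in population of 100,000 household sizes generated at random rather than the STATOPOLIS data.

```python
import random


def draw_samples(population, sizes, seed=None):
    """Draw one simple random sample (without replacement) for each size."""
    rng = random.Random(seed)
    return {n: rng.sample(population, n) for n in sizes}


# Hypothetical stand-in population of 100,000 household sizes
# (randomly generated; NOT the STATOPOLIS data).
gen = random.Random(1)
population = [gen.choice([1, 2, 2, 3, 3, 4, 5, 6]) for _ in range(100_000)]

# One sample of each size used in the lab: 50, 500 and 5000.
samples = draw_samples(population, [50, 500, 5000], seed=2)
for n, s in samples.items():
    print(n, len(s))
```

Fixing the seed makes the draws reproducible; omitting it gives a fresh set of samples on each run, much as rerunning the macro would.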
For each sample, invoke a distribution analysis of HHSIZE and modify the density histogram exactly as you did the population histogram in Part I.B. Click on the bars of each density histogram to display the proportion of HHSIZE values equal to each of the values 1 through 8, and save or print each histogram. In your lab report, compare these proportions with the population values you found in Part I.B. Do the sample values seem to be converging to the population values?
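The comparison asked for above can be summarized by the largest gap between a sample's proportions and the population proportions. The Python sketch below assumes a hypothetical stand-in population of household sizes 1 to 8 drawn with made-up weights (not the STATOPOLIS data) and hypothetical helpers `proportions` and `max_abs_diff`; it prints that largest gap for samples of size 50, 500 and 5000.

```python
import random
from collections import Counter


def proportions(values, support):
    """Proportion of `values` equal to each value in `support`."""
    n = len(values)
    c = Counter(values)
    return {v: c[v] / n for v in support}


def max_abs_diff(p, q):
    """Largest absolute difference between two proportion tables."""
    return max(abs(p[v] - q[v]) for v in p)


# Hypothetical stand-in population (NOT STATOPOLIS): household sizes 1 to 8
# drawn with made-up weights.
rng = random.Random(42)
support = range(1, 9)
population = rng.choices(support, weights=[20, 30, 20, 15, 8, 4, 2, 1], k=100_000)
pop_p = proportions(population, support)

# Sample sizes used in the lab.
for n in (50, 500, 5000):
    sample = rng.sample(population, n)
    gap = max_abs_diff(proportions(sample, support), pop_p)
    print(n, round(gap, 4))
```

Because the samples are random, the gap need not shrink at every step in a single run; the long-run interpretation concerns its tendency to shrink as the sample size grows.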
Now do the same for the proportion of H_INCOME values between $15,000 and $50,000. A good way to get these proportions for the three data sets is the following:
In your lab report, be sure to include the following:
This document was generated using the LaTeX2HTML translator Version 2K.1beta (1.54)
The translation was initiated by Joseph D Petruccelli on 2001-12-04