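- WARNING: Stationarity Required! The summaries below describe a process only
if it is stable (stationary) over the period in which the data were collected.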
- Variable: Name of what is being counted, measured or
observed.
- Variable Types
- o Quantitative versus
Categorical
- o Discrete versus Continuous
- Displaying data distributions:
- o Bar chart for categorical data
- o Histogram for quantitative data
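A minimal Python sketch of the counts behind both displays, assuming equal-width, left-closed bins for the histogram. The function names and sample values are hypothetical and only illustrate the idea: a bar chart plots one count per category, while a histogram plots one count per numeric bin.

```python
from collections import Counter

def frequency_table(categories):
    """Count occurrences of each category (the bar heights of a bar chart)."""
    return dict(Counter(categories))

def histogram_counts(data, start, width, nbins):
    """Count observations in equal-width bins [start + k*width, start + (k+1)*width).

    Values outside [start, start + nbins*width) are ignored.
    """
    counts = [0] * nbins
    for x in data:
        k = int((x - start) // width)  # index of the bin containing x
        if 0 <= k < nbins:
            counts[k] += 1
    return counts

def text_histogram(counts, start, width):
    """Render bin counts as rows of '*', one row per bin."""
    lines = []
    for k, c in enumerate(counts):
        lo = start + k * width
        lines.append(f"[{lo:g}, {lo + width:g}) {'*' * c}")
    return "\n".join(lines)

# Hypothetical measurements, for illustration only.
sample = [1.2, 2.5, 2.7, 3.1, 3.3, 3.8, 4.6, 5.9]
print(text_histogram(histogram_counts(sample, 1, 1, 5), 1, 1))
```

The bin start and width are choices, not properties of the data; different choices can change the apparent shape of the histogram.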
- Analyzing Frequency Histograms
- o Modality
- o Symmetry
- o Center
- o Spread
- o Pattern and deviations
- What Might Cause These Shapes?
- o Symmetric, unimodal: measurement errors;
homogeneous populations
- o Skewness: an upper bound but no lower bound, or vice versa
- o Short-tailed: mixture of process streams
- o Multimodal: nonhomogeneous populations
Examples: Data sets TRANFORM, GEYSER1
- Summary measures of quantitative data: location
- o Mean: Average
- o Median: Halfway point of the ordered data (half the observations lie on either side)
- o Mode: Location of the modal bar on a
frequency histogram.
- o Quartiles
- o Quantiles
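A minimal sketch of these location measures using Python's standard `statistics` module. Quartile and quantile definitions differ between textbooks and software; the `method="inclusive"` choice below is an assumption, not necessarily the convention used in these notes. The mode follows the definition above (the modal bar of a histogram), so it depends on the hypothetical bin settings passed in.

```python
import statistics

def modal_bar_midpoint(data, start, width, nbins):
    """Midpoint of the tallest bar of an equal-width frequency histogram.

    Ties go to the leftmost bar; the result depends on the chosen bins.
    """
    counts = [0] * nbins
    for x in data:
        k = int((x - start) // width)
        if 0 <= k < nbins:
            counts[k] += 1
    k_max = counts.index(max(counts))
    return start + (k_max + 0.5) * width

def location_summary(data):
    """Mean, median, quartiles and deciles (quantiles via the 'inclusive' method)."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "quartiles": (q1, q2, q3),
        "deciles": statistics.quantiles(data, n=10, method="inclusive"),
    }

print(location_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```

For the values 1 through 9 this gives a mean and median of 5 and quartiles of 3, 5 and 7.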
- Summary measures of quantitative data: spread
- o Mean absolute deviation: Average distance
from the mean
- o Standard deviation (RMS): Square root of
average squared distance from the mean (the sample standard deviation divides by n - 1 instead of n)
- o Interquartile range (IQR): Range of middle 50% of
data.
- o Quartiles
- o Quantiles
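A minimal sketch of the spread measures above with Python's `statistics` module. `pstdev` divides by n (the RMS definition), while `stdev` divides by n - 1 (the usual sample version). The quartile method used for the IQR is an assumption, since conventions vary; the function names are illustrative.

```python
import statistics

def mean_absolute_deviation(data):
    """Average absolute distance from the mean."""
    m = statistics.mean(data)
    return sum(abs(x - m) for x in data) / len(data)

def rms_deviation(data):
    """Square root of the average squared distance from the mean (divides by n)."""
    return statistics.pstdev(data)

def sample_std(data):
    """Sample standard deviation (divides by n - 1)."""
    return statistics.stdev(data)

def iqr(data):
    """Interquartile range Q3 - Q1, quartiles via the 'inclusive' method."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
print(mean_absolute_deviation(data), rms_deviation(data), sample_std(data), iqr(data))
```

For these values the mean is 5, the mean absolute deviation is 1.5, the RMS deviation is 2.0 and the IQR is 1.5.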
- Outliers
- o Outliers are extremely
unrepresentative data values.
- o Box-and-whisker plots, based on the
five-number summary (minimum, lower quartile, median, upper quartile, maximum), can help detect outliers.
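A minimal sketch of the five-number summary and of one common way box plots flag outliers: points more than 1.5 IQR beyond the quartiles. The 1.5 multiplier (Tukey's convention) and the quartile method are assumptions, not rules stated in these notes.

```python
import statistics

def five_number_summary(data):
    """(minimum, Q1, median, Q3, maximum), quartiles via the 'inclusive' method."""
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (min(data), q1, med, q3, max(data))

def boxplot_outliers(data, k=1.5):
    """Values beyond the fences Q1 - k*IQR and Q3 + k*IQR (k=1.5 is an assumed convention)."""
    _, q1, _, q3, _ = five_number_summary(data)
    spread = q3 - q1
    lower, upper = q1 - k * spread, q3 + k * spread
    return [x for x in data if x < lower or x > upper]

data = [2, 4, 4, 4, 5, 5, 7, 9, 30]  # hypothetical values with one extreme point
print(five_number_summary(data), boxplot_outliers(data))
```

Here the quartiles are 4 and 7, so the fences are -0.5 and 11.5 and only the value 30 is flagged.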
- Resistant Summary Measures
- o Summary measures are resistant if they are
not seriously affected by outliers.
- o The median and IQR are resistant measures of location
and spread.
- o The mean and standard deviation are not resistant.
- o However, the trimmed mean and
Winsorized mean are resistant measures of
location.
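A minimal sketch contrasting resistant and non-resistant measures of location. The trimmed mean drops a fraction of the smallest and largest values; the Winsorized mean replaces them with the nearest remaining values. The 10% proportion and the rounding-down rule for the number of trimmed points are assumptions; other conventions exist.

```python
import statistics

def trimmed_mean(data, proportion=0.1):
    """Mean after dropping the g smallest and g largest values, g = floor(proportion * n)."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    xs = sorted(data)
    g = int(proportion * len(xs))
    return statistics.mean(xs[g:len(xs) - g])

def winsorized_mean(data, proportion=0.1):
    """Mean after replacing the g smallest/largest values with their nearest kept neighbours."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    xs = sorted(data)
    n = len(xs)
    g = int(proportion * n)
    if g > 0:
        xs[:g] = [xs[g]] * g
        xs[n - g:] = [xs[n - g - 1]] * g
    return statistics.mean(xs)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical values, one outlier
print(statistics.mean(data), statistics.median(data),
      trimmed_mean(data), winsorized_mean(data))
```

The single outlier pulls the ordinary mean up to 14.5, while the median, the 10% trimmed mean and the 10% Winsorized mean all stay at 5.5.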
Joseph D Petruccelli
5/27/1998