 Density Histogram:
Histogram in which the area, rather than the height,
of a bar represents frequency.
 This allows proper representation of
histograms with unequal interval widths.
 For a density histogram the bar height is the
density of the bar: the relative frequency divided by the interval length.
 The Notion of Probability
 Bernoulli trial
 Probability of success: limit of relative
frequencies.
 Random phenomenon
 Trial, Event
 Probability of an event
 Example 1: Roll Them Bones
 Random phenomenon: Toss a pair of dice.
 Events: Lots of possibilities. Consider two:
 A={Roll a 7}
 E={Roll an even number}
 Probability of an event:
 , where N_{n}(A) is the number of times 7 comes up in n rolls.
 , where N_{n}(E) is the number of times an even number comes up in n rolls.
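As a sketch of the limiting-relative-frequency idea, the following Python simulation (the function name is mine) estimates P(A) and P(E) by the relative frequencies N_{n}(A)/n and N_{n}(E)/n for a large n:

```python
import random

random.seed(1)  # reproducible runs

def relative_frequencies(n_rolls):
    """Estimate P(A) = P(roll a 7) and P(E) = P(roll an even number)
    by the relative frequencies N_n(A)/n and N_n(E)/n."""
    count_7 = count_even = 0
    for _ in range(n_rolls):
        total = random.randint(1, 6) + random.randint(1, 6)
        count_7 += (total == 7)
        count_even += (total % 2 == 0)
    return count_7 / n_rolls, count_even / n_rolls

p7, p_even = relative_frequencies(100_000)
# The true values are 6/36 = 1/6 and 18/36 = 1/2; the estimates hover nearby.
```

As n grows, the estimates settle down near 1/6 and 1/2, which is exactly what the limit definition of probability describes.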
 Example 2: Hope the Plane Don't Crash
 Random phenomenon: A critical landing gear
component on a commercial airliner is inspected once per week
and replaced when it exhibits excessive wear.
 Events: Lots of possibilities. Consider two:
 A={Lasts less than three weeks}
 E={Lasts more than a year}
 Probability of an event:
 P(A) = lim_{n→∞} N_{n}(A)/n, where N_{n}(A) is the number of times the part was replaced in
less than three weeks in n replacements.
 P(E) = lim_{n→∞} N_{n}(E)/n, where N_{n}(E) is the number of times the part lasted more
than a year in n replacements.
 Some Set Theory
Events are sets of outcomes of a random phenomenon, so we
can perform the same operations on events that we perform on sets. Among these are
intersections and unions. If two events have
no outcomes in common, they are disjoint. Their
intersection is the null event.
 The Addition Rule of Probability
If A and B are disjoint events,
P(A∪B) = P(A) + P(B).
(Reasoning: in n trials of the random phenomenon,
N_{n}(A∪B) = N_{n}(A) + N_{n}(B);
divide by n and take limits.)
If each pair of A, B and C is disjoint,
similar reasoning leads to
P(A∪B∪C) = P(A) + P(B) + P(C),
and so on.
 The Equally Likely Outcomes Rule
If a random phenomenon has m outcomes, each with probability
1/m, and if E is any event consisting of k of those outcomes,
then P(E)=k/m.
This follows from the addition rule of probability.
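The rule can be checked by brute-force enumeration; a minimal Python sketch for the two-dice phenomenon (variable names are mine):

```python
from fractions import Fraction

# The 36 ordered outcomes (d1, d2) of two dice are equally likely,
# each with probability 1/36.
outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
m = len(outcomes)                                   # m = 36

# E = {total is 7}: count the k favorable outcomes; then P(E) = k/m.
k = sum(1 for d1, d2 in outcomes if d1 + d2 == 7)   # k = 6
p_seven = Fraction(k, m)                            # 6/36 = 1/6
```

Using `Fraction` keeps the arithmetic exact, so P(E) comes out as 1/6 rather than a rounded decimal.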
 Independence
Two events are independent if knowing whether one occurs
does not change the probability that the other occurs.
It can be shown that events A and B are independent if and only if
the multiplication rule holds:
P(A∩B) = P(A)P(B).
We can define the notion of mutual independence of more than
two events: see the book, p. 125.
Two trials of a random phenomenon are
independent if any event from the first trial is
independent of any event from the second trial.
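A small Python check of the multiplication rule, using two hypothetical events on independent dice (the event choices are mine, not from the text):

```python
from fractions import Fraction

# Equally likely outcomes for two independent dice.
outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
n = Fraction(len(outcomes))

p_A = sum(1 for d1, _ in outcomes if d1 % 2 == 0) / n              # A: first die even
p_B = sum(1 for _, d2 in outcomes if d2 > 4) / n                   # B: second die > 4
p_AB = sum(1 for d1, d2 in outcomes if d1 % 2 == 0 and d2 > 4) / n  # A and B

# Multiplication rule: P(A ∩ B) = P(A) P(B) holds exactly here,
# because the two dice are independent trials.
```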
 Discrete Random Variables
 Discrete random variable
 Discrete distribution model
 Probability mass function
 Expectation, variance and probability computations
Assume Y is a discrete random variable with probability mass
function p_{Y}(y).
 Mean exists if Σ_{y} |y| p_{Y}(y) < ∞.
In this case, μ_{Y} = E(Y) = Σ_{y} y p_{Y}(y).
 The variance exists if Σ_{y} y^{2} p_{Y}(y) < ∞.
In this case, σ_{Y}^{2} = Var(Y) = Σ_{y} (y-μ_{Y})^{2} p_{Y}(y).
The standard deviation is the square root of the variance.
 Probabilities For any set A of real
numbers, P(Y ∈ A) = Σ_{y∈A} p_{Y}(y).
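The sums defining the mean and variance translate directly into code; a minimal Python sketch with a hypothetical pmf (the numbers are made up for illustration):

```python
# A hypothetical probability mass function p_Y(y), stored as {value: probability}.
pmf = {2: 0.25, 3: 0.50, 4: 0.25}

mu = sum(y * p for y, p in pmf.items())               # mu_Y = sum of y * p_Y(y)
var = sum((y - mu) ** 2 * p for y, p in pmf.items())  # sigma^2_Y
sd = var ** 0.5                                       # standard deviation
```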
 Example
A floor is constructed of square tiles of sides 1, 2 and 4 inches. The
numbers of these tiles are in the ratios 24:2:1. You toss a pin on the
floor at random. If the tip of the pin lands in a square of side
length Y, you win $Y.
 Why is Y a discrete random variable?
 Find the distribution of Y.
 Find the expected value of Y.
 What is a fair admission price to charge someone to play
this game?
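One way to organize the computation in Python (a sketch, with variable names of my choosing; the key point, from the problem, is that a randomly tossed pin lands in a tile class with probability proportional to the total area that class covers):

```python
from fractions import Fraction

# Tiles of side 1, 2, 4 inches occur in the ratio 24:2:1.  A pin tossed
# at random lands in a tile class with probability proportional to the
# total area of that class: count * side^2.
counts = {1: 24, 2: 2, 4: 1}
areas = {side: c * side * side for side, c in counts.items()}   # 24, 8, 16
total = sum(areas.values())                                     # 48

pmf = {side: Fraction(a, total) for side, a in areas.items()}   # p_Y(y)
expected_win = sum(y * p for y, p in pmf.items())               # E(Y), in dollars
```

The expected winnings give a natural starting point for discussing a fair admission price.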
 Two discrete distribution models
 Bernoulli
 Description/use: Y is the
number of successes in one trial with probability p of
success.
 Probability mass function: p_{Y}(y) = p^{y}(1-p)^{1-y}, y = 0, 1.
 Mean, variance: μ_{Y} = p, σ_{Y}^{2} = p(1-p).
 Uniform
 Description/use: Y is equally likely to take each of the values 1, 2, ..., m.
 Probability mass function: p_{Y}(y) = 1/m, y = 1, 2, ..., m.
 Mean, variance: μ_{Y} = (m+1)/2, σ_{Y}^{2} = (m^{2}-1)/12.
 Randomness of a random variable
 The random variable Y refers to the act of taking the measurement. It is random.
 The observed value y refers to the value taken. It is not random.
 Example 1: Roll Them Bones
Suppose the random variable Y is the total on the two dice. Then Y
can take values 2, 3, 4, ... 12. Once the dice are rolled the observed
value is y.
 Example 2: Hope the Plane Don't Crash
Suppose the random variable Y is the number of weeks until the
landing gear component is replaced. Then Y
can take values 1, 2, 3, 4, ... Once the component is replaced, the
number of weeks it lasted is y.
 Displaying and summarizing discrete
distribution models
 Probability histograms.
 Relation to density histograms.
 Mean, variance and standard deviation.
 Some Rules for Means, Variances and Standard Deviations
 If X = aY + b, then μ_{X} = aμ_{Y} + b and σ_{X} = |a|σ_{Y}.
 If Z = Y_{1} + Y_{2}, then μ_{Z} = μ_{Y_{1}} + μ_{Y_{2}}.
 If Y_{1} and Y_{2} are independent random variables for
which variances are defined, then:
 If Z = Y_{1} + Y_{2}, then
σ_{Z}^{2} = σ_{Y_{1}}^{2} + σ_{Y_{2}}^{2}.
 If W = Y_{1} - Y_{2}, then
σ_{W}^{2} = σ_{Y_{1}}^{2} + σ_{Y_{2}}^{2}.
 If Y_{1}, Y_{2}, ..., Y_{n} are independent random
variables for which variances are defined, and if
Z = Y_{1} + Y_{2} + ... + Y_{n}, then
σ_{Z}^{2} = σ_{Y_{1}}^{2} + σ_{Y_{2}}^{2} + ... + σ_{Y_{n}}^{2}.
 If Ȳ is the sample mean, defined by
Ȳ = (Y_{1} + Y_{2} + ... + Y_{n})/n,
where Y_{1}, Y_{2}, ..., Y_{n} are independent random variables having the
same distribution with mean μ and variance σ^{2}, then
μ_{Ȳ} = μ and σ_{Ȳ}^{2} = σ^{2}/n.
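The sample-mean rule can be checked by simulation; a Python sketch with hypothetical parameters μ = 10, σ = 3, n = 25 (all names and numbers are mine):

```python
import random

random.seed(2)  # reproducible

def sample_mean_stats(n, reps, mu=10.0, sigma=3.0):
    """Simulate `reps` sample means of n iid N(mu, sigma^2) draws and
    return the empirical mean and variance of those sample means."""
    means = []
    for _ in range(reps):
        ys = [random.gauss(mu, sigma) for _ in range(n)]
        means.append(sum(ys) / n)
    m = sum(means) / reps
    v = sum((x - m) ** 2 for x in means) / (reps - 1)
    return m, v

m, v = sample_mean_stats(n=25, reps=20_000)
# Theory: the sample mean has mean mu = 10 and variance sigma^2/n = 9/25 = 0.36.
```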
 The Binomial distribution model
 Binomial trial.
 Binomial random variable.
 Mean, variance and standard deviation.
 Decision-making using the binomial distribution
model
One stage of a manufacturing process involves a manually-controlled
grinding operation. Management suspects that the grinding machine
operators tend to grind parts slightly larger rather than
slightly smaller than the target diameter, while still staying within
specification limits.
To verify their suspicions, they sample 150 within-spec parts and find
that 93 have diameters above the target diameter. Is this strong
evidence in support of their suspicions?
 SOLUTION: Suppose that there is no tendency to
grind to larger or smaller diameters than the target diameter. Then
the number of the 150 parts, Y, having diameters larger than the
target diameter will have a b(150,0.5) distribution. In this case,
the probability of finding 93 or more parts with diameters larger than
the target diameter is
P(Y ≥ 93) = Σ_{k=93}^{150} (150 choose k)(0.5)^{150} ≈ 0.0021.
Thus, if there is no tendency
to grind to larger or smaller diameters, they would observe as many as
93 of 150 sampled parts having diameters greater than the target in
only 21 of 10000 samples.
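The tail probability in this solution can be computed exactly in Python (the function name is mine):

```python
from math import comb

def binom_tail(n, p, k):
    """Exact upper-tail probability P(Y >= k) for Y ~ b(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# P(Y >= 93) for Y ~ b(150, 0.5): roughly 0.0021, i.e. about 21 in 10,000.
p_value = binom_tail(150, 0.5, 93)
```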
 The Poisson distribution model
 EXAMPLE On May 1 of this year, an electric
utility put 250 new transformers in service. If the probability a
transformer fails within one year is 0.008, approximate the
probability that fewer than three of the transformers fail within one year.
SOLUTION:
The number of transformers that fail within one year is Y ~ b(250, 0.008).
Since 250 > 100, 0.008 < 0.01, and np = (250)(0.008) = 2 is small, we
approximate the probability using the Poisson distribution with
λ = np = (250)(0.008) = 2. From Table A.2, p. 346, we have
P(Y < 3) = P(Y ≤ 2) ≈ 0.677.
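A Python sketch comparing the exact binomial answer with the Poisson approximation (variable names are mine):

```python
from math import comb, exp, factorial

n, p = 250, 0.008
lam = n * p                 # Poisson parameter lambda = np = 2

# Exact binomial probability: P(Y < 3) = P(Y=0) + P(Y=1) + P(Y=2).
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(3))

# Poisson approximation: P(Y = k) ~= e^{-lam} lam^k / k!.
approx = sum(exp(-lam) * lam**k / factorial(k) for k in range(3))
# approx = 5 e^{-2} ~= 0.677, in agreement with the tabled value.
```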
 The Power of Models
 Quantifiers of data
 Extend range of conclusions
 Continuous Random Variables
 Continuous random variable
 Continuous distribution model
 Probability density function
 Test your understanding
A distribution model for Y, the continuous random variable
describing the lifetime, in hours, of an electronic component,
has a density function p_{Y}(y). Since the lifetimes are counted only
for ``burned-in'' components (those that have lasted at least 24
hours), p_{Y}(y) = 0 for y < 24. It is known that the
probability that the component lasts approximately y_{1} hours, relative to the
probability that it lasts approximately y_{2} hours, equals (y_{2}/y_{1})^{2}.
 Find p_{Y}(y).
 What proportion of all components last between 36 and 48
hours?
 What is the expected value of Y?
 Three continuous distribution models
 Uniform
 Description/use: Y is the value selected ``at random'' from
the interval (a,b).
 Probability density function:
p_{Y}(y) = 1/(b-a), if a < y < b,
and p_{Y}(y) = 0, otherwise.
 Mean, variance: μ_{Y} = (a+b)/2, σ_{Y}^{2} = (b-a)^{2}/12.
 Normal
 Description/use: Y is described by the ``bell curve''.
 Probability density function:
p_{Y}(y) = (1/(σ√(2π))) e^{-(y-μ)^{2}/(2σ^{2})}, -∞ < y < ∞.
 Mean, variance: μ_{Y} = μ, σ_{Y}^{2} = σ^{2}.
 Weibull
 Expectation, variance and probability computations
Assume Y is a continuous random variable with density p_{Y}(y).
 Mean exists if ∫ |y| p_{Y}(y) dy < ∞.
In this case, μ_{Y} = E(Y) = ∫ y p_{Y}(y) dy.
 Variance and standard deviation
The variance exists if ∫ y^{2} p_{Y}(y) dy < ∞.
In this case, σ_{Y}^{2} = Var(Y) = ∫ (y-μ_{Y})^{2} p_{Y}(y) dy,
and the standard deviation is the square root of the variance.
 Probabilities For any set A of real
numbers, P(Y ∈ A) = ∫_{A} p_{Y}(y) dy.
 Computing Normal Probabilities
All probabilities from any normal distribution can be reduced to
probabilities from a standard normal (i.e., N(0,1))
distribution. Specifically, if Y ~ N(μ, σ^{2}), then
Z = (Y - μ)/σ ~ N(0,1).
So, P(Y ≤ y) = P(Z ≤ (y - μ)/σ) = Φ((y - μ)/σ).
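A minimal Python sketch of this reduction, using the identity Φ(z) = (1 + erf(z/√2))/2 from the standard library's error function (the function name and the N(100, 15^2) example are mine):

```python
from math import erf, sqrt

def normal_cdf(y, mu=0.0, sigma=1.0):
    """P(Y <= y) for Y ~ N(mu, sigma^2), computed by standardizing:
    P(Y <= y) = Phi((y - mu)/sigma), with Phi via the error function."""
    z = (y - mu) / sigma
    return 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical example: Y ~ N(100, 15^2); P(Y <= 115) = Phi(1) ~= 0.8413.
p = normal_cdf(115, mu=100, sigma=15)
```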
 The Central Limit Theorem
The Central Limit Theorem (CLT) is the most important theorem in
statistics. In words, it says:
As long as the population standard deviation is finite, the
distribution of the mean (or sum) of independently chosen data
from that population gets closer and closer to
a normal distribution as the sample size increases.
 Mathematical Statement of the Central Limit
Theorem
Suppose that Y_{1}, Y_{2}, ... are independent random variables
having a distribution with mean μ and variance
σ^{2}. Let
Ȳ_{n} = (Y_{1} + Y_{2} + ... + Y_{n})/n
be the mean of the first n random variables. Let Z_{n} be the
standardized mean:
Z_{n} = (Ȳ_{n} - μ)/(σ/√n).
(Recall that Ȳ_{n}
has mean μ and variance σ^{2}/n.)
Then, for every z,
P(Z_{n} ≤ z) → Φ(z) as n → ∞.
That is, as n gets larger, the distribution
of Z_{n} gets closer and closer to a N(0,1).
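A simulation sketch of the theorem in Python: standardized means of die rolls (a decidedly non-normal distribution) behave like N(0,1) draws for moderate n. The parameter choices are mine:

```python
import random
from math import sqrt

random.seed(3)  # reproducible

# Y_i are iid rolls of a fair die (far from normal): mu = 3.5, sigma^2 = 35/12.
mu, sigma = 3.5, sqrt(35 / 12)
n, reps = 50, 20_000

at_or_below_zero = 0
for _ in range(reps):
    ybar = sum(random.randint(1, 6) for _ in range(n)) / n   # sample mean
    z = (ybar - mu) / (sigma / sqrt(n))                      # standardized mean Z_n
    at_or_below_zero += (z <= 0)

frac = at_or_below_zero / reps
# If Z_n is approximately N(0,1), about half of the simulated Z_n
# fall at or below 0, matching Phi(0) = 0.5.
```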
 The Normal Approximation to the Binomial
Distribution
First note that by multiplying both numerator and denominator by n,
we can write
(Ȳ - p)/√(p(1-p)/n) = (Y - np)/√(np(1-p)).
Next, note that if Y ~ b(n,p), we can write Y = Y_{1} + Y_{2} + ... + Y_{n},
where Y_{1}, Y_{2}, ..., Y_{n}
are independent Bernoulli(p) random variables.
Since the mean and standard deviation of the Y_{i} are p and
√(p(1-p)), respectively, if n is large enough, the CLT says that
(Y - np)/√(np(1-p))
has approximately a N(0,1) distribution.
 A Better Normal Approximation to the Binomial Distribution
The continuity correction can make the CLT approximation to
the binomial more accurate. The continuity correction consists of adding
or subtracting 0.5 at the endpoints of the interval:
P(a ≤ Y ≤ b) ≈ Φ((b + 0.5 - np)/σ) - Φ((a - 0.5 - np)/σ),
where σ = √(np(1-p)).
 EXAMPLE:
Recall the following problem:
One stage of a manufacturing process involves a manually-controlled
grinding operation. Management suspects that the grinding machine
operators tend to grind parts slightly larger rather than
slightly smaller than the target diameter, while still staying within
specification limits.
To verify their suspicions, they sample 150 within-spec parts and find
that 93 have diameters above the target diameter. Is this strong
evidence in support of their suspicions?
And its solution:
 SOLUTION: Suppose that there is no
tendency to grind to larger or smaller diameters than the
target diameter. Then the number of the 150 parts, Y, having
diameters larger than the target diameter will have a
b(150,0.5) distribution. In this case, the probability of
finding 93 or more parts with diameters larger than the target
diameter is
P(Y ≥ 93) = Σ_{k=93}^{150} (150 choose k)(0.5)^{150} ≈ 0.0021.
Thus, if there is no tendency
to grind to larger or smaller diameters, they would observe as many as
93 of 150 sampled parts having diameters greater than the target in
only 21 of 10000 samples.
 We will use the CLT with the continuity correction to
approximate P(Y ≥ 93). By assumption, p = 0.5, so
P(Y ≥ 93) ≈ 1 - Φ((92.5 - 150(0.5))/√(150(0.5)(0.5))) = 1 - Φ(2.86) ≈ 0.0021,
which equals the exact value to four decimal places.
Note: if we don't use the continuity correction, the CLT approximation
gives an approximate probability of about 0.0016, not nearly as close.
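A Python sketch comparing the exact binomial tail with the corrected and uncorrected CLT approximations (names are mine):

```python
from math import comb, erf, sqrt

n, p = 150, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))   # 75 and sqrt(37.5)

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Exact binomial tail P(Y >= 93) for Y ~ b(150, 0.5).
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(93, n + 1))

with_cc = 1 - phi((93 - 0.5 - mu) / sigma)   # continuity-corrected CLT
without_cc = 1 - phi((93 - mu) / sigma)      # plain CLT
# The corrected value tracks the exact tail much more closely.
```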
 Assessing Normality
A quick and simple check: the 68-95-99.7 rule.
 Identifying Common Distributions
A QQ plot is a plot to decide if it is reasonable to assume a set of
data are drawn from a known distribution model (called a candidate
distribution model). Suppose Y is a random variable from the
candidate distribution model. Then we construct a QQ plot as follows:
 1.
 Order the observations y_{(1)} ≤ y_{(2)} ≤ ... ≤ y_{(n)}.
 2.
 For each observation compute a
quantile rank with respect to the candidate distribution
model. For the k^{th} smallest observation, y_{(k)}, the quantile
rank is the value q_{(k)} satisfying P(Y ≤ q_{(k)}) = (k - 0.5)/n.
 3.
 Plot the pairs (q_{(k)}, y_{(k)}).
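The three steps can be sketched in Python for a standard-normal candidate model (the bisection inverse-CDF and the (k - 0.5)/n quantile-rank convention are my choices; the book's convention may differ):

```python
from math import erf, sqrt

def normal_quantile(q):
    """Invert the standard normal CDF by bisection (no external libraries)."""
    def cdf(z):
        return 0.5 * (1 + erf(z / sqrt(2)))
    lo, hi = -10.0, 10.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def qq_pairs(data):
    """Pairs (q_(k), y_(k)) for a N(0,1) candidate model, using the
    quantile rank (k - 0.5)/n (one common convention)."""
    ys = sorted(data)                        # step 1: order the observations
    n = len(ys)
    return [(normal_quantile((k - 0.5) / n), ys[k - 1])  # steps 2 and 3
            for k in range(1, n + 1)]

pairs = qq_pairs([0.3, -1.2, 0.8, -0.1, 1.5])
# If the data really came from N(0,1), the plotted points lie near y = x.
```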
 Transformations to Normality