• Density Histogram:

Histogram in which area, rather than height of bar, represents frequency.

• This allows proper representation of histograms with unequal interval widths.
• For a density histogram the height of each bar is its density: the bar's relative frequency divided by the interval width, so that bar area equals relative frequency.
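A minimal sketch of a density histogram in Python (numpy and matplotlib assumed; the data and bin edges are made up for illustration):

```python
# Density histogram sketch: with density=True, bar height is
# relative frequency / bin width, so bar AREA represents frequency.
import numpy as np
import matplotlib.pyplot as plt

data = np.random.default_rng(0).exponential(scale=10, size=500)
edges = [0, 5, 10, 20, 40, 120]   # unequal interval widths

plt.hist(data, bins=edges, density=True, edgecolor="black")
plt.xlabel("value")
plt.ylabel("density")
plt.show()
```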
• The Notion of Probability
• Bernoulli trial
• Probability of success: limit of relative frequencies.
• Random phenomenon
• Trial, Event
• Probability of an event
• Example 1: Roll Them Bones
• Random phenomenon: Toss a pair of dice.
• Events: Lots of possibilities. Consider two:
• A={Roll a 7}
• E={Roll an even number}
• Probability of an event:
• P(A) = lim_{n→∞} N_n(A)/n, where N_n(A) is the number of times 7 comes up in n rolls.
• P(E) = lim_{n→∞} N_n(E)/n, where N_n(E) is the number of times an even number comes up in n rolls.
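A quick simulation sketch of these limiting relative frequencies (Python with numpy assumed; variable names are ours):

```python
# Simulate n tosses of a pair of dice and track relative frequencies
# N_n(A)/n and N_n(E)/n; they settle near 6/36 and 18/36.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
totals = rng.integers(1, 7, size=n) + rng.integers(1, 7, size=n)

print("N_n(A)/n:", np.mean(totals == 7))        # P(A) = 6/36 ≈ 0.167
print("N_n(E)/n:", np.mean(totals % 2 == 0))    # P(E) = 18/36 = 0.5
```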
• Example 2: Hope the Plane Don't Crash
• Random phenomenon:  A critical landing gear component on a commercial airliner is inspected once per week and replaced when it exhibits excessive wear.
• Events: Lots of possibilities. Consider two:
• A={Lasts less than three weeks}
• E={Lasts more than a year}
• Probability of an event:
• P(A) = lim_{n→∞} N_n(A)/n, where N_n(A) is the number of times the part was replaced in less than three weeks in n replacements.
• P(E) = lim_{n→∞} N_n(E)/n, where N_n(E) is the number of times the part lasted more than a year in n replacements.
• Some Set Theory

Events are sets of outcomes of a random phenomenon, so the operations we apply to sets, such as intersection and union, apply to events as well. If two events have no outcomes in common, they are disjoint: their intersection is the null event.

• The Addition Rule of Probability If A and B are disjoint events,

P(A ∪ B) = P(A) + P(B).

(Reasoning: in n trials of the random phenomenon, N_n(A ∪ B) = N_n(A) + N_n(B);

divide by n and take limits.) If each pair of the events A, B and C is disjoint, similar reasoning leads to

P(A ∪ B ∪ C) = P(A) + P(B) + P(C),

and so on.
• The Equally Likely Outcomes Rule If a random phenomenon has m outcomes, each with probability 1/m, and if E is any event consisting of k of those outcomes, then P(E)=k/m.

This follows from the addition rule of probability: E is the union of k disjoint single-outcome events, each with probability 1/m, so P(E) = k/m.

• Independence

Two events are independent if knowing whether one occurs does not change the probability that the other occurs.

It can be shown that events A and B are independent if and only if the multiplication rule holds:

P(A ∩ B) = P(A)P(B).
We can define the notion of mutual independence of more than two events: see the book, p. 125.

Two trials of a random phenomenon are independent if any event from the first trial is independent of any event from the second trial.
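A simulation sketch of the multiplication rule, using two dice with A = {first die even} and B = {second die even} (our choice of events, not from the notes):

```python
# Check P(A and B) ≈ P(A) * P(B) by simulation for two independent dice.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
d1 = rng.integers(1, 7, size=n)   # first die
d2 = rng.integers(1, 7, size=n)   # second die

A = d1 % 2 == 0
B = d2 % 2 == 0
print("P(A and B) ≈", np.mean(A & B))            # ≈ 0.25
print("P(A)P(B)   ≈", np.mean(A) * np.mean(B))   # ≈ 0.25
```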

• Discrete Random Variables
• Discrete random variable
• Discrete distribution model
• Probability mass function
• Expectation, variance and probability computations

Assume Y is a discrete random variable with probability mass function p_Y(y).

• Mean μ_Y exists if

Σ_y |y| p_Y(y) < ∞.

In this case,

μ_Y = Σ_y y p_Y(y).

• The variance, σ_Y², exists if

Σ_y y² p_Y(y) < ∞.

In this case,

σ_Y² = Σ_y (y − μ_Y)² p_Y(y).

The standard deviation σ_Y is the square root of the variance.

• Probabilities For any set A of real numbers,

P(Y ∈ A) = Σ_{y∈A} p_Y(y).
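A minimal sketch of these computations in Python, using a fair-die pmf as an arbitrary illustration:

```python
# Mean, variance, standard deviation and P(Y in A) straight from a pmf.
values = [1, 2, 3, 4, 5, 6]     # fair die: p_Y(y) = 1/6
pmf = [1 / 6] * 6

mu = sum(y * p for y, p in zip(values, pmf))
var = sum((y - mu) ** 2 * p for y, p in zip(values, pmf))
sd = var ** 0.5

A = {2, 4, 6}
prob_A = sum(p for y, p in zip(values, pmf) if y in A)
print(mu, var, sd, prob_A)      # 3.5, 2.9167, 1.7078, 0.5
```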

• Example

A floor is constructed of square tiles of sides 1, 2 and 4 inches. The numbers of these tiles are in the ratios 24:2:1. You toss a pin on the floor at random. If the tip of the pin lands in a square of side length Y, you win $Y.

• Why is Y a discrete random variable?
• Find the distribution of Y.
• Find the expected value of Y.
• What is a fair admission price to charge someone to play this game?
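One hedged way to set up this example in code, assuming the pin tip is equally likely to land anywhere, so that P(Y = y) is proportional to the total area covered by tiles of side y:

```python
# If the pin tip lands uniformly over the floor, P(Y = y) is proportional
# to the total area of tiles with side y (an assumption of this sketch).
counts = {1: 24, 2: 2, 4: 1}                        # tile counts, ratio 24:2:1
areas = {y: c * y ** 2 for y, c in counts.items()}  # total area per side length
total = sum(areas.values())

pmf = {y: a / total for y, a in areas.items()}
expected = sum(y * p for y, p in pmf.items())
print(pmf)        # {1: 0.5, 2: 0.1667, 4: 0.3333}
print(expected)   # 13/6 ≈ 2.17: a fair admission price is about $2.17
```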
• Two discrete distribution models
• Bernoulli
• Description/use: Y is the number of successes in one trial with probability p of success.
• Probability mass function:

p_Y(y) = p^y (1 − p)^(1−y), y = 0, 1.

• Mean, variance:

μ_Y = p, σ_Y² = p(1 − p).
• Uniform
• Description/use: Y is the value selected "at random" from the integers 1, 2, …, m.
• Probability mass function:

p_Y(y) = 1/m, y = 1, 2, …, m.

• Mean, variance:

μ_Y = (m + 1)/2, σ_Y² = (m² − 1)/12.
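A quick check of these formulas (assuming scipy; the values p = 0.3 and m = 6 are arbitrary):

```python
# Frozen scipy distributions agree with the formulas above.
from scipy import stats

bern = stats.bernoulli(0.3)
print(bern.mean(), bern.var())    # p = 0.3, p(1 - p) = 0.21

m = 6
unif = stats.randint(1, m + 1)    # uniform on the integers 1, ..., m
print(unif.mean(), unif.var())    # (m + 1)/2 = 3.5, (m^2 - 1)/12 ≈ 2.917
```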

• Randomness of a random variable
• The random variable Y refers to the act of taking the measurement. It is random.
• The observed value y refers to the value taken. It is not random.

• Example 1: Roll Them Bones Suppose the random variable Y is the total on the two dice. Then Y can take values 2, 3, 4, ... 12. Once the dice are rolled the observed value is y.
• Example 2: Hope the Plane Don't Crash Suppose the random variable Y is the number of weeks until the landing gear component is replaced. Then Y can take values 1, 2, 3, 4, ... Once the component is replaced, the number of weeks it lasted is y.
• Displaying and summarizing discrete distribution models
• Probability histograms.
• Relation to density histograms.
• Mean, variance and standard deviation.
• Some Rules for Means, Variances and Standard Deviations
• If X = aY + b, then μ_X = aμ_Y + b, σ_X² = a²σ_Y², and σ_X = |a|σ_Y.
• If W = a1Y1 + a2Y2 + ⋯ + anYn, then

μ_W = a1μ_Y1 + a2μ_Y2 + ⋯ + anμ_Yn.

• If Y1 and Y2 are independent random variables for which variances are defined,
• If Z = Y1 + Y2, then σ_Z² = σ_Y1² + σ_Y2².
• If W = Y1 − Y2, then σ_W² = σ_Y1² + σ_Y2².
• If Y1, Y2, …, Yn are independent random variables for which variances are defined, and if W = a1Y1 + a2Y2 + ⋯ + anYn, then

σ_W² = a1²σ_Y1² + a2²σ_Y2² + ⋯ + an²σ_Yn².

• If Ȳ is the sample mean, defined by

Ȳ = (1/n)(Y1 + Y2 + ⋯ + Yn),

where Y1, Y2, …, Yn are independent random variables having the same distribution with mean μ and variance σ², then

μ_Ȳ = μ and σ_Ȳ² = σ²/n.
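A simulation sketch of the sample-mean rule (Python with numpy assumed; the exponential population with σ² = 4 is an arbitrary choice):

```python
# Draw many samples of size n and check that the sample mean has
# variance close to sigma^2 / n.
import numpy as np

rng = np.random.default_rng(3)
n, reps = 25, 20_000
samples = rng.exponential(scale=2, size=(reps, n))  # mu = 2, sigma^2 = 4

ybar = samples.mean(axis=1)
print(ybar.mean())   # ≈ mu = 2
print(ybar.var())    # ≈ sigma^2 / n = 4 / 25 = 0.16
```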

• The Binomial distribution model
• Binomial trial.
• Binomial random variable.
• Mean, variance and standard deviation.
• Decision-making using the binomial distribution model

One stage of a manufacturing process involves a manually-controlled grinding operation. Management suspects that the grinding machine operators tend to grind parts slightly larger rather than slightly smaller than the target diameter, while still staying within specification limits. To verify their suspicions, they sample 150 within-spec parts and find that 93 have diameters above the target diameter. Is this strong evidence in support of their suspicions?

• SOLUTION: Suppose that there is no tendency to grind to larger or smaller diameters than the target diameter. Then the number of the 150 parts, Y, having diameters larger than the target diameter will have a b(150,0.5) distribution. In this case, the probability of finding 93 or more parts with diameters larger than the target diameter is

P(Y ≥ 93) = Σ_{y=93}^{150} C(150, y) (0.5)^150 ≈ 0.0021.
Thus, if there is no tendency to grind to larger or smaller diameters, they would observe as many as 93 of 150 sampled parts having diameters greater than the target in only 21 of 10000 samples.
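A sketch of the exact computation (assuming scipy):

```python
# P(Y >= 93) for Y ~ b(150, 0.5): sf(92) = P(Y > 92) = P(Y >= 93).
from scipy import stats

print(stats.binom.sf(92, 150, 0.5))   # ≈ 0.0021
```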

• The Poisson distribution model
• Description/use: Y represents the number of occurrences of some phenomenon in a fixed period of time, area or space. The parameter λ is the mean number of occurrences in that period of time, area or space. It is used in two settings:
• As an approximation to the b(n,p) model when n is large and p is small. Take λ = np.
• In its own right following criteria P.1-P.3 on p. 147.
• Probability mass function:

p_Y(y) = e^(−λ) λ^y / y!, y = 0, 1, 2, ….

• Mean, variance:

μ_Y = λ, σ_Y² = λ.
• EXAMPLE On May 1 of this year, an electric utility put 250 new transformers in service. If the probability a transformer fails within one year is 0.008, approximate the probability fewer than three of the transformers fail within one year.

SOLUTION: The number of transformers that fail within one year is Y ~ b(250, 0.008). Since 250 > 100 and 0.008 < 0.01, we approximate the probability using the Poisson distribution with λ = np = 250(0.008) = 2. From Table A.2, p. 346, we have P(Y < 3) = P(Y ≤ 2) ≈ 0.677.
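A sketch comparing the approximation with the exact binomial value (assuming scipy):

```python
# Poisson(lambda = np) approximation vs the exact binomial cdf.
from scipy import stats

n, p = 250, 0.008
lam = n * p                          # lambda = 2

print(stats.poisson.cdf(2, lam))     # ≈ 0.677 (the Table A.2 value)
print(stats.binom.cdf(2, n, p))      # exact, also ≈ 0.677
```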

• The Power of Models
• Quantifiers of data
• Extend range of conclusions
• Continuous Random Variables
• Continuous random variable
• Continuous distribution model
• Probability density function

A distribution model for Y, the continuous random variable describing the lifetime, in hours, of an electronic component, has a density function p_Y(y). Since the lifetimes are counted only for "burned-in" components (those that have lasted at least 24 hours), p_Y(y) = 0 for y < 24. It is known that the probability that the component lasts approximately y1 hours, relative to the probability that it lasts approximately y2 hours, equals (y2/y1)².

• Find pY(y).
• What proportion of all components last between 36 and 48 hours?
• What is the expected value of Y?
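A hedged numerical check, assuming the density implied by the stated ratio: p_Y(y) is proportional to 1/y² on [24, ∞), which normalizes to p_Y(y) = 24/y²:

```python
# p_Y(y) = 24 / y^2 for y >= 24 integrates to 1, so it is a valid density.
from scipy import integrate

p = lambda y: 24 / y ** 2

total, _ = integrate.quad(p, 24, float("inf"))
prob, _ = integrate.quad(p, 36, 48)
print(total)   # 1.0
print(prob)    # 1/6 ≈ 0.1667: proportion lasting between 36 and 48 hours
# For the mean, the integrand y * p(y) = 24 / y has a divergent integral,
# so the expected value of Y does not exist (is infinite).
```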
• Three continuous distribution models
• Uniform
• Description/use: Y is the value selected "at random" from the interval (a,b).
• Probability density function:

p_Y(y) = 1/(b − a), if a < y < b; p_Y(y) = 0, otherwise.

• Mean, variance:

μ_Y = (a + b)/2, σ_Y² = (b − a)²/12.
• Normal
• Description/use: Y is described by the "bell curve".
• Probability density function:

p_Y(y) = (1/(σ√(2π))) e^(−(y − μ)²/(2σ²)), −∞ < y < ∞.

• Mean, variance:

μ_Y = μ, σ_Y² = σ².
• Weibull
• Description/use: Y is used as a model of the time to failure of the first of a number of components.
• Probability density function:

p_Y(y) = (β/α)(y/α)^(β−1) e^(−(y/α)^β), y > 0; p_Y(y) = 0, otherwise.

• Mean, variance (when β = 1):

μ_Y = α, σ_Y² = α².
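A quick check of these three models (assuming scipy; parameter values are arbitrary, and scipy's weibull_min shape c plays the role of β with scale α):

```python
# Frozen scipy distributions agree with the formulas above.
from scipy import stats

u = stats.uniform(loc=2, scale=3)    # uniform on (a, b) = (2, 5)
print(u.mean(), u.var())             # (a + b)/2 = 3.5, (b - a)^2/12 = 0.75

z = stats.norm(loc=1, scale=2)       # N(mu = 1, sigma^2 = 4)
print(z.mean(), z.var())             # 1, 4

w = stats.weibull_min(c=1, scale=5)  # beta = 1 reduces to an exponential
print(w.mean(), w.var())             # alpha = 5, alpha^2 = 25
```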

• Expectation, variance and probability computations

Assume Y is a continuous random variable with density p_Y(y).

• Mean μ_Y exists if

∫_{−∞}^{∞} |y| p_Y(y) dy < ∞.

In this case,

μ_Y = ∫_{−∞}^{∞} y p_Y(y) dy.

• Variance and standard deviation These exist if

∫_{−∞}^{∞} y² p_Y(y) dy < ∞.

In this case,

σ_Y² = ∫_{−∞}^{∞} (y − μ_Y)² p_Y(y) dy and σ_Y = √(σ_Y²).

• Probabilities For any set A of real numbers,

P(Y ∈ A) = ∫_A p_Y(y) dy.
• Computing Normal Probabilities

All probabilities from any normal distribution can be reduced to probabilities from a standard normal (i.e. N(0,1)) distribution. Specifically, if Y ~ N(μ, σ²), then

Z = (Y − μ)/σ ~ N(0,1).

So,

P(a < Y < b) = P((a − μ)/σ < Z < (b − μ)/σ) = Φ((b − μ)/σ) − Φ((a − μ)/σ),

where Φ denotes the N(0,1) cumulative distribution function.
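A sketch of the standardization identity (assuming scipy; μ, σ, a, b are arbitrary illustration values):

```python
# P(a < Y < b) for Y ~ N(mu, sigma^2) two ways: by standardizing, and
# directly; the two answers agree.
from scipy import stats

mu, sigma, a, b = 10, 2, 9, 13
z_lo, z_hi = (a - mu) / sigma, (b - mu) / sigma

print(stats.norm.cdf(z_hi) - stats.norm.cdf(z_lo))                  # via Z
print(stats.norm.cdf(b, mu, sigma) - stats.norm.cdf(a, mu, sigma))  # direct
```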

• The Central Limit Theorem

The Central Limit Theorem (CLT) is the most important theorem in statistics. In words, it says:

As long as the population standard deviation is finite, the distribution of the mean (or sum) of independently chosen data from that population gets closer and closer to a normal distribution as the sample size increases.

• Mathematical Statement of the Central Limit Theorem

Suppose that Y1, Y2, Y3, … are independent random variables having a distribution with mean μ and variance σ². Let

Ȳ_n = (1/n)(Y1 + Y2 + ⋯ + Yn)

be the mean of the first n random variables. Let Zn be the standardized mean: Zn = (Ȳ_n − μ)/(σ/√n). (Recall that Ȳ_n has mean μ and variance σ²/n.) Then

P(Zn ≤ z) → Φ(z) as n → ∞, for every z.

That is, as n gets larger, the distribution of Zn gets closer and closer to a N(0,1).
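A simulation sketch of the theorem (Python with numpy assumed), using a skewed exponential population so the convergence is visible:

```python
# Standardized means of an exponential(1) population (mu = sigma = 1):
# P(Z_n <= 1) approaches Phi(1) = 0.8413 as n grows.
import numpy as np

rng = np.random.default_rng(4)
mu = sigma = 1.0

for n in (2, 10, 100):
    ybar = rng.exponential(1.0, size=(50_000, n)).mean(axis=1)
    z = (ybar - mu) / (sigma / np.sqrt(n))
    print(n, np.mean(z <= 1.0))   # approaches 0.8413
```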
• The Normal Approximation to the Binomial Distribution

First note that, writing p̂ = Y/n for the sample proportion, multiplying both numerator and denominator by n gives

(p̂ − p)/√(p(1 − p)/n) = (Y − np)/√(np(1 − p)).

Next, note that if Y ~ b(n,p), we can write Y = Y1 + Y2 + ⋯ + Yn, where Y1, Y2, …, Yn are independent Bernoulli(p) random variables. Since the mean and standard deviation of the Yi are p and √(p(1 − p)), respectively, if n is large enough, the CLT says that

(Y − np)/√(np(1 − p))

has approximately a N(0,1) distribution.
• A Better Normal Approximation to the Binomial Distribution

The continuity correction can make the CLT approximation to the binomial more accurate. The continuity correction consists of adding or subtracting 0.5 from the endpoints of the interval:

P(a ≤ Y ≤ b) ≈ Φ((b + 0.5 − np)/√(np(1 − p))) − Φ((a − 0.5 − np)/√(np(1 − p))),

where Y ~ b(n,p).
• EXAMPLE: Recall the following problem:

One stage of a manufacturing process involves a manually-controlled grinding operation. Management suspects that the grinding machine operators tend to grind parts slightly larger rather than slightly smaller than the target diameter, while still staying within specification limits. To verify their suspicions, they sample 150 within-spec parts and find that 93 have diameters above the target diameter. Is this strong evidence in support of their suspicions?

And its solution:

• SOLUTION: Suppose that there is no tendency to grind to larger or smaller diameters than the target diameter. Then the number of the 150 parts, Y, having diameters larger than the target diameter will have a b(150,0.5) distribution. In this case, the probability of finding 93 or more parts with diameters larger than the target diameter is

P(Y ≥ 93) = Σ_{y=93}^{150} C(150, y) (0.5)^150 ≈ 0.0021.
Thus, if there is no tendency to grind to larger or smaller diameters, they would observe as many as 93 of 150 sampled parts having diameters greater than the target in only 21 of 10000 samples.

• We will use the CLT with the continuity correction to approximate P(Y ≥ 93). By assumption, p = 0.5, so

P(Y ≥ 93) ≈ P(Z ≥ (92.5 − 75)/√(150(0.5)(0.5))) = P(Z ≥ 2.86) = 0.0021,

which equals the exact value to four decimal places. Note: if we don't use the continuity correction, the CLT approximation gives an approximate probability of 0.0016, not nearly as close.
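A sketch reproducing these numbers (assuming scipy):

```python
# Exact tail, CLT with continuity correction, CLT without it.
from math import sqrt
from scipy import stats

n, p, k = 150, 0.5, 93
mu, sigma = n * p, sqrt(n * p * (1 - p))

print(stats.binom.sf(k - 1, n, p))            # exact:      ≈ 0.0021
print(stats.norm.sf((k - 0.5 - mu) / sigma))  # with cc:    ≈ 0.0021
print(stats.norm.sf((k - mu) / sigma))        # without cc: ≈ 0.0016
```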
• Assessing Normality

A quick and simple check: the 68-95-99.7 rule. Compare the proportions of the data falling within 1, 2 and 3 standard deviations of the mean to 68%, 95% and 99.7%; rough agreement supports normality.

• Identifying Common Distributions

A Q-Q plot is a plot to decide if it is reasonable to assume a set of data are drawn from a known distribution model (called a candidate distribution model). Suppose Y is a random variable from the candidate distribution model. Then we construct a Q-Q plot as follows:

1.
Order the observations: y_(1) ≤ y_(2) ≤ ⋯ ≤ y_(n).
2.
For each observation compute a quantile rank with respect to the candidate distribution model. For the kth smallest observation, y_(k), the quantile rank is the value q_(k) satisfying

P(Y ≤ q_(k)) = (k − 1/2)/n.

3.
Plot the pairs (q_(k), y_(k)).
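A sketch of steps 1-3 for a normal candidate model (numpy, scipy and matplotlib assumed; the data are simulated for illustration):

```python
# Normal Q-Q plot following steps 1-3 with quantile ranks (k - 1/2)/n.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(5)
y = np.sort(rng.normal(loc=10, scale=2, size=50))  # step 1: order

n = len(y)
ranks = (np.arange(1, n + 1) - 0.5) / n            # step 2: quantile ranks
q = stats.norm.ppf(ranks)                          # candidate quantiles q_(k)

plt.scatter(q, y)                                  # step 3: plot the pairs
plt.xlabel("N(0,1) quantiles")
plt.ylabel("ordered data")
plt.show()
```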
• Transformations to Normality
• If the data are positive and skewed to the right, √Y or log(Y) should look more normal.
• If the data vary by more than 1 or 2 orders of magnitude, try analyzing log(Y) (for positive data) or −1/Y.
• If the data consist of counts, try analyzing √Y.
• If the data are proportions and the ratio of the largest to smallest proportion exceeds 2, try the logit transformation:

log(Y/(1 − Y)).
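A small illustration of the first rule (numpy and scipy assumed; the lognormal sample is an arbitrary right-skewed example):

```python
# A log transform pulls a right-skewed positive sample toward normality.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
y = rng.lognormal(mean=0, sigma=1, size=500)   # positive, skewed right

print(stats.skew(y))           # strongly positive skew
print(stats.skew(np.log(y)))   # near 0 after the transform
```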