Curve Fitting

Introduction

Fitting a function to data collected in an experiment is a common problem in engineering and science. In this project, you will learn how linear algebra can be used to solve the problem of fitting data with polynomials.

Part 1 - Interpolation

This is the simplest example of curve fitting. Every student knows that two points determine a straight line and, given two data points with distinct x values, you should have no trouble using the point-slope formula to find the straight line which passes through the two points. Less well-known, however, is that three such points determine a quadratic, four determine a cubic, and, in general, n+1 data points with distinct x values determine a unique polynomial of degree at most n.

As a simple example, consider the three points (-1,7), (1,1), and (2,4). If we seek to find the values of a, b, and c so that the quadratic polynomial g(x) = ax^2+bx+c goes through the three points, then we get the following equations,

a-b+c = 7

a+b+c = 1

4a+2b+c = 4

which are obtained by substituting an x value into g and setting it equal to the y value. For example, the first equation comes from substituting x=-1, y=7 into the equation g(x) = y. Solving this set of linear equations gives the values a=2, b=-3, and c=2. You can easily check that the quadratic g(x)=2x^2-3x+2 goes through the three points.
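
The computation above can be sketched in a few lines of code. This is only an illustration in Python, not the Maple approach from the worksheet, and the names `solve_linear_system` and `interpolating_coefficients` are made up for this example. It builds one equation per data point, assumes the x values are distinct, and solves the system by elimination with exact rational arithmetic.

```python
from fractions import Fraction


def solve_linear_system(matrix, rhs):
    """Solve a square linear system by Gauss-Jordan elimination with exact fractions."""
    n = len(rhs)
    # Augmented matrix [A | y] with exact rational entries.
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("system is singular; no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so that the pivot entry becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def interpolating_coefficients(points):
    """Return the coefficients, highest power first, of the polynomial through the points."""
    degree = len(points) - 1
    # Each data point (x, y) gives one row: x^degree, ..., x^1, x^0.
    matrix = [[Fraction(x) ** k for k in range(degree, -1, -1)] for x, _ in points]
    rhs = [y for _, y in points]
    return solve_linear_system(matrix, rhs)


a, b, c = interpolating_coefficients([(-1, 7), (1, 1), (2, 4)])
print(a, b, c)  # 2 -3 2
```

Exact fractions are used here so that the result matches the hand calculation without rounding; a floating-point solver would work equally well for measured data.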

The polynomial of degree n which passes through n+1 data points is called an interpolating polynomial. An example of how to use Maple to compute interpolating polynomials is in the Getting started worksheet for this project.

Exercises for Part 1

1.
Consider the following data on machine tool wear, where V is the cutting speed in feet per minute and Tw the time in minutes for a tool to wear enough that it is no longer usable.
V (feet per minute)    400     600     800     1000
Tw (minutes)           31.3    11.9    8.0     4.3

Fit this data with a cubic polynomial, using V for the independent variable. Then use your result to estimate the wear time Tw for a cutting speed of 900 feet per minute.

2.
Suppose you are given only two data points, say (-1,7) and (1,1). Can you find a quadratic polynomial that passes through these points? (The answer should be yes.) If so, is the quadratic unique? Discuss this from both a geometric and a linear algebraic viewpoint.

3.
Suppose you had three points and wanted to find a straight line that went through all three points. Is this ever possible? Discuss this from both a geometric and a linear algebraic viewpoint.

Part 2 - Least Squares Regression

In the first part of this project, we dealt with finding a polynomial of degree n that passes through n+1 distinct data points. In this part, we consider a different situation that is also of great practical importance.

Suppose that you know that the relationship between two variables is linear. If you collected experimental data and plotted the results, you would expect them to all lie on a straight line. However, because of experimental errors or other problems, there is almost never a single line containing all of the data points. In this case, one has to somehow choose the straight line that "best" fits the data. In this part of the project, we describe a commonly used technique, called least squares regression, for getting the "best" fit.

Suppose that you have n data points (x_i, y_i), for $i=1,\ldots,n$, where $n \geq 2$. You want to find values of a and b so that a linear equation of the form y=ax+b "best" fits the data. The method of least squares starts by defining the following error function

\begin{displaymath}
E(a,b) = \sum_{i=1}^{n} (y_i -(a x_i +b))^2, \end{displaymath}

which is the sum of the squares of the vertical distances between the data points and the straight line. The values of a and b are determined by minimizing the value of E(a,b) over all values of the parameters a and b, using the standard procedure from calculus. That is, we take the partial derivatives of E with respect to a and b and set them equal to zero. This results in the following two equations.
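
To make the error function concrete, here is a small sketch in Python (an illustration only; the name `squared_error` is made up for this example). It evaluates E(a,b) for the three points from Part 1 and two candidate lines, to show how different choices of a and b change the error.

```python
from fractions import Fraction


def squared_error(a, b, points):
    """Sum of squared vertical distances between the points and the line y = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in points)


points = [(-1, 7), (1, 1), (2, 4)]

# The line y = -3x + 4 passes through the first two points and misses only the third.
print(squared_error(-3, 4, points))  # 36

# The line y = -(9/7)x + 34/7 misses all three points, yet its total error is smaller.
print(squared_error(Fraction(-9, 7), Fraction(34, 7), points))  # 72/7
```

The second line is in fact the minimizer of E for these three points, which illustrates that the "best" line in the least squares sense need not pass through any of the data points.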

\begin{displaymath}
\frac{\partial E}{\partial a} = \sum_{i=1}^{n} 2(y_i -(a x_i
+b))(-x_i) = 0 \end{displaymath}

\begin{displaymath}
\frac{\partial E}{\partial b} = \sum_{i=1}^{n} 2(y_i -(a x_i
+b))(-1) = 0 \end{displaymath}

Please make sure you understand where these equations come from, as one of the exercises asks you to explain.

The next step is to rearrange these two equations so that all terms involving a and b are on the left-hand side and all other terms are on the right-hand side. This produces the following two linear equations.

\begin{displaymath}
a \sum_{i=1}^{n} x_{i}^2 + b \sum_{i=1}^{n} x_{i} = \sum_{i=1}^{n}
x_{i} y_{i} \end{displaymath}

\begin{displaymath}
a \sum_{i=1}^{n} x_{i} + b n = \sum_{i=1}^{n} y_{i} \end{displaymath}

Note that a and b are the only unknowns in these equations. The other terms are all numbers which can be calculated from the data. An example of using Maple to do a least squares regression is in the Getting started worksheet for this project.
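
As a minimal sketch of how the two linear equations can be solved numerically (in Python rather than Maple; the function name `fit_line` and the use of Cramer's rule are choices made for this illustration), the following computes the four sums from the data and solves for a and b. It is applied to the three points from Part 1, not to the exercise data.

```python
from fractions import Fraction


def fit_line(points):
    """Solve the two normal equations for the slope a and intercept b."""
    n = len(points)
    sx = sum(Fraction(x) for x, _ in points)       # sum of x_i
    sy = sum(Fraction(y) for _, y in points)       # sum of y_i
    sxx = sum(Fraction(x) ** 2 for x, _ in points)  # sum of x_i^2
    sxy = sum(Fraction(x) * y for x, y in points)   # sum of x_i * y_i
    # Determinant of the coefficient matrix [[sxx, sx], [sx, n]].
    det = sxx * n - sx * sx
    if det == 0:
        raise ValueError("normal equations are singular; no unique solution")
    # Cramer's rule for the 2x2 system.
    a = (sxy * n - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b


a, b = fit_line([(-1, 7), (1, 1), (2, 4)])
print(a, b)  # -9/7 34/7
```

For these points the sums are 6, 2, 2, and 12, so the system is 6a + 2b = 2 and 2a + 3b = 12, which gives a = -9/7 and b = 34/7.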

Exercises for Part 2

1.
Given the following data on fuel economy of trucks, find the least squares regression line.
Speed (mph)         50      55      60      65
Miles per gallon    5.41    5.02    4.59    4.08

2.
Explain where the equations for the partial derivatives of E with respect to a and b come from.

3.
Explain the steps between the partial derivative equations and the linear equations for a and b.

4.
The linear system for least squares regression appears to be meaningful even if only one data point is available. Investigate this, and explain why the system does not actually determine a unique solution for a and b.

5.
If the formula for linear regression is used with only two data points, is the resulting straight line the same as the one you would get using the technique from part 1 of this project? Experiment first with the two points (-1,7) and (1,1) and then show in general that the linear regression line is the same as the interpolation line if only two points are used.


William W. Farr
9/1/1998