
Curve Fitting


Fitting a function to data collected in an experiment is a common problem in engineering and science. In this project, you will learn how linear algebra can be used to solve the problem of fitting data with polynomials.

Part 1 - Interpolation

This is the simplest example of curve fitting. Every student knows that two points determine a straight line and, given two distinct data points, you should have no trouble using the point-slope formula to find the straight line that passes through them. Less well known, however, is that three points with distinct x-coordinates determine a quadratic, four such points determine a cubic, and, in general, n+1 data points with distinct x-coordinates determine a unique polynomial of degree at most n. (The degree can be lower than n: three collinear points, for example, determine a straight line.)

As a simple example, consider the three points (-1,7), (1,1), and (2,4). If we seek values of a, b, and c so that the quadratic polynomial g(x) = ax^2+bx+c passes through the three points, we get the following equations,

a-b+c = 7

a+b+c = 1

4a+2b+c = 4

which are obtained by substituting the x value of each point into g and setting the result equal to the corresponding y value. For example, the first equation comes from substituting x=-1, y=7 into the equation g(x) = y. Solving this set of linear equations gives the values a=2, b=-3, and c=2. You can easily check that the quadratic g(x)=2x^2-3x+2 goes through the three points.
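The same three equations can be written as a single matrix equation in which row i of the coefficient matrix is (x_i^2, x_i, 1). The sketch below shows that form (the pmatrix environment assumes the amsmath package):

```latex
\begin{displaymath}
\begin{pmatrix} 1 & -1 & 1 \\ 1 & 1 & 1 \\ 4 & 2 & 1 \end{pmatrix}
\begin{pmatrix} a \\ b \\ c \end{pmatrix}
=
\begin{pmatrix} 7 \\ 1 \\ 4 \end{pmatrix}
\end{displaymath}
```

One way to solve it by elimination: subtracting the first equation from the second gives 2b = -6, so b = -3. Substituting b = -3 into the first and third equations gives a + c = 4 and 4a + c = 10; subtracting these gives 3a = 6, so a = 2 and c = 2.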

The polynomial of degree n which passes through n+1 data points is called an interpolating polynomial. An example of how to use Maple to compute interpolating polynomials is in the Getting started worksheet for this project.
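For readers without Maple, here is a minimal Python sketch of the same idea for any number of points. It builds the linear system row by row (row i is x_i^n, ..., x_i, 1, the so-called Vandermonde matrix) and solves it by Gauss-Jordan elimination. The function names `interpolating_coefficients` and `evaluate` are hypothetical helpers, not Maple commands, and exact rational arithmetic via Python's `fractions` module is an assumption made so that small examples come out exactly.

```python
from fractions import Fraction


def interpolating_coefficients(points):
    """Return [c_n, ..., c_1, c_0] (highest power first) of the polynomial of
    degree at most n passing through the n+1 given (x, y) points.

    Builds the Vandermonde system (row i is x_i^n, ..., x_i, 1) and solves it
    by Gauss-Jordan elimination in exact rational arithmetic.
    """
    xs = [x for x, _ in points]
    if len(set(xs)) != len(xs):
        raise ValueError("x-coordinates must be distinct")
    n = len(points) - 1
    size = n + 1
    # Augmented matrix [V | y] with exact Fraction entries.
    rows = [[Fraction(x) ** (n - j) for j in range(size)] + [Fraction(y)]
            for x, y in points]
    for col in range(size):
        # Find a row with a nonzero entry in this column; one always exists
        # because the matrix is invertible when the x_i are distinct.
        pivot = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Scale the pivot row so its pivot entry becomes 1.
        p = rows[col][col]
        rows[col] = [v / p for v in rows[col]]
        # Eliminate this column from every other row.
        for r in range(size):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [v - factor * w for v, w in zip(rows[r], rows[col])]
    # After elimination the last column holds the solution.
    return [row[-1] for row in rows]


def evaluate(coeffs, x):
    """Evaluate a polynomial from highest-power-first coefficients (Horner's rule)."""
    result = Fraction(0)
    for c in coeffs:
        result = result * x + c
    return result
```

For example, `interpolating_coefficients([(-1, 7), (1, 1), (2, 4)])` returns the coefficients 2, -3, 2 (as `Fraction` objects), matching the worked example above. Exact arithmetic is a deliberate choice here: in floating point, Vandermonde systems become badly conditioned as the number of points grows or the x values get large.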

Exercises for Part 1

Consider the following data on power consumption in homes, where x is the size of the home in square feet and y is the monthly power consumption in kilowatt-hours.
x (square feet)      1350   1600   1980   2930
y (kilowatt-hours)   1172   1493   1804   1954

Fit this data with a cubic polynomial, using x for the independent variable. Then use your result to estimate the energy consumption y for a house of area 2000 square feet.

Suppose you are given only two data points, say (-1,7) and (1,1). Can you find a quadratic polynomial that passes through these points? (The answer should be yes.) If so, is the quadratic unique? Discuss this from both a geometric and a linear algebraic viewpoint.

Part 2 - Least Squares Regression

In the first part of this project, we dealt with finding a polynomial of degree n that passes through n+1 distinct data points. In this part, we consider a different situation that is also of great practical importance.

Suppose that you know that the relationship between two variables is linear. If you collected experimental data and plotted the results, you would expect the points all to lie on a straight line. However, because of experimental error or other problems, there is almost never a single line containing all of the data points. In this case, one has to somehow choose the straight line that "best" fits the data. In this part of the project, we describe a commonly used technique, called least squares regression, for getting the "best" fit.

Suppose that you have n data points $(x_i, y_i)$, for $i = 1, \ldots, n$, where $n \geq 2$. You want to find values of a and b so that a linear equation of the form y = ax + b "best" fits the data. The method of least squares starts by defining the following error function

\begin{displaymath} E(a,b) = \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)^2, \end{displaymath}

which is the sum of the squares of the vertical distances between the data points and the straight line. The values of a and b are determined by minimizing E(a,b) over all values of the parameters a and b, using the standard procedure from calculus: we take the partial derivatives of E with respect to a and b and set them equal to zero. This results in the following two equations.

\begin{displaymath} \frac{\partial E}{\partial a} = \sum_{i=1}^{n} 2\bigl(y_i - (a x_i + b)\bigr)(-x_i) = 0, \end{displaymath}

\begin{displaymath} \frac{\partial E}{\partial b} = \sum_{i=1}^{n} 2\bigl(y_i - (a x_i + b)\bigr)(-1) = 0. \end{displaymath}

Please make sure you understand where these equations come from, since one of the exercises asks you to explain them.
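One way to convince yourself that the two derivative formulas are right is to compare them with finite-difference approximations of E. The Python sketch below does this. The function names and the toy points (0,0), (1,1), (2,1) are illustrative assumptions, not data from this project, and a numerical check like this supplements the calculus derivation rather than replacing it.

```python
def error(a, b, xs, ys):
    """E(a, b): sum of squared vertical distances from the points to y = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


def gradient(a, b, xs, ys):
    """Partial derivatives of E, using the two formulas from the text."""
    dE_da = sum(2 * (y - (a * x + b)) * (-x) for x, y in zip(xs, ys))
    dE_db = sum(2 * (y - (a * x + b)) * (-1) for x, y in zip(xs, ys))
    return dE_da, dE_db


def numerical_gradient(a, b, xs, ys, h=1e-6):
    """Central-difference approximation of the same partial derivatives."""
    dE_da = (error(a + h, b, xs, ys) - error(a - h, b, xs, ys)) / (2 * h)
    dE_db = (error(a, b + h, xs, ys) - error(a, b - h, xs, ys)) / (2 * h)
    return dE_da, dE_db
```

With the toy points at a = 1, b = 0, `gradient` returns (4, 2) and `numerical_gradient` agrees closely. At a = 1/2, b = 1/6, which solves the two equations for these points, both partial derivatives are zero up to rounding, as expected at the minimum.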

The next step is to rearrange these two equations so that all terms involving a and b are on the left-hand side and all other terms are on the right-hand side. This produces the following two linear equations.

\begin{displaymath} a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i, \end{displaymath}

\begin{displaymath} a \sum_{i=1}^{n} x_i + b\,n = \sum_{i=1}^{n} y_i. \end{displaymath}

Note that a and b are the only unknowns in these equations. The other terms are all numbers which can be calculated from the data. An example of using Maple to do a least squares regression is in the Getting started worksheet for this project.
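Because a and b are the only unknowns, the two equations above form a 2x2 linear system that can be solved directly. Below is a minimal Python sketch, assuming the data are given as two equal-length lists. The function name `least_squares_line` is hypothetical (it is not the Maple command used in the Getting started worksheet), and Cramer's rule is just one convenient way to solve a 2x2 system.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a*x + b that minimizes E(a, b).

    Forms the sums appearing in the two linear equations
        a*Sxx + b*Sx = Sxy
        a*Sx  + b*n  = Sy
    and solves this 2x2 system by Cramer's rule.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Determinant of the coefficient matrix [[Sxx, Sx], [Sx, n]].
    det = sxx * n - sx * sx
    if det == 0:
        raise ValueError("x values must not all be equal")
    a = (sxy * n - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b
```

For example, `least_squares_line([0, 1, 2], [1, 3, 5])` returns (2.0, 1.0), since those three points lie exactly on y = 2x + 1; for the non-collinear toy points (0,0), (1,1), (2,1) it returns a = 0.5 and b ≈ 0.1667. The determinant Sxx*n - Sx^2 equals n times the sum of the squared deviations of the x_i from their mean, so it vanishes only when every x_i is the same.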

Exercises for Part 2

An electroencephalogram (EEG) is a recording of the brain's electrical activity, or brain waves. Neurologists have found that the peak EEG frequency increases with age. Data from a particular study are given in the table below. Here x is the age of a child in years, and y is the average peak EEG frequency of the subjects of that age in the study.
x (years)   y (Hz)      x (years)   y (Hz)
 2          5.33        10          7.28
 3          5.75        11          7.06
 4          5.80        12          7.60
 5          5.60        13          7.45
 6          6.00        14          8.23
 7          5.78        15          8.50
 8          5.90        16          9.38
 9          6.23
Find the least squares regression line for this data, and plot it along with the data. Do you think that the linear regression line is a good fit?

Interpolating polynomials generally give good results if only a few data points are available, but they don't always work well when the number of data points exceeds three or four. Use the technique from the first part of this project to find the interpolating polynomial that fits the first five data points from the previous exercise, that is, the data for ages 2, 3, 4, 5, and 6. Then plot your interpolating polynomial on the same graph as the data points for those ages. How does it compare to the straight line you obtained above?

Sometimes you want to use a function you have fit to data to extrapolate beyond the data you have. This can work pretty well if you know something about the way your data behaves. Use the interpolating polynomial from the previous exercise to predict the value of the peak EEG frequency at 7 years. Does it agree with the data?

The data on peak EEG frequency as a function of age show an approximately linear relationship. Use least squares to fit a straight line to the data for ages 2, 3, 4, 5, and 6, and compare its prediction at age 7 with the data and with the prediction of your interpolating polynomial from the previous exercise.

About this document ...

This document was generated using the LaTeX2HTML translator Version 97.1 (release) (July 13th, 1997)

Copyright © 1993, 1994, 1995, 1996, 1997, Nikos Drakos, Computer Based Learning Unit, University of Leeds.


The translation was initiated by William W. Farr on 9/7/1999


William W. Farr