# The Method of Least Squares


The Problem

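Given a set of data points $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_N, y_N)$ plotted on a graph, find the straight line that best fits these points, in the sense that the sum of the squared vertical distances from the points to the line is as small as possible. This least-squares line, or regression line, has the form y = mx + b, and it can be found using the following formulas.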


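The Formulas

$$m = \frac{N\sum xy - \sum x \sum y}{N\sum x^2 - \left(\sum x\right)^2}$$

$$b = \frac{\sum y - m\sum x}{N}$$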

How it works

N represents the number of data points. The symbol $\sum x$ represents the sum of all the x-coordinates of the data points, and $\sum y$ represents the sum of all the y-coordinates. The symbol $\sum xy$ represents the sum of the products of the coordinates of each data point, that is, $x_1y_1 + x_2y_2 + \cdots + x_Ny_N$. Finally, $\sum x^2$ represents the sum of the squares of the x-coordinates of the data points.
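The five quantities described above can be computed directly from a list of points. The sketch below is a minimal Python illustration; the function name `least_squares_sums` is hypothetical and not part of the original lesson.

```python
def least_squares_sums(points):
    """Return N, Σx, Σy, Σxy, Σx² for a list of (x, y) pairs."""
    n = len(points)                          # N: number of data points
    sum_x = sum(x for x, _ in points)        # Σx: sum of x-coordinates
    sum_y = sum(y for _, y in points)        # Σy: sum of y-coordinates
    sum_xy = sum(x * y for x, y in points)   # Σxy: sum of the products x*y
    sum_x2 = sum(x * x for x, _ in points)   # Σx²: sum of squared x-coordinates
    return n, sum_x, sum_y, sum_xy, sum_x2


print(least_squares_sums([(0, 1), (2, 3), (4, 5)]))
# prints (3, 6, 9, 26, 20)
```

The points used here are the same ones worked by hand in the example that follows.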

Look at this example:

Given the data points {(0,1), (2,3), (4,5)}

$N = 3$

$\sum x = 0 + 2 + 4 = 6$

$\sum y = 1 + 3 + 5 = 9$

$\sum xy = 0(1) + 2(3) + 4(5) = 0 + 6 + 20 = 26$

$\sum x^2 = 0^2 + 2^2 + 4^2 = 0 + 4 + 16 = 20$

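Let's continue. Remember these formulas? Just plug in the appropriate values:

$$m = \frac{N\sum xy - \sum x \sum y}{N\sum x^2 - \left(\sum x\right)^2} = \frac{3(26) - 6(9)}{3(20) - 6^2} = \frac{78 - 54}{60 - 36} = \frac{24}{24} = 1$$

$$b = \frac{\sum y - m\sum x}{N} = \frac{9 - 1(6)}{3} = \frac{3}{3} = 1$$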

Substituting m = 1 and b = 1 into y = mx + b, the linear equation is y = 1x + 1, or simply y = x + 1.
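To check the arithmetic, here is a minimal Python sketch that applies the slope and intercept formulas to a list of points. The helper `least_squares_line` is hypothetical, written only for illustration, and it assumes the data contain at least two distinct x-values; otherwise the denominator $N\sum x^2 - (\sum x)^2$ is zero and no unique line exists.

```python
def least_squares_line(points):
    """Fit y = m*x + b to (x, y) pairs by least squares; return (m, b)."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_x2 = sum(x * x for x, _ in points)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        # All x-values are equal (or there are no points): no unique line.
        raise ValueError("need at least two distinct x-values")

    m = (n * sum_xy - sum_x * sum_y) / denominator  # slope formula
    b = (sum_y - m * sum_x) / n                     # intercept formula
    return m, b


m, b = least_squares_line([(0, 1), (2, 3), (4, 5)])
print(f"y = {m}x + {b}")   # prints: y = 1.0x + 1.0
```

The function returns floats because of the division, so the hand result m = 1, b = 1 shows up as 1.0 and 1.0.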

Least Squares Error

Let's work a problem to determine the least-squares error (the amount of error between the equation and the actual data points).

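The total error in approximating the data points $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_N, y_N)$ by the line y = mx + b is usually measured by the sum of the squared vertical distances:

$$E = \sum_{i=1}^{N} \left[(m x_i + b) - y_i\right]^2 = E_1^2 + E_2^2 + \cdots + E_N^2$$

where $E_i = (m x_i + b) - y_i$ is the vertical distance from the i-th data point to the line.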
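The error measure translates into a short loop. This is only a sketch: `squared_error` is a hypothetical name, and it uses the sign convention of the worked example (y-value on the line minus the data point's y-value), which does not change the result because every term is squared.

```python
def squared_error(points, m, b):
    """Sum of squared vertical distances between the points and y = m*x + b."""
    total = 0
    for x, y in points:
        e = (m * x + b) - y   # E_i: line's y-value minus the data point's y-value
        total += e * e        # add E_i squared
    return total


# The three points from the first example lie exactly on y = x + 1.
print(squared_error([(0, 1), (2, 3), (4, 5)], 1, 1))   # prints 0
```

A total error of zero means the line passes through every data point; any scatter off the line makes E positive.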

Given the line y = -2x + 12:

| Data point | Point on line | Vertical distance |
|---|---|---|
| (1, 11) | If x = 1, y = -2(1) + 12 = 10, so we have (1, 10) | $E_1 = 10 - 11 = -1$ |
| (2, 7) | If x = 2, y = -2(2) + 12 = 8, so we have (2, 8) | $E_2 = 8 - 7 = 1$ |
| (3, 5) | If x = 3, y = -2(3) + 12 = 6, so we have (3, 6) | $E_3 = 6 - 5 = 1$ |
| (4, 5) | If x = 4, y = -2(4) + 12 = 4, so we have (4, 4) | $E_4 = 4 - 5 = -1$ |

$$E = E_1^2 + E_2^2 + E_3^2 + E_4^2 = (-1)^2 + 1^2 + 1^2 + (-1)^2 = 1 + 1 + 1 + 1 = 4$$
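The table can be reproduced in a few lines of Python as a check on the arithmetic. This is a minimal sketch; the variable names are illustrative only.

```python
# Data points and the line y = -2x + 12 from the example above
points = [(1, 11), (2, 7), (3, 5), (4, 5)]
m, b = -2, 12

# E_i = (y-value on the line) - (y-value of the data point)
residuals = [(m * x + b) - y for x, y in points]
total_error = sum(e * e for e in residuals)

print(residuals)     # prints [-1, 1, 1, -1]
print(total_error)   # prints 4
```

For these four points, N = 4, Σx = 10, Σy = 28, Σxy = 60 and Σx² = 30, so the least-squares formulas give m = (4(60) - 10(28)) / (4(30) - 10²) = -40/20 = -2 and b = (28 - (-2)(10)) / 4 = 12. In other words, y = -2x + 12 is itself the least-squares line for this data, and E = 4 is the smallest total squared error any straight line can achieve here.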