Given a set of data points (x_{1},y_{1}), (x_{2},y_{2}), (x_{3},y_{3}),..., (x_{N},y_{N}) on a graph, find the straight line that best fits these points. The least-squares line, or regression line, can be written in the form y = mx + b, where m and b are found using the following formulas.
m = (N(Σxy) - (Σx)(Σy)) / (N(Σx^{2}) - (Σx)^{2})
b = (Σy - m(Σx)) / N
N represents the number of data points. The symbol Σx represents the sum of all the x-coordinates of the data points. The symbol Σy represents the sum of all the y-coordinates of the data points. The symbol Σxy represents the sum of the products of the coordinates of each data point. The symbol Σx^{2} represents the sum of the squares of the x-coordinates of the data points; it is not the same as (Σx)^{2}, which is the square of the sum of the x-coordinates.
Look at this example:
Given the data points {(0,1), (2,3), (4,5)}
Σx = 0 + 2 + 4 = 6, Σy = 1 + 3 + 5 = 9
Σxy = 0(1) + 2(3) + 4(5) = 0 + 6 + 20 = 26
Σx^{2} = 0^{2} + 2^{2} + 4^{2} = 0 + 4 + 16 = 20
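Let's continue. Remember these formulas?
m = (N(Σxy) - (Σx)(Σy)) / (N(Σx^{2}) - (Σx)^{2})
b = (Σy - m(Σx)) / N
Just plug in the appropriate values, with N = 3 because there are three data points.
m = (3(26) - 6(9)) / (3(20) - 6^{2}) = (78 - 54) / (60 - 36) = 24 / 24 = 1
b = (9 - 1(6)) / 3 = 3 / 3 = 1
y = mx + b
The linear equation is y = 1x + 1, or simply y = x + 1.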
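The same calculation can be sketched in Python. This is a minimal illustration rather than a definitive implementation: the function name `fit_line` and the variable names are assumptions chosen for this example, and the check for too few points or equal x-coordinates is an added safeguard, since the slope formula would otherwise divide by zero.

```python
def fit_line(points):
    """Return (m, b) for the least-squares line y = mx + b."""
    n = len(points)
    if n < 2:
        raise ValueError("at least two data points are needed")

    # The four sums used by the formulas for m and b.
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_x2 = sum(x * x for x, _ in points)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x-coordinates are equal; the slope is undefined")

    m = (n * sum_xy - sum_x * sum_y) / denominator
    b = (sum_y - m * sum_x) / n
    return m, b


m, b = fit_line([(0, 1), (2, 3), (4, 5)])
print(m, b)  # 1.0 1.0
```

For the three example points the function returns m = 1 and b = 1, which agrees with the line y = 1x + 1.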
Let's do a problem to determine the least-squares error
(the amount of error between the equation and the actual data points).
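The total error in approximating the data points (x_{1},y_{1}), (x_{2},y_{2}), (x_{3},y_{3}),..., (x_{N},y_{N}) by the line y = mx + b is usually measured by the sum of the squared vertical distances between the line and the points: E = (mx_{1} + b - y_{1})^{2} + (mx_{2} + b - y_{2})^{2} + ... + (mx_{N} + b - y_{N})^{2}.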
Given the line y = -2x + 12
Data point | Point on line | Vertical distance |
(1, 11) | If x = 1, y = -2(1) + 12 = 10, so we have (1, 10) | E_{1} = 10 - 11 = -1 |
(2, 7) | If x = 2, y = -2(2) + 12 = 8, so we have (2, 8) | E_{2} = 8 - 7 = 1 |
(3, 5) | If x = 3, y = -2(3) + 12 = 6, so we have (3, 6) | E_{3} = 6 - 5 = 1 |
(4, 5) | If x = 4, y = -2(4) + 12 = 4, so we have (4, 4) | E_{4} = 4 - 5 = -1 |
E = (E_{1})^{2} + (E_{2})^{2} + (E_{3})^{2} + (E_{4})^{2} = (-1)^{2} + 1^{2} + 1^{2} + (-1)^{2} = 1 + 1 + 1 + 1 = 4
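The error calculation can also be checked with a few lines of Python. This is only a sketch: the function name `squared_error` is an assumption made for this example, and it follows the same convention as the table, taking each vertical distance as the y-value on the line minus the y-value of the data point.

```python
def squared_error(points, m, b):
    """Sum of the squared vertical distances between y = mx + b and the points."""
    return sum((m * x + b - y) ** 2 for x, y in points)


data = [(1, 11), (2, 7), (3, 5), (4, 5)]
total = squared_error(data, m=-2, b=12)
print(total)  # 4
```

Because each distance is squared, the negative distances (-1) and the positive distances (1) contribute equally, so they cannot cancel each other out.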