Linear Regression

Linear Correlation

Significance Tests

Multiple Regression

In meteorology, we often want to use a variable x to predict another variable y. In this case, the independent variable x is called the "predictor" and the dependent variable y is called the "predictand".

We have N paired data points (xi, yi) whose relationship we want to approximate with a linear regression:

ŷi = a + b·xi

The error produced by this linear approximation can be estimated by the sum of squared residuals:

Q = Σ (yi − ŷi)² = Σ (yi − a − b·xi)²   (summed over i = 1, …, N)

The least-squares linear fit chooses the coefficients a and b that produce the minimum value of the error Q.

Coefficients a and b are chosen such that the error Q is a minimum:

∂Q/∂a = 0 and ∂Q/∂b = 0

This leads to the normal equations:

Σ yi = a·N + b·Σ xi
Σ xiyi = a·Σ xi + b·Σ xi²

Solving the above equations, we get the linear regression coefficients:

b = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²
a = ȳ − b·x̄

where x̄ and ȳ are the sample means of x and y.

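A minimal Python/NumPy sketch of these coefficient formulas (the function name and toy data are illustrative, not part of the lecture):

```python
import numpy as np

def linear_fit(x, y):
    """Least-squares fit of y ~ a + b*x; returns (a, b)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xp = x - x.mean()                                 # x' = x - xbar
    b = np.sum(xp * (y - y.mean())) / np.sum(xp**2)   # slope
    a = y.mean() - b * x.mean()                       # intercept
    return a, b

# Toy data: y = 2 + 3x plus noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, x.size)
a, b = linear_fit(x, y)
print(a, b)  # should be close to 2 and 3
```
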
The R²-value measures the percentage of variation in the values of the dependent variable that can be explained by the variation in the independent variable.

The R²-value varies from 0 to 1.

A value of 0.7654 means that 76.54% of the variance in y can be explained by the changes in x. The remaining 23.46% of the variation in y is presumed to be due to random variability.

There are many ways to test the significance of the regression coefficient.

Some use a t-test of the hypothesis that b = 0.

The most useful way to test the significance of the regression is the "analysis of variance", which separates the total variance of the dependent variable into two independent parts: the variance accounted for by the linear regression and the error variance.

The quality of the linear regression can be analyzed using the "Analysis of Variance".

The analysis separates the total variance of y (Sy²) into the part that can be accounted for by the linear regression (b²Sx²) and the part that cannot be accounted for by the regression (Se²):

Sy² = b²Sx² + Se²

To calculate the total variance, we need to know the mean → DOF = N − 1.

If we know the mean and the regression slope (b), then the regression line is set → the DOF of the regressed variance is only 1 (the slope).

The error variance is determined from the difference between the total variance (DOF = N − 1) and the regressed variance (DOF = 1) → the DOF of the error variance is (N − 1) − 1 = N − 2.

We then use the F-statistic to test the ratio of the variance explained by the regression to the variance not explained by the regression:

F = (b²Sx²/1) / (Se²/(N−2))

Select a confidence level (equivalently, a significance level α).

H0: b = 0 (i.e., variation in y is not explained by the linear regression but rather by chance or fluctuations)

H1: b ≠ 0

Reject the null hypothesis at the α significance level if F > Fα(1, N−2).

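A hedged sketch of this F test in Python (assumes SciPy is available; it works with sums of squares, whose common 1/N factors cancel in the ratio):

```python
import numpy as np
from scipy import stats

def regression_f_test(x, y, alpha=0.05):
    """ANOVA F test of H0: b = 0 for a simple linear regression."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    N = x.size
    xp = x - x.mean()
    b = np.sum(xp * (y - y.mean())) / np.sum(xp**2)
    a = y.mean() - b * x.mean()
    ss_reg = b**2 * np.sum(xp**2)           # regressed part, DOF = 1
    ss_err = np.sum((y - a - b * x)**2)     # error part, DOF = N - 2
    F = (ss_reg / 1.0) / (ss_err / (N - 2))
    F_crit = stats.f.ppf(1.0 - alpha, 1, N - 2)
    return F, F_crit, F > F_crit            # reject H0 if F > F_alpha(1, N-2)
```
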
One way to estimate the "badness of fit" is to calculate the scatter, the mean squared deviation of the observations from the regression line:

Sscatter = (1/N) Σ (yi − ŷi)²

The relation of the scatter to the line of regression in the analysis of two variables is like the relation of the standard deviation to the mean in the analysis of one variable.

If lines are drawn parallel to the line of regression at distances equal to ±(Sscatter)^0.5 above and below the line, measured in the y direction, about 68% of the observations should fall between the two lines.

Linear Regression: Y = a + bX

A dimensional measurement of the linear relationship between X and Y.

→ How does Y change with one unit of X?

Linear Correlation

A non-dimensional measurement of the linear relationship between X and Y.

→ How does Y change (in standard deviations) with one standard deviation of X?

The linear regression coefficient (b) depends on the units of measurement.

If we want a non-dimensional measurement of the association between two variables, we use the linear correlation coefficient (r):

r = Sxy/(Sx·Sy) = Σ (xi − x̄)(yi − ȳ) / [Σ (xi − x̄)² · Σ (yi − ȳ)²]^0.5

where Sxy is the covariance between x and y.

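A one-line NumPy version of this definition (equivalent to np.corrcoef(x, y)[0, 1]):

```python
import numpy as np

def corr_coef(x, y):
    """Linear (Pearson) correlation coefficient r."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xp, yp = x - x.mean(), y - y.mean()   # remove the means
    return np.sum(xp * yp) / np.sqrt(np.sum(xp**2) * np.sum(yp**2))
```
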
Recall that in the linear regression we showed:

b = Sxy/Sx²

and that the variance of y explained by the regression is b²Sx².

We also know:

r = Sxy/(Sx·Sy)

It turns out that:

b²Sx²/Sy² = Sxy²/(Sx²·Sy²) = r²

i.e., the squared correlation coefficient is the fraction of the variance of y explained by the regression.

Suppose that the correlation coefficient between sunspots and five-year-mean global temperature is 0.5 (r = 0.5).

The fraction of the variance of five-year-mean global temperature that is "explained" by sunspots is r² = 0.25.

The fraction of unexplained variance is 0.75.

When the true correlation coefficient is zero (H0: r = 0 and H1: r ≠ 0):

Use Student's t to test the significance of r:

t = r·(N − 2)^0.5 / (1 − r²)^0.5

with ν = N − 2 degrees of freedom.

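A small sketch of this t test (SciPy supplies the critical value; the function name is illustrative):

```python
import numpy as np
from scipy import stats

def t_test_r(r, N, alpha=0.05):
    """Student-t test of H0: the true correlation is zero (two-sided)."""
    t = r * np.sqrt((N - 2) / (1.0 - r**2))
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, N - 2)  # nu = N - 2
    return t, t_crit, abs(t) > t_crit
```
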
When the true correlation coefficient is not expected to be zero:

We cannot use a symmetric normal distribution for the test.

We must use Fisher's Z transformation to convert the distribution of r to a normal distribution:

Z = 0.5 ln[(1 + r)/(1 − r)]

Z is approximately normally distributed with standard deviation σZ = 1/(N − 3)^0.5.

Suppose N = 21 and r = 0.8. Find the 95% confidence limits on r.

Answer:

(1) Use Fisher's Z transformation:

Z = 0.5 ln[(1 + 0.8)/(1 − 0.8)] = 0.5 ln 9 ≈ 1.0986

(2) Find the 95% significance limits:

Z ± 1.96·σZ = 1.0986 ± 1.96/(21 − 3)^0.5 = 1.0986 ± 0.462 → 0.637 ≤ Z ≤ 1.561

(3) Convert Z back to r, using r = (e^(2Z) − 1)/(e^(2Z) + 1):

r ≈ 0.56 and r ≈ 0.92

(4) The 95% significance limits are: 0.56 < r < 0.92.

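The same numbers, reproduced in a short NumPy sketch (np.arctanh and np.tanh implement Fisher's Z and its inverse):

```python
import numpy as np

def fisher_ci(r, N, z_crit=1.96):
    """95% confidence limits on r via Fisher's Z transformation."""
    Z = np.arctanh(r)               # 0.5 * ln((1 + r)/(1 - r))
    sigma = 1.0 / np.sqrt(N - 3)
    return np.tanh(Z - z_crit * sigma), np.tanh(Z + z_crit * sigma)

print(fisher_ci(0.8, 21))  # approximately (0.56, 0.92), as in the example
```
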
In a study of the correlation between the amount of rainfall and the amount of air pollution removed, 9 observations were made. The sample correlation coefficient is −0.9786. Test the null hypothesis that there is no linear correlation between the variables. Use the 0.05 level of significance.

Answer:

1. H0: r = 0; H1: r ≠ 0

2. α = 0.05

3. Use Fisher's Z:

Z = 0.5 ln[(1 − 0.9786)/(1 + 0.9786)] ≈ −2.263

z = (Z − 0)/σZ = −2.263 × (9 − 3)^0.5 ≈ −5.54

4. z < z0.025 (= −1.96) → Reject the null hypothesis.

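The same test in a few lines (variable names are illustrative):

```python
import numpy as np

r, N = -0.9786, 9
Z = np.arctanh(r)        # Fisher's Z, about -2.26
z = Z * np.sqrt(N - 3)   # standardized statistic, about -5.54
print(z, z < -1.96)      # True -> reject H0 at the 0.05 level
```
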
To compare two sample correlation coefficients r1 and r2 (from samples of sizes N1 and N2), we first convert each to Fisher's Z statistic: Z1 and Z2.

We then assume a normal distribution for Z1 − Z2 and use the z-statistic (not Fisher's Z):

z = (Z1 − Z2) / [1/(N1 − 3) + 1/(N2 − 3)]^0.5

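A sketch of this two-sample comparison (function name is illustrative):

```python
import numpy as np

def compare_correlations(r1, N1, r2, N2):
    """z statistic for H0: the two true correlations are equal."""
    Z1, Z2 = np.arctanh(r1), np.arctanh(r2)
    se = np.sqrt(1.0 / (N1 - 3) + 1.0 / (N2 - 3))
    return (Z1 - Z2) / se   # compare with +/-1.96 at the 0.05 level
```
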
If we want to regress y on more than one variable (x1, x2, x3, …, xn):

y = a0 + a1x1 + a2x2 + … + anxn

After performing the least-squares fit and removing the means from all variables, we solve the following matrix equation to obtain the regression coefficients a1, a2, a3, …, an (the x's and y now denote departures from their means, and the sums run over the N observations):

[Σx1x1  Σx1x2  …  Σx1xn] [a1]   [Σx1y]
[Σx2x1  Σx2x2  …  Σx2xn] [a2] = [Σx2y]
[  …      …    …    …  ] [… ]   [ …  ]
[Σxnx1  Σxnx2  …  Σxnxn] [an]   [Σxny]

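A minimal NumPy sketch of this normal-equation solve (X is an N-by-n predictor matrix; names are illustrative):

```python
import numpy as np

def multiple_regression(X, y):
    """Coefficients a1..an from the mean-removed normal equations."""
    Xp = X - X.mean(axis=0)   # remove means from the predictors
    yp = y - y.mean()         # remove the mean from the predictand
    C = Xp.T @ Xp             # matrix of predictor cross-products
    g = Xp.T @ yp             # cross-products of predictors with y
    return np.linalg.solve(C, g)
```
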
The Fourier transform is an example of multiple regression. In this case, the independent (predictor) variables are the harmonics:

cos(2πkt/T) and sin(2πkt/T), for k = 1, 2, 3, …

These independent variables are orthogonal to each other. That means the sum (over the observations) of the product of any two different predictors is zero:

Σ xixj = 0 for i ≠ j

Therefore, all the off-diagonal terms are zero in the matrix above, and we can easily get each coefficient independently of the others:

ak = Σ xky / Σ xkxk

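A quick numerical check of this orthogonality on N equally spaced times (the values here are illustrative):

```python
import numpy as np

N = 64
t = np.arange(N)
# Predictor matrix of the first three cosine and sine harmonics
X = np.column_stack([f(2 * np.pi * k * t / N)
                     for k in (1, 2, 3) for f in (np.cos, np.sin)])
C = X.T @ X
print(np.allclose(C - np.diag(np.diag(C)), 0.0))  # True: off-diagonals vanish
# With a diagonal matrix each a_k decouples: a_k = (x_k . y) / (x_k . x_k)
```
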
Very often, one predictor is a function of the other predictors.

An important question then arises: How many predictors do we need in order to make a good regression (or prediction)? Does increasing the number of predictors improve the regression (or prediction)?

If too many predictors are used, some large coefficients may be assigned to variables that are not really highly correlated with the predictand (y). These coefficients are generated only to help the regression relation fit y.

To answer this question, we have to figure out how fast (or slow) the "fraction of explained variance" increases with the number of predictors.

As an example, we discuss the case of two predictors for the multiple regression.

We can repeat the derivation performed for the simple linear regression to find the fraction of variance explained by the 2-predictor regression, R²:

R² = (r1y² + r2y² − 2·r1y·r2y·r12) / (1 − r12²)

where the r's are the correlation coefficients between the indicated pairs of variables (x1, x2, and y).

We can show that if r2y is smaller than or equal to a "minimum useful correlation" value, it is not useful to include the second predictor in the regression:

minimum useful correlation = r1y × r12

This is the minimum correlation of x2 with y that is required to improve the R², given that x2 is correlated with x1. Two worked examples follow, with a short numerical check after them.

For a 2-predictor case: r1y = r2y = r12 = 0.50.

If we include only one predictor (x1) (i.e., set r2y = r12 = 0) → R² = 0.25.

By adding x2 to the regression (r2y = r12 = 0.50) → R² = 0.33.

In this case, the 2nd predictor improves the regression.

For a 2-predictor case: r1y = r12 = 0.50 but r2y = 0.25.

With only x1 → R² = 0.25.

Adding x2 → R² = 0.25 (still the same!).

In this case, the 2nd predictor is not useful. It is because

r2y ≤ r1y × r12 = 0.50 × 0.50 = 0.25.

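Both examples, checked against the two-predictor formula above:

```python
def r2_two_predictors(r1y, r2y, r12):
    """Fraction of variance explained by a 2-predictor regression."""
    return (r1y**2 + r2y**2 - 2.0 * r1y * r2y * r12) / (1.0 - r12**2)

print(r2_two_predictors(0.50, 0.50, 0.50))  # ~0.33: x2 improves the fit
print(r2_two_predictors(0.50, 0.25, 0.50))  # 0.25: no gain, r2y <= r1y*r12
```
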
Based on the previous analysis, we wish to use predictors that are independent of each other:

→ r12 = 0
→ minimum useful correlation = 0.

The worst predictors are those with r12 = 1.0 (completely redundant).

The desire for independent predictors is part of the motivation for Empirical Orthogonal Function (EOF) analysis.

EOF analysis attempts to find a relatively small number of independent quantities that convey as much of the original information as possible, without redundancy.