Part 2: Analysis of Relationship Between Two Variables

Linear regression
Linear correlation
Significance tests
Multiple regression
Linear Regression

Predictor and Predictand

In meteorology, we want to use a variable x to predict another variable y. In this case, the independent variable x is called the “predictor” and the dependent variable y is called the “predictand”.
Linear Regression

We have N paired data points (xi, yi) whose relationship we want to approximate with a linear regression:

y = a + b x

The errors produced by this linear approximation can be estimated by the sum of the squared residuals, Q.

The least-squares linear fit chooses the coefficients a and b that produce the minimum value of the error Q.
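The error measure Q appeared only as an equation image in the original slide; the standard least-squares definition it refers to is presumably

\[
Q \;=\; \sum_{i=1}^{N} \bigl( y_i - a - b\,x_i \bigr)^{2} ,
\]

i.e. the sum of the squared vertical distances between the data points and the regression line.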
Least Square Fit

Coefficients a and b are chosen such that the error Q is a minimum.

Setting the derivatives of Q with respect to a and b to zero leads to a pair of normal equations.

Solving these equations, we get the linear regression coefficients a and b.
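The equations themselves were images in the original slide; a standard reconstruction of the least-squares solution is

\[
\frac{\partial Q}{\partial a} = 0 ,\qquad
\frac{\partial Q}{\partial b} = 0
\quad\Longrightarrow\quad
b = \frac{\displaystyle\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}{\displaystyle\sum_{i}(x_i-\bar{x})^{2}} ,
\qquad
a = \bar{y} - b\,\bar{x} ,
\]

where the overbars denote the sample means of x and y.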
Major Assumptions of Linear Regression

Significance of the Regression Coefficients
The regression coefficients a and b are statistics derived from a sample, not parameters of the population.

The regression coefficients therefore vary from one sample to another. We cannot use standard statistical theory to predict their variations, because that theory derives its results by assuming that successive pairs of observations (xi, yi) are independent, which is generally not true for geoscience variables.
How Good Is the Fit?

The quality of the linear regression can be analyzed using the “Analysis of Variance”.

The analysis separates the total variance of y (Sy²) into the part that can be accounted for by the linear regression (b²Sx²) and the part that cannot be accounted for by the regression (Se²):

Sy² = b²Sx² + Se²
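A brief sketch of where this decomposition comes from (not spelled out on the slide): writing the fitted value as ŷi = a + b xi, the least-squares conditions make the cross term vanish, so

\[
S_y^2 = \frac{1}{N}\sum_i (y_i-\bar{y})^2
      = \frac{1}{N}\sum_i (\hat{y}_i-\bar{y})^2 + \frac{1}{N}\sum_i (y_i-\hat{y}_i)^2
      = b^2 S_x^2 + S_e^2 ,
\]

since ŷi − ȳ = b (xi − x̄).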
Analysis of Variance

We then use the F statistic to test the ratio of the variance explained by the regression to the variance not explained by the regression:

F = (b²Sx²/1) / (Se²/(N−2))

Select a X% confidence level.

H0: b = 0
(i.e., the variation in y is not explained by the linear regression but rather by chance or fluctuations)
H1: b ≠ 0

Reject the null hypothesis at the α significance level if F > Fα(1, N−2).
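A minimal numerical sketch of this F test (not from the original slides; the data values are made up for illustration), using NumPy and SciPy:

    import numpy as np
    from scipy import stats

    # Illustrative paired data (x, y)
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
    y = np.array([2.1, 2.9, 3.2, 4.8, 5.1, 5.8, 7.2, 7.9])
    N = len(x)

    # Least-squares coefficients: b = Sxy / Sx^2, a = ybar - b*xbar
    xbar, ybar = x.mean(), y.mean()
    b = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    a = ybar - b * xbar

    # Variance decomposition: Sy^2 = b^2 Sx^2 + Se^2
    Sx2 = np.var(x)                    # variance of x
    Se2 = np.var(y - (a + b * x))      # unexplained (residual) variance
    F = (b ** 2 * Sx2 / 1) / (Se2 / (N - 2))

    # Critical value F_alpha(1, N-2) for alpha = 0.05
    F_crit = stats.f.ppf(0.95, 1, N - 2)
    print(f"F = {F:.2f}, F_crit = {F_crit:.2f}, reject H0: {F > F_crit}")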
Scattering

One way to estimate the “badness of fit” is to calculate the scatter:

scatter = (Se²)^0.5

The relation of the scatter to the regression line in the analysis of two variables is like the relation of the standard deviation to the mean in the analysis of one variable.

If lines are drawn parallel to the regression line at distances equal to ±(Se²)^0.5 above and below the line, measured in the y direction, about 68% of the observations should fall between the two lines.
Correlation and Regression

Linear Regression: Y = a + bX
A dimensional measure of the linear relationship between X and Y.
→ How does Y change with one unit of X?

Linear Correlation
A non-dimensional measure of the linear relationship between X and Y.
→ How does Y change (in standard deviations) with one standard deviation of X?
Linear Correlation

The linear regression coefficient (b) depends on the units of measurement.

If we want a non-dimensional measure of the association between two variables, we use the linear correlation coefficient (r).
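The formula for r was an equation image in the original slide; the standard definition it refers to is

\[
r \;=\; \frac{\displaystyle\sum_i (x_i-\bar{x})(y_i-\bar{y})}
             {\sqrt{\displaystyle\sum_i (x_i-\bar{x})^{2}}\,\sqrt{\displaystyle\sum_i (y_i-\bar{y})^{2}}}
     \;=\; \frac{S_{xy}}{S_x\,S_y} ,
\]

where Sxy is the covariance of x and y and Sx, Sy are their standard deviations.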
Correlation and Regression

Recall that in the linear regression we showed Sy² = b²Sx² + Se², so the fraction of the variance of y explained by the regression is b²Sx²/Sy².

We also know the least-squares expression for the regression coefficient b and the definition of the correlation coefficient r.

It turns out that this explained fraction of variance is exactly r².
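The algebra behind “it turns out that” was an equation image; a standard reconstruction, using the covariance Sxy, is

\[
b = \frac{S_{xy}}{S_x^{2}} ,\qquad
r = \frac{S_{xy}}{S_x S_y}
\quad\Longrightarrow\quad
\frac{b^{2} S_x^{2}}{S_y^{2}} = r^{2} ,
\]

so r² is the fraction of the variance of y explained by the regression and 1 − r² is the unexplained fraction.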
An Example

Suppose that the correlation coefficient between sunspots and five-year mean global temperature is 0.5 (r = 0.5).

The fraction of the variance of 5-year mean global temperature that is “explained” by sunspots is r² = 0.25.

The fraction of unexplained variance is 0.75.
Significance Test of Correlation Coefficient

When the true correlation coefficient is zero (H0: r = 0 and H1: r ≠ 0):
Use the Student’s t distribution to test the significance of r, with ν = N − 2 degrees of freedom.

When the true correlation coefficient is not expected to be zero:
We cannot use a symmetric normal distribution for the test. We must use Fisher’s Z transformation to convert the distribution of r to a normal distribution.
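The test statistics were equation images on the slide; the standard forms they refer to are

\[
t = r\,\sqrt{\frac{N-2}{1-r^{2}}} \quad (\nu = N-2 \text{ degrees of freedom}),
\qquad
Z = \tfrac{1}{2}\ln\!\frac{1+r}{1-r},
\qquad
\sigma_Z = \frac{1}{\sqrt{N-3}} ,
\]

where Z is approximately normally distributed with standard deviation σZ.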
An Example

Suppose N = 21 and r = 0.8. Find the 95% confidence limits on r.

Answer:

(1) Use Fisher’s Z transformation.

(2) Find the 95% significance limits on Z.

(3) Convert Z back to r.

(4) The 95% significance limits are: 0.56 < r < 0.92
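A minimal sketch (not from the original slides) of the same calculation in Python, using NumPy’s arctanh/tanh for the Fisher transformation:

    import numpy as np

    N, r = 21, 0.8
    z = np.arctanh(r)                # Fisher's Z = 0.5*ln((1+r)/(1-r)) ≈ 1.099
    sigma_z = 1.0 / np.sqrt(N - 3)   # standard deviation of Z ≈ 0.236
    z_lo, z_hi = z - 1.96 * sigma_z, z + 1.96 * sigma_z
    r_lo, r_hi = np.tanh(z_lo), np.tanh(z_hi)
    print(f"{r_lo:.2f} < r < {r_hi:.2f}")   # prints 0.56 < r < 0.92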
Test of the Difference Between Two Non-Zero Coefficients

We first convert each r to its Fisher’s Z statistic.

We then assume a normal distribution for Z1 − Z2 and use the z-statistic (not Fisher’s Z).
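The z-statistic itself was an equation image; the standard two-sample form it refers to is

\[
z \;=\; \frac{Z_1 - Z_2}{\sqrt{\dfrac{1}{N_1-3} + \dfrac{1}{N_2-3}}} ,
\]

where Z1, Z2 are the Fisher-transformed correlation coefficients and N1, N2 are the two sample sizes.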
Multiple Regression

If we want to regress y on more than one variable (x1, x2, x3, ….. xn), we perform the least-squares fit and remove the means from all variables.

We then solve the following matrix equation to obtain the regression coefficients a1, a2, a3, a4, ….., an.
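The regression model and the matrix of normal equations were images in the original. In the usual notation (after the means are removed there is no intercept term), the model is y ≈ a1 x1 + a2 x2 + … + an xn, and the least-squares coefficients satisfy

\[
\begin{pmatrix}
\overline{x_1 x_1} & \overline{x_1 x_2} & \cdots & \overline{x_1 x_n}\\
\overline{x_2 x_1} & \overline{x_2 x_2} & \cdots & \overline{x_2 x_n}\\
\vdots             &                    & \ddots & \vdots            \\
\overline{x_n x_1} & \overline{x_n x_2} & \cdots & \overline{x_n x_n}
\end{pmatrix}
\begin{pmatrix} a_1\\ a_2\\ \vdots\\ a_n \end{pmatrix}
=
\begin{pmatrix} \overline{x_1 y}\\ \overline{x_2 y}\\ \vdots\\ \overline{x_n y} \end{pmatrix} ,
\]

where the overbars denote averages over the N observations (equivalently covariances, since the means have been removed).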
Fourier Transform

The Fourier transform is an example of multiple regression. In this case, the independent (predictor) variables are sines and cosines of different harmonic frequencies.

These independent variables are orthogonal to each other; that is, the product of any two different predictors averages to zero over the record.

Therefore, all the off-diagonal terms are zero in the matrix of normal equations above, and we can easily solve for the regression (Fourier) coefficients.
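A sketch of what “we can easily get” looks like, assuming the predictors are the discrete harmonics cos(2πkt/N) and sin(2πkt/N) sampled at t = 1, …, N: their orthogonality,

\[
\sum_{t=1}^{N}\cos\!\frac{2\pi k t}{N}\cos\!\frac{2\pi m t}{N} = \frac{N}{2}\,\delta_{km},
\qquad
\sum_{t=1}^{N}\cos\!\frac{2\pi k t}{N}\sin\!\frac{2\pi m t}{N} = 0
\quad (0 < k, m < N/2),
\]

reduces the matrix to its diagonal, so each Fourier coefficient is obtained independently:

\[
a_k = \frac{2}{N}\sum_{t=1}^{N} y_t \cos\!\frac{2\pi k t}{N},
\qquad
b_k = \frac{2}{N}\sum_{t=1}^{N} y_t \sin\!\frac{2\pi k t}{N}.
\]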
How Many Predictors Are Needed?

Very often, one predictor is a function of the other predictors.

It then becomes an important question: how many predictors do we need in order to make a good regression (or prediction)? Does increasing the number of predictors improve the regression (or prediction)?

If too many predictors are used, some large coefficients may be assigned to variables that are not really highly correlated with the predictand (y). These coefficients are generated merely to help the regression relation fit y.

To answer this question, we have to figure out how fast (or slow) the “fraction of explained variance” increases as more predictors are added.
Explained Variance for Multiple Regression

As an example, we discuss the case of two predictors for the multiple regression.

We can repeat the derivation performed for the simple linear regression to find the fraction of variance explained by the two-predictor regression (R²), where the r’s below are the pairwise correlation coefficients.

We can show that if r2y is smaller than or equal to a “minimum useful correlation” value, it is not useful to include the second predictor in the regression:

minimum useful correlation = r1y * r12

This is the minimum correlation of x2 with y that is required to improve R², given that x2 is correlated with x1.
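The R² formula itself was an equation image on the slide; the standard two-predictor result it refers to is

\[
R^{2} \;=\; \frac{r_{1y}^{2} + r_{2y}^{2} - 2\,r_{1y}\,r_{2y}\,r_{12}}{1 - r_{12}^{2}} ,
\]

which reduces to r1y² when x2 adds no information (r2y = r12 = 0).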
An Example

For a 2-predictor case: r1y = r2y = r12 = 0.50

If we include only one predictor (x1), i.e. set r2y = r12 = 0 → R² = 0.25
By adding x2 to the regression (r2y = r12 = 0.50) → R² = 0.33
In this case, the 2nd predictor improves the regression.

For a 2-predictor case: r1y = r12 = 0.50 but r2y = 0.25

With only x1 → R² = 0.25
Adding x2 → R² = 0.25 (still the same!!)
In this case, the 2nd predictor is not useful. It is because
r2y ≤ r1y * r12 = 0.50 * 0.50 = 0.25
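Plugging the two cases into the two-predictor R² formula sketched above confirms these numbers:

\[
R^{2} = \frac{0.5^{2} + 0.5^{2} - 2(0.5)(0.5)(0.5)}{1 - 0.5^{2}} = \frac{0.25}{0.75} \approx 0.33 ,
\qquad
R^{2} = \frac{0.5^{2} + 0.25^{2} - 2(0.5)(0.25)(0.5)}{1 - 0.5^{2}} = \frac{0.1875}{0.75} = 0.25 .
\]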
Independent Predictors

Based on the previous analysis, we wish to use predictors that are independent of each other:
→ r12 = 0
→ minimum useful correlation = 0.

The worst predictors have r12 = 1.0.

The desire for independent predictors is part of the motivation for Empirical Orthogonal Function (EOF) analysis. EOF analysis attempts to find a relatively small number of independent quantities which convey as much of the original information as possible without redundancy.