Part 2: Analysis of Relationship Between Two Variables
Linear Regression
Linear correlation
Significance Tests
Multiple regression

Linear Regression
Y = a + bX

Predictor and Predictand
In meteorology, we often want to use a variable x to predict another variable y. In this case, the independent variable x is called the "predictor", and the dependent variable y is called the "predictand".

Linear Regression
We have N paired data points (xi, yi) whose relationship we want to approximate with a linear regression: y ≈ a + b·x.
The errors produced by this linear approximation can be estimated as the sum of squared residuals: Q = Σ (yi − a − b·xi)^2.
The least-squares linear fit chooses coefficients a and b to produce a minimum value of the error Q.

Least Square Fit
Coefficients a and b are chosen such that the error Q is a minimum:
     ∂Q/∂a = 0 and ∂Q/∂b = 0
This leads to the normal equations:
     Σ yi = N·a + b·Σ xi and Σ xi·yi = a·Σ xi + b·Σ xi^2
Solving the above equations, we get the linear regression coefficients:
     b = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)^2 and a = ȳ − b·x̄,
                                                                    where x̄ and ȳ are the sample means of x and y.
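As a quick illustration, here is a minimal NumPy sketch of this least-squares fit; the data values are made up for the example.

```python
# Minimal sketch of the least-squares fit y = a + b*x (illustrative data only).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

x_mean, y_mean = x.mean(), y.mean()

# Slope and intercept from the formulas above
b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
a = y_mean - b * x_mean

# Q is the sum of squared errors that the fit minimizes
Q = np.sum((y - (a + b * x)) ** 2)
print(a, b, Q)
```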

Major Assumptions of Linear Regression

Significance of the Regression Coefficients
The regression coefficients a and b are statistics derived from a sample, not parameters of the population.
The regression coefficients therefore vary from one sample to another. We cannot use standard normal statistical theory to predict their variations.
This is because that theory derives its results by assuming the successive pairs of observations (xi, yi) are independent, which is generally not true for geoscience variables.

How Good Is the Fit?
The quality of the linear regression can be analyzed using the “Analysis of Variance”.
The analysis separates the total variance of y (Sy2) into the part that can be accounted for by the linear regression (b2Sx2) and the part that cannot be accounted for by the regression (Se2):
            Sy2 = b2Sx2 + Se2
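A short numerical check of this decomposition with NumPy (made-up data; np.polyfit does the least-squares fit):

```python
# Check the decomposition Sy2 = b^2*Sx2 + Se2 for a least-squares line.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
b, a = np.polyfit(x, y, 1)              # slope b, intercept a

Sx2 = np.var(x)                         # variance of x
Sy2 = np.var(y)                         # total variance of y
Se2 = np.var(y - (a + b * x))           # residual (unexplained) variance
print(Sy2, b**2 * Sx2 + Se2)            # equal up to round-off
```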

Analysis of Variance
We then use the F statistic to test the ratio of the variance explained by the regression to the variance not explained by the regression:
                       F = (b2Sx2/1) / (Se2/(N-2))
Select an X% confidence level.
H0: b = 0
            (i.e., the variation in y is not explained by the linear regression but
            rather by chance or fluctuations)
     H1: b ≠ 0
Reject the null hypothesis at the α significance level if F > Fα(1, N−2).
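A hedged sketch of this F test in Python, using scipy.stats.f for the critical value; the data and the 95% level are illustrative choices:

```python
# ANOVA F test of H0: b = 0 for a simple linear regression (toy data).
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
N = len(y)

b, a = np.polyfit(x, y, 1)
SSR = np.sum((a + b * x - y.mean()) ** 2)    # explained sum of squares (1 dof)
SSE = np.sum((y - (a + b * x)) ** 2)         # residual sum of squares (N-2 dof)

F = (SSR / 1.0) / (SSE / (N - 2))            # same ratio as (b2Sx2/1)/(Se2/(N-2))
F_crit = stats.f.ppf(0.95, 1, N - 2)         # F_alpha(1, N-2) at the 95% level
print(F, F_crit, F > F_crit)                 # reject H0 if F > F_crit
```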

Scattering
One way to estimate the "badness of fit" is to calculate the scatter:
     scatter = (Se2)^0.5
The relation of the scatter to the regression line in the analysis of two variables is like the relation of the standard deviation to the mean in the analysis of one variable.
If lines are drawn parallel to the regression line at distances equal to ± (Se2)^0.5 above and below it, measured in the y direction, about 68% of the observations should fall between the two lines.
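A small sketch of the scatter and the roughly 68% band; the synthetic data and noise level are arbitrary, and the exact fraction varies from sample to sample:

```python
# Scatter = square root of the residual variance; about 68% of points should
# fall within one scatter of the regression line if the errors are ~normal.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, x.size)   # synthetic data

b, a = np.polyfit(x, y, 1)
residuals = y - (a + b * x)
scatter = np.sqrt(np.mean(residuals ** 2))         # (Se2)^0.5

inside = np.abs(residuals) <= scatter
print(scatter, inside.mean())                      # fraction near 0.68
```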

Correlation and Regression
Linear Regression: Y = a + bX
    A dimensional measure of the linear relationship between X and Y.
→ How much does Y change per one unit of X?
Linear Correlation
    A non-dimensional measure of the linear relationship between X and Y.
→ By how many standard deviations does Y change per one standard deviation of X?

Linear Correlation
The linear regression coefficient (b) depends on the units of measurement.
If we want a non-dimensional measurement of the association between two variables, we use the linear correlation coefficient (r):
     r = Σ (xi − x̄)(yi − ȳ) / [ Σ (xi − x̄)^2 · Σ (yi − ȳ)^2 ]^0.5
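A short sketch computing r both from the defining formula and with np.corrcoef (made-up data):

```python
# The linear correlation coefficient, two equivalent ways.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

r_direct = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
    np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))
r_numpy = np.corrcoef(x, y)[0, 1]
print(r_direct, r_numpy)     # the two values agree
```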

Correlation and Regression
Recall that for the linear regression we showed: Sy2 = b2Sx2 + Se2.
We also know: b = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)^2 and r = Σ (xi − x̄)(yi − ȳ) / [ Σ (xi − x̄)^2 · Σ (yi − ȳ)^2 ]^0.5.
It turns out that b = r·(Sy/Sx), so that r2 = b2Sx2/Sy2: r2 is the fraction of the variance of y explained by the linear regression.
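A quick numerical check of these identities on the same kind of toy data:

```python
# Verify b = r*(Sy/Sx) and r^2 = b^2*Sx2/Sy2 (explained-variance fraction).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

b, a = np.polyfit(x, y, 1)
r = np.corrcoef(x, y)[0, 1]
Sx, Sy = x.std(), y.std()

print(b, r * Sy / Sx)                # the regression slope from r
print(r**2, b**2 * Sx**2 / Sy**2)    # fraction of variance of y explained
```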

An Example
Suppose that the correlation coefficient between sunspots and five-year mean global temperature is 0.5 ( r = 0.5 ).
The fraction of the variance of 5-year mean global temperature that is “explained” by sunspots is r2 = 0.25.
The fraction of unexplained variance is 0.75.

Significance Test of Correlation Coefficient
When the true correlation coefficient is zero (H0: ρ = 0 and H1: ρ ≠ 0):
      Use the Student-t statistic to test the significance of r:
            t = r·(N − 2)^0.5 / (1 − r2)^0.5, with ν = N − 2 degrees of freedom.
When the true correlation coefficient is not expected to be zero:
      We cannot use a symmetric normal distribution for the test.
      We must use Fisher's Z transformation to convert the distribution of r
            to a normal distribution:
            Z = 0.5·ln[(1 + r)/(1 − r)], which is approximately normal with standard error 1/(N − 3)^0.5.
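A hedged sketch of both tests in Python; N = 21 and r = 0.8 are illustrative values (the same ones used in the example below):

```python
# (a) Student-t test of H0: rho = 0; (b) Fisher's Z transformation of r.
import numpy as np
from scipy import stats

N, r = 21, 0.8

# (a) t statistic with N - 2 degrees of freedom
t = r * np.sqrt(N - 2) / np.sqrt(1 - r**2)
p_value = 2 * (1 - stats.t.cdf(abs(t), df=N - 2))

# (b) Fisher's Z: approximately normal with standard error 1/sqrt(N - 3)
Z = 0.5 * np.log((1 + r) / (1 - r))      # equivalently np.arctanh(r)
se_Z = 1 / np.sqrt(N - 3)
print(t, p_value, Z, se_Z)
```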

An Example
Suppose N = 21 and r = 0.8.  Find the 95% confidence limits on r.
Answer:
(1) Use Fisher's Z transformation: Z = 0.5·ln(1.8/0.2) ≈ 1.10.
(2) Find the 95% limits on Z: Z ± 1.96/(N − 3)^0.5 = 1.10 ± 0.46, i.e., 0.64 < Z < 1.56.
(3) Convert Z back to r: r = tanh(Z), giving r ≈ 0.56 and r ≈ 0.92.
(4) The 95% confidence limits are: 0.56 < r < 0.92.
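The same steps in a few lines of Python; np.arctanh and np.tanh implement Fisher's Z transformation and its inverse:

```python
# 95% confidence limits on r via Fisher's Z (N = 21, r = 0.8).
import numpy as np

N, r = 21, 0.8
Z = np.arctanh(r)                            # (1) Fisher's Z transformation
se = 1 / np.sqrt(N - 3)                      # standard error of Z
Z_lo, Z_hi = Z - 1.96 * se, Z + 1.96 * se    # (2) 95% limits on Z
r_lo, r_hi = np.tanh(Z_lo), np.tanh(Z_hi)    # (3) convert Z back to r
print(round(r_lo, 2), round(r_hi, 2))        # (4) about 0.56 < r < 0.92
```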

Test of the Difference Between Two Non-Zero Coefficients
We first convert r1 and r2 to Fisher's Z statistics:
      Z1 = 0.5·ln[(1 + r1)/(1 − r1)] and Z2 = 0.5·ln[(1 + r2)/(1 − r2)]
We then assume a normal distribution for Z1 − Z2 and use the
      z-statistic (not Fisher's Z):
      z = (Z1 − Z2) / [ 1/(N1 − 3) + 1/(N2 − 3) ]^0.5
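A sketch of this test; the two sample sizes and correlations below are hypothetical values chosen only for illustration:

```python
# z test for the difference between two sample correlation coefficients.
import numpy as np
from scipy import stats

r1, N1 = 0.8, 21     # hypothetical sample 1
r2, N2 = 0.6, 25     # hypothetical sample 2

Z1, Z2 = np.arctanh(r1), np.arctanh(r2)          # Fisher's Z for each sample
se_diff = np.sqrt(1 / (N1 - 3) + 1 / (N2 - 3))   # standard error of Z1 - Z2
z = (Z1 - Z2) / se_diff                          # ordinary z statistic
p_value = 2 * (1 - stats.norm.cdf(abs(z)))
print(z, p_value)
```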

Multiple Regression
If we want to regress y on more than one variable (x1, x2, x3, …, xn):
      y = a0 + a1·x1 + a2·x2 + … + an·xn
After performing the least-squares fit and removing the means from all variables:
Solve the following matrix equation (the normal equations; in matrix form (X'X)·a = X'y, where the columns of X are the mean-removed predictors) to obtain the regression coefficients a1, a2, a3, a4, …, an.
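A minimal NumPy sketch of this procedure; the three predictors and their "true" coefficients are made up so the recovered values can be checked:

```python
# Multiple regression by solving the normal equations on mean-removed data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))      # columns are predictors x1, x2, x3
y = 1.5 * X[:, 0] - 0.7 * X[:, 1] + 0.2 * X[:, 2] + 0.3 * rng.standard_normal(100)

Xc = X - X.mean(axis=0)                # remove the means from the predictors
yc = y - y.mean()                      # and from y

a = np.linalg.solve(Xc.T @ Xc, Xc.T @ yc)   # (X'X) a = X'y
print(a)                                    # close to [1.5, -0.7, 0.2]
```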

Fourier Transform
Fourier transform is an example of multiple regression. In this case, the independent (predictor) variables are the harmonics cos(2πkt/T) and sin(2πkt/T), k = 1, 2, 3, ….
These independent variables are orthogonal to each other. That means the sum (over the record) of the product of any two different predictors is zero.
     Therefore, all the off-diagonal terms are zero in the normal-equation matrix of the regression.
We can easily get the regression (i.e., Fourier) coefficients, e.g.:
     ak = Σ y(t)·cos(2πkt/T) / Σ cos^2(2πkt/T) and bk = Σ y(t)·sin(2πkt/T) / Σ sin^2(2πkt/T)
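A small sketch showing the orthogonality of the sine/cosine predictors and that the regression coefficient for one harmonic reduces to the Fourier coefficient (synthetic signal; harmonic k = 3 chosen arbitrarily):

```python
# Fourier analysis as regression on orthogonal sine/cosine predictors.
import numpy as np

N = 64
t = np.arange(N)
y = 2.0 * np.cos(2 * np.pi * 3 * t / N) + 1.0 * np.sin(2 * np.pi * 5 * t / N)

k = 3
c = np.cos(2 * np.pi * k * t / N)       # predictor 1
s = np.sin(2 * np.pi * k * t / N)       # predictor 2

print(np.sum(c * s))                    # ~0: the predictors are orthogonal
a_k = np.sum(y * c) / np.sum(c * c)     # regression coefficient = Fourier coefficient
b_k = np.sum(y * s) / np.sum(s * s)
print(a_k, b_k)                         # ~2.0 and ~0.0 for k = 3
```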

How Many Predictors Are Needed?
Very often, one predictor is partly a function of the other predictors.
It then becomes an important question: how many predictors do we need in order to make a good regression (or prediction)?
Does increasing the number of predictors improve the regression (or prediction)?
If too many predictors are used, some large coefficients may be assigned to variables that are not really highly correlated with the predictand (y). These coefficients are generated merely to help the regression relation fit y in the sample.
To answer this question, we have to figure out how fast (or slowly) the "fraction of explained variance" increases as predictors are added.

Explained Variance for Multiple Regression
As an example, we discuss the case of two predictors (x1 and x2) for the multiple regression.
We can repeat the derivation we performed for the simple linear regression to find that the fraction of variance explained by the 2-predictor regression (R2) is:
                                                    R2 = (r1y^2 + r2y^2 − 2·r1y·r2y·r12) / (1 − r12^2),
                                                    where the r's are the pairwise correlation coefficients (e.g., r1y between x1 and y, r12 between x1 and x2).
We can show that if r2y is smaller than or equal to a "minimum useful correlation" value, it is not useful to include the second predictor in the regression.
The minimum useful correlation = r1y·r12.
This is the minimum correlation of x2 with y that is required to improve R2, given that x2 is correlated with x1 (see the sketch below).
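A short sketch of this formula as a Python function; the two calls reproduce the cases worked out in the example below:

```python
# Explained variance R^2 of a two-predictor regression from pairwise correlations.
def R2_two_predictors(r1y, r2y, r12):
    return (r1y**2 + r2y**2 - 2 * r1y * r2y * r12) / (1 - r12**2)

# Case 1: r1y = r2y = r12 = 0.50 -> about 0.33 (better than 0.25 with x1 alone)
print(R2_two_predictors(0.5, 0.5, 0.5))

# Case 2: r2y = 0.25 equals the minimum useful correlation r1y*r12 -> still 0.25
print(R2_two_predictors(0.5, 0.25, 0.5))
```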

An Example
For a 2-predictor case with r1y = r2y = r12 = 0.50:
     If only one predictor (x1) is included (i.e., setting r2y = r12 = 0) → R2 = 0.25
     Adding x2 to the regression (r2y = 0.50 > minimum useful correlation = 0.25) → R2 = 0.33
     In this case, the 2nd predictor improves the regression.
For a 2-predictor case with r1y = r12 = 0.50 but r2y = 0.25:
     With only x1 → R2 = 0.25
     Adding x2    → R2 = 0.25 (still the same!)
     In this case, the 2nd predictor is not useful. It is because
     r2y ≤ r1y·r12 = 0.50 × 0.50 = 0.25

Independent Predictors
Based on the previous analysis, we wish to use predictors that are independent of each other:
     → r12 = 0
     → minimum useful correlation = 0.
The worst case is r12 = 1.0, i.e., the two predictors are completely redundant.
The desire for independent predictors is part of the motivation for Empirical Orthogonal Function (EOF) analysis.
EOF analysis attempts to find a relatively small number of independent quantities which convey as much of the original information as possible without redundancy.