Using IDL: Mathematics
Given two n-element sample populations, X and Y, it is possible to quantify the degree of fit to a linear model using the correlation coefficient. The correlation coefficient, r, is a scalar quantity in the interval [-1.0, 1.0], and is defined as the ratio of the covariance of the sample populations to the product of their standard deviations.

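$$r = \frac{\text{covariance of } X \text{ and } Y}{(\text{standard deviation of } X)\,(\text{standard deviation of } Y)}$$

or

$$r = \frac{\frac{1}{n-1}\sum_{i=0}^{n-1}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\frac{1}{n-1}\sum_{i=0}^{n-1}(x_i-\bar{x})^2}\;\sqrt{\frac{1}{n-1}\sum_{i=0}^{n-1}(y_i-\bar{y})^2}}$$

where $\bar{x}$ and $\bar{y}$ are the sample means of X and Y.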

The correlation coefficient is a direct measure of how well two sample populations vary jointly. A value of r = +1 or r = –1 indicates a perfect fit to a positive or negative linear model, respectively. A value of r close to +1 or –1 indicates a high degree of correlation and a good fit to a linear model. A value of r close to 0 indicates a poor fit to a linear model.
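The definition can be checked by hand against CORRELATE. The following is a minimal sketch, not part of the original example set: the five-element samples are made-up values, and the intermediate variable names (dx, dy, cov_xy, sx, sy) are chosen only for illustration. It assumes the usual n-1 normalization for the sample covariance and standard deviations; because the same factor appears in the numerator and the denominator, it cancels in the ratio.

```idl
; Hypothetical samples (illustrative values only).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = N_ELEMENTS(X)

; Deviations of each sample from its mean.
dx = X - MEAN(X)
dy = Y - MEAN(Y)

; Sample covariance and sample standard deviations (n-1 normalization).
cov_xy = TOTAL(dx * dy) / (n - 1)
sx = SQRT(TOTAL(dx^2) / (n - 1))
sy = SQRT(TOTAL(dy^2) / (n - 1))

; Ratio of the covariance to the product of the standard deviations.
r = cov_xy / (sx * sy)

; The hand-computed value and the library value should agree to within
; floating-point rounding.
PRINT, r, CORRELATE(X, Y)
```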
The following sample populations represent a perfect positive linear correlation.
X = [-8.1, 1.0, -14.3, 4.2, -10.1, 4.3, 6.3, 5.0, 15.1, -2.2]
Y = [-9.8, -0.7, -16.0, 2.5, -11.8, 2.6, 4.6, 3.3, 13.4, -3.9]
; Compute the correlation coefficient of X and Y.
PRINT, CORRELATE(X, Y)
IDL prints:
1.00000
The following sample populations represent a high negative linear correlation.
X = [ 1.8, -2.7, 0.7, -0.5, -1.3, -0.9, 0.6, -1.5, 2.5, 3.0]
Y = [-4.7, 9.8, -3.7, 2.8, 5.1, 3.9, -3.6, 5.8, -7.3, -7.4]
; Compute the correlation coefficient of X and Y.
PRINT, CORRELATE(X, Y)
IDL prints:
-0.979907
The following sample populations represent a poor linear correlation.
X = [-1.8, 0.1, -0.1, 1.9, 0.5, 1.1, 1.9, 0.3, -0.2, -1.0]
Y = [ 1.5, -1.0, -0.6, 1.1, 0.7, -0.7, 1.1, -0.1, 0.6, -0.1]
; Compute the correlation coefficient of X and Y.
PRINT, CORRELATE(X, Y)
IDL prints:
0.0322859
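A value of r near 0 rules out only a linear fit. As a small illustration with made-up data (not from the examples above), the sketch below builds a Y that is an exact function of X, yet the relationship is quadratic and symmetric about zero, so the positive and negative contributions to the covariance cancel.

```idl
; Hypothetical data: Y is completely determined by X, but not linearly.
X = FINDGEN(11) - 5.0      ; -5, -4, ..., 4, 5
Y = X^2                    ; symmetric parabola

; The linear correlation coefficient comes out essentially zero even
; though Y depends exactly on X.
PRINT, CORRELATE(X, Y)
```

Plotting the data, or fitting a nonlinear model, is the usual way to catch this kind of relationship.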
When interpreting the value of the correlation coefficient, it is important to remember the following two caveats:
The fundamental principles of correlation that apply to the linear model of two sample populations may be extended to the multiple-linear model. The degree of relationship between three or more sample populations may be quantified using the multiple correlation coefficient. The degree of relationship between two sample populations when the effects of all other sample populations are removed may be quantified using the partial correlation coefficient.
The multiple correlation coefficient is a scalar quantity in the interval [0.0, 1.0]. The partial correlation coefficient, like r, is signed and lies in the interval [-1.0, 1.0], so it is its magnitude that indicates the strength of the relationship. A magnitude of 1 indicates a perfect linear relationship between populations. A magnitude close to 1 indicates a high degree of linear relationship between populations, whereas a value close to 0 indicates a poor linear relationship between populations. (Although a value of 0 indicates no linear relationship between populations, remember that there may be a nonlinear relationship.)
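For reference, the multiple correlation coefficient is commonly defined as the ordinary correlation between the dependent data and its least-squares fit on all of the independent variables. The block below states that textbook definition; it is offered as an assumption about what M_CORRELATE computes, not as a description of its implementation. The symbols $\hat{y}_i$ (fitted values) and $R$ are introduced here for illustration.

```latex
% \hat{y}_i : least-squares fit of y_i on all independent variables,
%             including a constant term
% \bar{y}   : sample mean of Y
R = \operatorname{corr}(Y, \hat{Y}),
\qquad
R^2 = 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^2}{\sum_{i}(y_i - \bar{y})^2}
```

Because adding an independent variable can never make the least-squares residual larger, $R$ defined this way cannot decrease as more variables are included in the fit.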
Define the independent (X) and dependent (Y) data.
X = [[0.477121, 2.0, 13.0], $
     [0.477121, 5.0, 6.0], $
     [0.301030, 5.0, 9.0], $
     [0.000000, 7.0, 5.5], $
     [0.602060, 3.0, 7.0], $
     [0.698970, 2.0, 9.5], $
     [0.301030, 2.0, 17.0], $
     [0.477121, 5.0, 12.5], $
     [0.698970, 2.0, 13.5], $
     [0.000000, 3.0, 12.5], $
     [0.602060, 4.0, 13.0], $
     [0.301030, 6.0, 7.5], $
     [0.301030, 2.0, 7.5], $
     [0.698970, 3.0, 12.0], $
     [0.000000, 4.0, 14.0], $
     [0.698970, 6.0, 11.5], $
     [0.301030, 2.0, 15.0], $
     [0.602060, 6.0, 8.5], $
     [0.477121, 7.0, 14.5], $
     [0.000000, 5.0, 9.5]]
Y = [97.682, 98.424, 101.435, 102.266, 97.067, 97.397, $
     99.481, 99.613, 96.901, 100.152, 98.797, 100.796, $
     98.750, 97.991, 100.007, 98.615, 100.225, 98.388, $
     98.937, 100.617]
Compute the multiple correlation of Y on the first column of X. The result should be 0.798816.
PRINT, M_CORRELATE(X[0,*], Y)
IDL prints:
0.798816
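With a single independent variable, the multiple correlation coefficient reduces to the absolute value of the ordinary correlation coefficient. The sketch below illustrates this with a small made-up data set (the values and the name x0 are hypothetical); it uses the same column-slicing idiom as the call above and assumes M_CORRELATE accepts a one-column array in the same way.

```idl
; Hypothetical data: 2 columns (independent variables) by 6 rows (samples).
X = [[1.0, 5.0], $
     [2.0, 3.0], $
     [3.0, 8.0], $
     [4.0, 1.0], $
     [5.0, 6.0], $
     [6.0, 2.0]]
Y = [2.3, 3.1, 6.0, 4.2, 7.9, 6.5]

; Extract the first column as a plain 6-element vector.
x0 = REFORM(X[0,*])

; Multiple correlation on one column versus |r| for the same data.
; The two printed values should agree to within rounding.
PRINT, M_CORRELATE(X[0,*], Y), ABS(CORRELATE(x0, Y))
```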
Compute the multiple correlation of Y on the first two columns of X. The result should be 0.875872.
PRINT, M_CORRELATE(X[0:1,*], Y)
IDL prints:
0.875872
Compute the multiple correlation of Y on all columns of X. The result should be 0.877197.
PRINT, M_CORRELATE(X, Y)
IDL prints:
0.877197
; Define the five sample populations.
X0 = [30, 26, 28, 33, 35, 29]
X1 = [0.29, 0.33, 0.34, 0.30, 0.30, 0.35]
X2 = [65, 60, 65, 70, 70, 60]
X3 = [2700, 2850, 2800, 3100, 2750, 3050]
Y = [37, 33, 32, 37, 36, 33]
Compute the partial correlation of X1 and Y with the effects of X0, X2 and X3 removed.
PRINT, P_CORRELATE(X1, Y, REFORM([X0,X2,X3], 3, N_ELEMENTS(X1)))
IDL prints:
0.996017
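When only one variable is being controlled for, the partial correlation can be written in closed form in terms of the three pairwise correlation coefficients: (rxy - rxz*ryz) / SQRT((1 - rxz^2)(1 - ryz^2)). The sketch below applies that standard first-order formula using only CORRELATE. The data and variable names are made up for illustration, and the example does not claim to reproduce how P_CORRELATE works internally.

```idl
; Hypothetical samples: X and Y both depend strongly on a third variable Z.
Z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
X =  2.0*Z + [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3]
Y = -1.0*Z + [0.1, 0.4, -0.3, 0.2, -0.2, 0.1, 0.3, -0.1]

; Pairwise correlation coefficients.
rxy = CORRELATE(X, Y)
rxz = CORRELATE(X, Z)
ryz = CORRELATE(Y, Z)

; First-order partial correlation of X and Y with the effect of Z removed.
rxy_z = (rxy - rxz*ryz) / SQRT((1.0 - rxz^2) * (1.0 - ryz^2))

; rxy is strongly negative because both samples track Z; rxy_z shows
; what relationship remains once Z is accounted for.
PRINT, rxy, rxy_z
```

Comparing the two printed values shows how much of an apparent correlation is attributable to a shared dependence on the controlled variable.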
See Correlation Analysis (in the functional category Mathematics) for a brief description of IDL routines for computing correlations. Detailed information is available in the IDL Reference Guide.
IDL Online Help (March 06, 2007)