# Variance and Covariance

### Variance, correlation, and covariance review

#### Variance

\[Variance = \frac{\sum (x-\bar{x})^2}{n-1} = \sigma^2\]

`library(UsingR); x <- father.son$fheight; y <- father.son$sheight; v <- sum((x-mean(x))^2/(length(x)-1)); v`

`## [1] 7.534`

`v == sd(x)^2 & v == var(x) # v matches both the alternative definition sd(x)^2 and R's var()`

`## [1] TRUE`
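To make the formula concrete, here is a minimal sketch on a three-value toy vector (hypothetical data, not taken from `father.son`) that is small enough to check by hand. The names `toy`, `v_sample`, and `v_pop` are illustrative only. Note that `var()` uses the \(n-1\) denominator shown above.

```r
# Toy data (hypothetical, chosen so the arithmetic is easy to follow)
toy <- c(2, 4, 6)

dev <- toy - mean(toy)               # deviations from the mean: -2, 0, 2
ss  <- sum(dev^2)                    # sum of squared deviations: 8

v_sample <- ss / (length(toy) - 1)   # n - 1 denominator, as in the formula above
v_pop    <- ss / length(toy)         # n denominator, shown only for comparison

v_sample                             # 4
all.equal(v_sample, var(toy))        # TRUE: var() divides by n - 1
v_pop                                # 8 / 3
```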

#### Covariance

\[Covariance = \frac{\sum (x-\bar{x})(y-\bar{y})}{n-1}\]

`cv <- sum((x-mean(x))*(y-mean(y)))/(length(x)-1); cv`

`## [1] 3.873333`

`cv == cov(x,y) #equivalency of r code cov()`

`## [1] TRUE`
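The same calculation can be followed by hand on a small example. The sketch below uses two hypothetical four-value vectors (`toy_x` and `toy_y` are illustrative names, not part of the original data) and also shows that reversing the direction of one variable flips the sign of the covariance.

```r
# Hypothetical toy data
toy_x <- c(1, 2, 3, 4)
toy_y <- c(2, 4, 5, 9)

# Products of paired deviations: 4.5, 0.5, 0, 6
prod_dev <- (toy_x - mean(toy_x)) * (toy_y - mean(toy_y))

cv_toy <- sum(prod_dev) / (length(toy_x) - 1)   # 11 / 3
cv_toy

all.equal(cv_toy, cov(toy_x, toy_y))            # TRUE
all.equal(cov(toy_x, -toy_y), -cv_toy)          # TRUE: the sign follows the direction
```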

#### Correlation

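\[Correlation = \frac{Cov(X,Y)}{\sigma_x\sigma_y} = \pm\sqrt{R^2}\]

The sign of the correlation is the sign of the covariance. For the father and son heights the covariance is positive, so the correlation equals \(\sqrt{R^2}\), where \(R^2\) comes from a simple linear regression of one variable on the other.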

`cr <- cov(x,y)/(sd(x)*sd(y)); cr`

`## [1] 0.5013383`

`round(cr,10) == round(sqrt(summary(lm(x~y))$r.squared),10) & round(cr,10) == round(cor(x,y),10) # cr matches both the alternative definition sqrt(R^2) and R's cor(); round() avoids floating-point differences between the calculations`

`## [1] TRUE`
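One caveat about the \(\sqrt{R^2}\) form: a square root is never negative, so it recovers only the magnitude of the correlation. A minimal sketch with hypothetical, negatively related toy data (`neg_x` and `neg_y` are illustrative names) shows the difference and how the sign can be restored from the covariance.

```r
# Hypothetical toy data in which y falls as x rises
neg_x <- c(1, 2, 3, 4, 5)
neg_y <- c(10, 8, 7, 3, 2)

r_toy  <- cor(neg_x, neg_y)
r2_toy <- summary(lm(neg_y ~ neg_x))$r.squared

r_toy < 0                                                  # TRUE
all.equal(abs(r_toy), sqrt(r2_toy))                        # TRUE: magnitudes agree
all.equal(r_toy, sign(cov(neg_x, neg_y)) * sqrt(r2_toy))   # TRUE: sign from the covariance
```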

# Calculating the covariance matrix

- The covariance matrix is calculated from a data matrix in which each column contains the values for a different variable and each row represents an individual
- The covariance matrix contains a row for each variable and a column for each variable
- The intersection of a row and a column gives the covariance between the row variable and the column variable
- The diagonal is the intersection of each variable with itself and therefore gives that variable's **variance**
- Matrix operations allow the covariance to be calculated for the whole matrix at once rather than for one pair of variables at a time:
- Obtain the vector of means by a matrix operation, where \(A\) is a column of \(N\) ones: \[M = \frac{1}{N}\left(\begin{array}{c}1\\1\\\vdots\\1\end{array}\right)^{T}\left(\begin{array}{c}Y_{1}\\Y_{2}\\\vdots\\Y_{N}\end{array}\right) = \frac{1}{N}A^{T}Y\]
  - \(Y_1\) is the row of values for one individual (data point); each entry is that individual's value for a particular variable (e.g. height, weight, etc.)
  - Similarly for each \(Y_i\), up to \(Y_N\)
  - \(M\) is therefore a single row containing the mean of each variable

- Calculate the residual matrix \(R\) by subtracting the mean of each variable from each instance of that variable; multiplying by the column of ones \(A\) repeats the row of means \(M\) once per individual so that the dimensions match \[R = \left(\begin{array}{c}Y_{1}\\Y_{2}\\\vdots\\Y_{N}\end{array}\right) - AM = Y - AM\]
- Calculate the covariance matrix as the crossproduct of \(R\) divided by \(N-1\) (see *Examples of Matrix Algebra* for the matrix calculation of variance) \[S = \frac{1}{N-1}R^TR\]
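
Written out entry by entry, the crossproduct shows why the diagonal holds variances and the off-diagonal cells hold covariances. This is a short restatement rather than a new result; \(y_{ij}\) (individual \(i\)'s value for variable \(j\)) and \(\bar{y}_j\) (the mean of variable \(j\)) are symbols introduced here for illustration, and \(N\) is the number of individuals.

\[S_{jk} = \frac{1}{N-1}\sum_{i=1}^{N}(y_{ij}-\bar{y}_j)(y_{ik}-\bar{y}_k), \qquad S_{jj} = \frac{1}{N-1}\sum_{i=1}^{N}(y_{ij}-\bar{y}_j)^2\]

The first expression is the covariance formula applied to variables \(j\) and \(k\); setting \(k = j\) reduces it to the variance formula.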

`z <- cbind(x,y); head(z) # create the data matrix and show its first rows`

`## x y ## [1,] 65.04851 59.77827 ## [2,] 63.25094 63.21404 ## [3,] 64.95532 63.34242 ## [4,] 65.75250 62.79238 ## [5,] 61.13723 64.28113 ## [6,] 63.02254 64.24221`

`a <- matrix(rep(1, length(x))); data.frame("dimension" = c("rows", "columns"), "Y" = dim(z), "A" = dim(a)) # column of ones, then the dimensions of Y (z) and A (a)`

`## dimension Y A ## 1 rows 1078 1078 ## 2 columns 2 1`

`m <- (t(a) %*% z)/length(x); m # calculate the row of variable means`

`## x y ## [1,] 67.6871 68.68407`

`n <- matrix(rep(m,length(x)), nrow = length(x), byrow = TRUE); r <- z - n; head(r) # n repeats the row of means once for every row of the data matrix; r is the residual matrix`

`## x y ## [1,] -2.638587 -8.90580 ## [2,] -4.436157 -5.47003 ## [3,] -2.731777 -5.34165 ## [4,] -1.934597 -5.89169 ## [5,] -6.549867 -4.40294 ## [6,] -4.664557 -4.44186`

`v <- crossprod(r)/(length(x)-1); v # calculate the covariance matrix`

`## x y ## x 7.534303 3.873333 ## y 3.873333 7.922545`

`all(round(v,10) == round(cov(cbind(x,y)),10)) # all() checks that every cell matches cov() applied to the single matrix of x and y; round() avoids floating-point differences between the calculations`

`## [1] TRUE`

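`all(c(v[1,1] == cov(x,x) & v[1,1] == var(x), v[2,2] == cov(y,y) & v[2,2] == var(y), v[1,2] == cov(x,y))) # diagonal cells are the variances; the off-diagonal cell is the covariance of the separate vectors x and y`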

`## [1] TRUE`
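The same steps carry over unchanged to more than two variables. As a minimal sketch, they can be wrapped in a small helper; `cov_matrix` is a hypothetical name, and the three columns from the built-in `mtcars` data set are an assumption chosen only to illustrate a wider matrix.

```r
# Hypothetical helper: covariance matrix via the matrix steps above
cov_matrix <- function(Y) {
  Y <- as.matrix(Y)
  N <- nrow(Y)
  A <- matrix(1, nrow = N, ncol = 1)   # column of N ones
  M <- (t(A) %*% Y) / N                # 1 x p row of variable means
  R <- Y - A %*% M                     # residuals: each column minus its own mean
  crossprod(R) / (N - 1)               # t(R) %*% R divided by N - 1
}

# Three numeric variables from a built-in data set (illustrative choice)
cars3 <- mtcars[, c("mpg", "hp", "wt")]
S <- cov_matrix(cars3)
S

all.equal(S, cov(cars3))                                   # TRUE
all.equal(unname(diag(S)), unname(sapply(cars3, var)))     # TRUE: diagonal holds the variances
```

Using `all.equal()` rather than `==` compares the results within a small numerical tolerance, which serves the same purpose as the `round()` calls above.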
