Standard Error of Linear Models
Use of standard error in linear models
- The central limit theorem applies to linear models
- Therefore \(\beta\) will follow a normal distribution when \(N\) is large, or a t-distribution when \(N\) is small and the data are normally distributed
- Obtaining the standard error of \(\beta\) allows confidence intervals and p-values to be calculated for linear models, as sketched below
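As a sketch of where this leads (assuming the UsingR package and the father.son data used later in these notes), the standard error of a coefficient combines with the t-distribution to give a confidence interval and p-value:
library(UsingR)
fit <- lm(sheight ~ fheight, data = father.son)
est <- coef(summary(fit))["fheight", "Estimate"]   #slope estimate
se <- coef(summary(fit))["fheight", "Std. Error"]  #its standard error
df <- fit$df.residual                              #N - p
est + c(-1, 1) * qt(0.975, df) * se                #95% confidence interval from the t-distribution
2 * pt(abs(est / se), df, lower.tail = FALSE)      #two-sided p-value for H0: slope = 0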
Variance-covariance matrix
\[\boldsymbol{\Sigma} = \sigma^2 \mathbf{I}\]
library(UsingR)
x <- father.son$fheight
y <- father.son$sheight
X <- model.matrix(~x) #model.matrix() creates a matrix out of vectors and includes a leading "ones" column to be multiplied by the B0 coefficient (intercept)
N <- nrow(X)
p <- ncol(X)
b <- solve(crossprod(X))%*%crossprod(X,y) #least squares estimates: (X'X)^-1 X'y
resid <- y - X%*%b #residuals
s.squared <- as.numeric(crossprod(resid)/(N-p)) #estimate of sigma^2 using N-p degrees of freedom
varcov <- s.squared*solve(crossprod(X)) #variance-covariance matrix of the estimates
varcov
##             (Intercept)          x
## (Intercept)     3.35752 -0.0495222
## x              -0.04952  0.0007316
all(round(varcov,10) == round(vcov(lm(y~x)),10))
## [1] TRUE
Variance of a linear combination
- To find the variance of the linear combination of \(Y_1\) and \(Y_2\)
- Use the identity: \[\mbox{var}(\mathbf{AY}) = \mathbf{A}\mbox{var}(\mathbf{Y}) \mathbf{A}^\top\]
- Not derived here
- From Examples of Matrix Algebra, we also know that the linear combination can be written as: \[Y_1 + Y_2 = \begin{pmatrix}1&1\end{pmatrix}\begin{pmatrix} Y_1\\Y_2\\ \end{pmatrix} \notag\]
- So \[\mbox{var}\{Y_1+Y_2\} = \mbox{var}\left\{ \begin{pmatrix}1&1\end{pmatrix}\begin{pmatrix} Y_1\\Y_2\\ \end{pmatrix}\right\} \notag\]
- Thus from eq.\((2)\): \[\mbox{var}\left\{ \begin{pmatrix}1&1\end{pmatrix}\begin{pmatrix} Y_1\\Y_2\\ \end{pmatrix}\right\} =\begin{pmatrix}1&1\end{pmatrix} \mbox{var}(\mathbf{Y})\begin{pmatrix} 1\\1\\ \end{pmatrix} \notag\]
- From eq.\((1)\): \[\begin{pmatrix}1&1\end{pmatrix} \mbox{var}(\mathbf{Y})\begin{pmatrix} 1\\1\\ \end{pmatrix} =\begin{pmatrix}1&1\end{pmatrix} \sigma^2 \mathbf{I}\begin{pmatrix} 1\\1\\ \end{pmatrix}=2\sigma^2 \notag\]
- Which finally simplifies to: \[\mbox{var}\{Y_1+Y_2\} =2\sigma^2\]
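A quick Monte Carlo check of this result (a sketch; \(\sigma = 3\) and the sample size are arbitrary choices, and \(Y_1\), \(Y_2\) are simulated as independent):
#Monte Carlo check that var(Y1 + Y2) = 2*sigma^2
set.seed(1)
sigma <- 3
Y1 <- rnorm(100000, 0, sigma)
Y2 <- rnorm(100000, 0, sigma)
var(Y1 + Y2)  #approximately 2*sigma^2 = 18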
Standard error of least squares regression
- Recall that we determine the coefficients of the linear model by \(\beta = \mathbf{(X^\top X)^{-1}X^\top Y}\)
- Therefore \[\mbox{var}(\beta) = \mbox{var}( \mathbf{(X^\top X)^{-1}X^\top Y} ) \notag\]
- Through some matrix algebra rearrangement: \[\mbox{var}(\beta) = \mbox{var}( \mathbf{(X^\top X)^{-1}X^\top Y} ) = \notag\] \[\mathbf{(X^\top X)^{-1} X^\top} \mbox{var}(Y) (\mathbf{(X^\top X)^{-1} X^\top})^\top = \notag\] \[\mathbf{(X^\top X)^{-1} X^\top} \sigma^2 \mathbf{I} (\mathbf{(X^\top X)^{-1} X^\top})^\top = \notag\] \[\sigma^2 \mathbf{(X^\top X)^{-1} X^\top}\mathbf{X} \mathbf{(X^\top X)^{-1}} = \notag\] \[\sigma^2\mathbf{(X^\top X)^{-1}} \notag\]
- Thus, since standard error is the square root of variance: \[\mbox{se}(\beta) = \sqrt{\sigma^2\mathbf{(X^\top X)^{-1}}}\]
- Here the square root is taken element-wise; the standard errors of the individual coefficients are the diagonal entries (a simulation check of the variance formula is sketched below)
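The variance formula can be checked by simulation: with a fixed design matrix and a known error standard deviation, the empirical variances of the estimated coefficients across repeated datasets should match the diagonal of \(\sigma^2\mathbf{(X^\top X)^{-1}}\). A minimal sketch (the design, sample size, and \(\sigma = 2\) are arbitrary choices; the .sim names simply avoid clobbering the father.son objects used elsewhere):
#Simulation check of var(beta) = sigma^2 (X'X)^-1 with a fixed design and known sigma
set.seed(1)
n <- 50
x.sim <- rnorm(n)          #arbitrary fixed covariate
X.sim <- cbind(1, x.sim)
sigma <- 2                 #known error standard deviation
betahat <- replicate(10000, {
  y.sim <- 1 + 2*x.sim + rnorm(n, sd = sigma)
  as.vector(solve(crossprod(X.sim)) %*% crossprod(X.sim, y.sim))
})
apply(betahat, 1, var)                   #empirical variances of the two coefficients
diag(sigma^2 * solve(crossprod(X.sim)))  #theoretical variances from the formula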
Estimating standard deviation
\[\mathbf{r}\equiv\hat{\boldsymbol{\varepsilon}} = \mathbf{Y}-\mathbf{X}\boldsymbol{\beta}\] \[s^2 \equiv \hat{\sigma}^2 = \frac{1}{N-p}\mathbf{r}^\top\mathbf{r} = \frac{1}{N-p}\sum_{i=1}^N r_i^2\]
- \(N\) is the sample size
- \(p\) is the number of columns in \(\mathbf{X}\) (i.e. the number of coefficients in \(\beta\))
library(UsingR)
x <- father.son$fheight
y <- father.son$sheight
X <- model.matrix(~x) #model.matrix() creates a matrix out of vectors and includes a leading "ones" column to be multiplied by the B0 coefficient (intercept)
N <- nrow(X)
p <- ncol(X)
b <- solve(crossprod(X))%*%crossprod(X,y)
resid <- y - X%*%b
s.squared <- as.numeric(crossprod(resid)/(N-p))
se <- sqrt(s.squared*solve(crossprod(X)))
diag(se)
## (Intercept)           x
##     1.83235     0.02705
summary(lm(y~x))$coefficients[,2]
## (Intercept)           x
##     1.83235     0.02705
Linear combination of estimates
The standard errors of a linear model's coefficients are the square roots of the diagonal of the variance-covariance matrix, i.e. the diagonal of the element-wise square root of the matrix
sqrt(vcov(lm(y~x)))
##             (Intercept)       x
## (Intercept)       1.832     NaN
## x                   NaN 0.02705
data.frame(Std.Error = summary(lm(y~x))$coefficients[,2])
##             Std.Error
## (Intercept)   1.83235
## x             0.02705
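The identity \(\mbox{var}(\mathbf{AY}) = \mathbf{A}\mbox{var}(\mathbf{Y}) \mathbf{A}^\top\) from above also gives the standard error of any linear combination of the estimates. A minimal sketch, using the fitted mean son's height at an arbitrarily chosen father's height of 68 inches:
#SE of a linear combination of the estimates, e.g. the fitted mean son's height
#at a father's height of 68 inches, using var(a'beta) = a' vcov(beta) a
fit <- lm(y ~ x)
a <- c(1, 68)                       #weight 1 on the intercept, 68 on the slope
sqrt(t(a) %*% vcov(fit) %*% a)      #SE of a'beta
predict(fit, newdata = data.frame(x = 68), se.fit = TRUE)$se.fit  #same value from predict()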
Alternative way of calculating SE
#Solve for the coefficients of the linear model
X <- cbind(1,x) #First column of matrix is 1's because it will be multiplied by B0, the intercept
beta <- solve(t(X)%*%X) %*% t(X) %*% y
beta
##      [,1]
##   33.8866
## x  0.5141
#Calculate the predicted values based on the model
expected.linear <- beta[1] + beta[2]*x #Using a linear equation
expected.matrix <- t(beta) %*% t(X) #Using matrix algebra
all(expected.linear == expected.matrix) #Equivalency between the two methods
## [1] TRUE
#Calculate the residuals
resid <- y - expected.matrix
#Calculate the variance within the model (ANOVA strategy)
sum.sq.total <- sum((y-mean(y))^2) #Total sum of squares is the SSq of the response variable y
sum.sq.resid <- sum((resid-mean(resid))^2)
sum.sq.x <- sum.sq.total - sum.sq.resid
df.total <- length(x) - 1
df.x <- 1
df.resid <- df.total - df.x
mean.sq.x <- sum.sq.x / df.x
mean.sq.resid <- sum.sq.resid / df.resid
data.frame("variable" = c("x","Residual"),
           "Df" = c(df.x, df.resid),
           "SumSq" = c(sum.sq.x, sum.sq.resid),
           "MeanSq" = c(mean.sq.x, mean.sq.resid))
##   variable   Df SumSq   MeanSq
## 1        x    1  2145 2144.580
## 2 Residual 1076  6388    5.937
anova(lm(y~x)) # could also do summary(aov(y~x))
## Analysis of Variance Table
## 
## Response: y
##             Df Sum Sq Mean Sq F value Pr(>F)    
## x            1   2145    2145     361 <2e-16 ***
## Residuals 1076   6388       6                   
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Calculate the variance-covariance matrix
var.beta <- mean.sq.resid * solve(t(X) %*% X)
var.beta
##                      x
##    3.35752 -0.0495222
## x -0.04952  0.0007316
all(round(var.beta,10) == round(vcov(lm(y~x)),10)) #Equivalent to R's vcov()
## [1] TRUE
sqrt(var.beta) #As above, standard error is the diagonal of the variance-covariance square root
##                x
##   1.832      NaN
## x   NaN  0.02705
summary(lm(y~x))$coefficients[,2]
## (Intercept)           x
##     1.83235     0.02705