Use of standard error in linear models

  • The central limit theorem applies to linear models
  • Therefore \(\beta\) will follow a normal distribution at large \(N\), or a t-distribution at small \(N\) when the errors are normally distributed
  • Obtaining the standard error of \(\beta\) allows for calculation of confidence intervals and p-values for linear models (a short example follows this list)
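
As a minimal sketch of the point above (assuming the father.son data from the UsingR package, which these notes use below), a coefficient's standard error converts to a 95% confidence interval via the t-distribution; confint() returns the same intervals:

    library(UsingR)
    fit <- lm(sheight ~ fheight, data = father.son)
    est <- summary(fit)$coefficients[, "Estimate"]
    se <- summary(fit)$coefficients[, "Std. Error"]
    #95% confidence interval: estimate +/- t-quantile * standard error, with N - p degrees of freedom
    cbind(lower = est - qt(0.975, fit$df.residual) * se,
          upper = est + qt(0.975, fit$df.residual) * se)
    confint(fit) #same intervals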

Variance-covariance matrix

\[\boldsymbol{\Sigma} \equiv \mbox{var}(\mathbf{Y}) = \sigma^2 \mathbf{I}\]

  • Under the standard linear model assumptions the errors are independent with common variance \(\sigma^2\), so the variance-covariance matrix of \(\mathbf{Y}\) is \(\sigma^2\mathbf{I}\)
  • The code below computes the estimated variance-covariance matrix of the coefficient estimates, \(s^2(\mathbf{X}^\top \mathbf{X})^{-1}\) (derived under Standard error of least squares regression below), and confirms it matches R's vcov(); the diagonal entries are the variances of the estimates and the off-diagonal entries their covariance

    library(UsingR)
    x <- father.son$fheight
    y <- father.son$sheight
    X <- model.matrix(~x) #model.matrix() creates a matrix out of vectors and includes a leading "ones" column to be multiplied by the B0 coefficient (intercept)
    N <- nrow(X)
    p <- ncol(X)
    b <- solve(crossprod(X))%*%crossprod(X,y)
    resid <- y - X%*%b
    s.squared <- as.numeric(crossprod(resid)/(N-p))
    varcov <- s.squared*solve(crossprod(X))
    varcov
    ##             (Intercept)          x
    ## (Intercept)     3.35752 -0.0495222
    ## x              -0.04952  0.0007316
    all(round(varcov,10) == round(vcov(lm(y~x)),10))
    ## [1] TRUE

Variance of a linear combination

  • To find the variance of a linear combination of \(Y_1\) and \(Y_2\):
  • Use the identity: \[\mbox{var}(\mathbf{AY}) = \mathbf{A}\mbox{var}(\mathbf{Y}) \mathbf{A}^\top\]
    • Not derived here
  • From Examples of Matrix Algebra, we also know that the linear combination can be written as: \[Y_1 + Y_2 = \begin{pmatrix}1&1\end{pmatrix}\begin{pmatrix} Y_1\\Y_2\\ \end{pmatrix} \notag\]
  • So \[\mbox{var}\{Y_1+Y_2\} = \mbox{var}\left\{ \begin{pmatrix}1&1\end{pmatrix}\begin{pmatrix} Y_1\\Y_2\\ \end{pmatrix}\right\} \notag\]
  • Applying the identity above with \(\mathbf{A} = \begin{pmatrix}1&1\end{pmatrix}\): \[\mbox{var}\left\{ \begin{pmatrix}1&1\end{pmatrix}\begin{pmatrix} Y_1\\Y_2\\ \end{pmatrix}\right\} =\begin{pmatrix}1&1\end{pmatrix} \mbox{var}(\mathbf{Y})\begin{pmatrix} 1\\1\\ \end{pmatrix} \notag\]
  • Substituting \(\mbox{var}(\mathbf{Y}) = \sigma^2 \mathbf{I}\) from above: \[\begin{pmatrix}1&1\end{pmatrix} \mbox{var}(\mathbf{Y})\begin{pmatrix} 1\\1\\ \end{pmatrix} =\begin{pmatrix}1&1\end{pmatrix} \sigma^2 \mathbf{I}\begin{pmatrix} 1\\1\\ \end{pmatrix}=2\sigma^2 \notag\]
  • Which finally simplifies to: \[\mbox{var}\{Y_1+Y_2\} =2\sigma^2\] (a quick simulation below confirms this)
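
As a quick sanity check (a simulation sketch with arbitrarily chosen values, not part of the original notes), two independent variables each with variance \(\sigma^2\) give a sum whose variance is approximately \(2\sigma^2\):

    set.seed(1)
    sigma <- 2
    Y1 <- rnorm(1e6, sd = sigma) #independent draws, each with variance sigma^2
    Y2 <- rnorm(1e6, sd = sigma)
    var(Y1 + Y2) #empirical variance of the sum
    2*sigma^2 #theoretical value; the two should agree closely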

Standard error of least squares regression

  • Recall that we determine the coefficients of the linear model by \(\beta = \mathbf{(X^\top X)^{-1}X^\top Y}\)
  • Therefore \[\mbox{var}(\beta) = \mbox{var}( \mathbf{(X^\top X)^{-1}X^\top Y} ) \notag\]
  • Through some matrix algebra rearrangement: \[\mbox{var}(\beta) = \mbox{var}( \mathbf{(X^\top X)^{-1}X^\top Y} ) = \notag\] \[\mathbf{(X^\top X)^{-1} X^\top} \mbox{var}(Y) (\mathbf{(X^\top X)^{-1} X^\top})^\top = \notag\] \[\mathbf{(X^\top X)^{-1} X^\top} \sigma^2 \mathbf{I} (\mathbf{(X^\top X)^{-1} X^\top})^\top = \notag\] \[\sigma^2 \mathbf{(X^\top X)^{-1} X^\top}\mathbf{X} \mathbf{(X^\top X)^{-1}} = \notag\] \[\sigma^2\mathbf{(X^\top X)^{-1}} \notag\]
  • Thus, since standard error is the square root of variance, the standard error of each coefficient is the square root of the corresponding diagonal element: \[\mbox{se}(\beta_j) = \sqrt{\left(\sigma^2\mathbf{(X^\top X)^{-1}}\right)_{jj}}\] (a simulation check follows this list)
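
As a sketch (simulated data with arbitrarily chosen parameters, not part of the original notes), the result can be checked by Monte Carlo: generate many response vectors from a known model and compare the empirical variance-covariance of the estimates to \(\sigma^2\mathbf{(X^\top X)^{-1}}\):

    set.seed(1)
    n <- 50
    x.sim <- rnorm(n)
    X.sim <- cbind(1, x.sim) #design matrix with intercept column
    sigma <- 3
    beta.true <- c(2, 4) #hypothetical intercept and slope
    betas <- replicate(10000, {
      y.sim <- X.sim %*% beta.true + rnorm(n, sd = sigma)
      as.vector(solve(crossprod(X.sim)) %*% crossprod(X.sim, y.sim))
    })
    cov(t(betas)) #empirical variance-covariance of the estimates
    sigma^2 * solve(crossprod(X.sim)) #theoretical value; the two should agree closely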

Estimating standard deviation

  • Since the individual errors \(\boldsymbol{\varepsilon}\) are not observed, \(\sigma^2\) is estimated from the residuals \(\mathbf{r}\), which estimate \(\boldsymbol{\varepsilon}\):

\[\mathbf{r}\equiv\hat{\boldsymbol{\varepsilon}} = \mathbf{Y}-\mathbf{X}\boldsymbol{\beta}\] \[s^2 \equiv \hat{\sigma}^2 = \frac{1}{N-p}\mathbf{r}^\top\mathbf{r} = \frac{1}{N-p}\sum_{i=1}^N r_i^2\]

  • \(N\) is the sample size
  • \(p\) is the number of columns in \(\mathbf{X}\) (i.e. the number of coefficients in \(\beta\))

    library(UsingR)
    x <- father.son$fheight
    y <- father.son$sheight
    X <- model.matrix(~x) #model.matrix() creates a matrix out of vectors and includes a leading "ones" column to be multiplied by the B0 coefficient (intercept)
    N <- nrow(X)
    p <- ncol(X)
    b <- solve(crossprod(X))%*%crossprod(X,y)
    resid <- y - X%*%b
    s.squared <- as.numeric(crossprod(resid)/(N-p))
    se <- sqrt(diag(s.squared*solve(crossprod(X)))) #take the diagonal before the square root to avoid NaNs from negative covariances
    se
    ## (Intercept)           x
    ##     1.83235     0.02705
    summary(lm(y~x))$coefficients[,2]
    ## (Intercept)           x
    ##     1.83235     0.02705

Linear combination of estimates

  • The standard errors of a linear model’s coefficients are the square roots of the diagonal of the variance-covariance matrix; note below that the element-wise square root of the full matrix produces NaN off the diagonal because the covariances are negative

    sqrt(vcov(lm(y~x)))
    ##             (Intercept)       x
    ## (Intercept)       1.832     NaN
    ## x                   NaN 0.02705
    data.frame(Std.Error = summary(lm(y~x))$coefficients[,2])
    ##             Std.Error
    ## (Intercept)   1.83235
    ## x             0.02705
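
The same identity \(\mbox{var}(\mathbf{AY}) = \mathbf{A}\mbox{var}(\mathbf{Y})\mathbf{A}^\top\) applies to linear combinations of the estimates themselves; as an illustrative sketch (the value x = 70 is an arbitrary choice, not from the notes), the standard error of the fitted value \(\beta_0 + 70\beta_1\) uses \(\mathbf{A} = \begin{pmatrix}1&70\end{pmatrix}\):

    fit <- lm(y~x)
    A <- rbind(c(1, 70)) #picks out beta0 + 70*beta1
    sqrt(A %*% vcov(fit) %*% t(A)) #standard error of the linear combination
    predict(fit, newdata = data.frame(x = 70), se.fit = TRUE)$se.fit #same value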

Alternative way of calculating SE

    #Solve for the coefficients of the linear model
    X <- cbind(1,x) #First column of matrix is 1's because it will be multiplied by B0, the intercept
    beta <- solve(t(X)%*%X) %*% t(X) %*% y
    beta
    ##      [,1]
    ##   33.8866
    ## x  0.5141
    #Calculate the predicted values based on the model
    expected.linear <- beta[1] + beta[2]*x #Using a linear equation
    expected.matrix <- t(beta) %*% t(X) #Using matrix algebra
    all(expected.linear == expected.matrix) #Equivalency between the two methods
    ## [1] TRUE
    #Calculate the residuals
    resid <- y - expected.matrix
    #Calculate the variance within the model (ANOVA strategy)
    sum.sq.total <- sum((y-mean(y))^2) #Total sum of squares is the SSq of the response variable y
    sum.sq.resid <- sum((resid-mean(resid))^2)
    sum.sq.x <- sum.sq.total - sum.sq.resid
    df.total <- length(x) - 1
    df.x <- 1
    df.resid <- df.total - df.x
    mean.sq.x <- sum.sq.x / df.x
    mean.sq.resid <- sum.sq.resid / df.resid
    data.frame("variable" = c("x","Residual"),
               "Df" = c(df.x, df.resid),
               "SumSq" = c(sum.sq.x, sum.sq.resid),
               "MeanSq" = c(mean.sq.x, mean.sq.resid))
    ##   variable   Df SumSq   MeanSq
    ## 1        x    1  2145 2144.580
    ## 2 Residual 1076  6388    5.937
    anova(lm(y~x)) # could also do summary(aov(y~x))
    ## Analysis of Variance Table
    ##
    ## Response: y
    ##             Df Sum Sq Mean Sq F value Pr(>F)
    ## x            1   2145    2145     361 <2e-16 ***
    ## Residuals 1076   6388       6
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    #Calculate the variance-covariance matrix
    var.beta <- mean.sq.resid * solve(t(X) %*% X)
    var.beta
    ##                     x
    ##    3.35752 -0.0495222
    ## x -0.04952  0.0007316
    all(round(var.beta,10) == round(vcov(lm(y~x)),10)) #Equivalency of r code vcov()
    ## [1] TRUE
    sqrt(var.beta) #As above, standard error is the diagonal of the variance-covariance square root (off-diagonal NaNs come from negative covariances)
    ##               x
    ##   1.832     NaN
    ## x   NaN 0.02705
    summary(lm(y~x))$coefficients[,2]
    ## (Intercept)           x
    ##     1.83235     0.02705
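
Closing the loop on the first list’s claim that standard errors yield p-values, a sketch using the objects computed above (beta, var.beta, df.resid):

    t.stat <- beta / sqrt(diag(var.beta)) #t-statistic = estimate / standard error
    p.val <- 2 * pt(-abs(t.stat), df = df.resid) #two-sided p-value on N - p degrees of freedom
    cbind(t.stat, p.val)
    summary(lm(y~x))$coefficients #matches the "t value" and "Pr(>|t|)" columns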