Use of standard error in linear models

• The central limit theorem applies to linear models
• Therefor $$\beta$$ will follow a normal distribution at large $$N$$ or a t-distribution at small, normally distributed $$N$$
• Obtaining standard error of $$\beta$$ allows for calculation of confidence intervals and p-values for linear models

Variance-covariance matrix

$\boldsymbol{\Sigma} = \sigma^2 \mathbf{I}$

library(UsingR)
x <- father.son$fheight y <- father.son$sheight
X <- model.matrix(~x) #model.matrix() creates a matrix out of vectors and includes a leading "ones" column to be multiplied by the B0 coefficient (intercept)
N <- nrow(X)
p <- ncol(X)
b <- solve(crossprod(X))%*%crossprod(X,y)
resid <- y - X%*%b
s.squared <- as.numeric(crossprod(resid)/(N-p))
varcov = s.squared*solve(crossprod(X))
varcov
##             (Intercept)          x
## (Intercept)     3.35752 -0.0495222
## x              -0.04952  0.0007316
all(round(varcov,10) == round(vcov(lm(y~x)),10))
## [1] TRUE

Variance of a linear combination

• To find the variance of the linear combination of $$Y_1$$ and $$Y_2$$
• Use the equality of: $\mbox{var}(\mathbf{AY}) = \mathbf{A}\mbox{var}(\mathbf{Y}) \mathbf{A}^\top$
• Not derived here
• From Examples of Matrix Algebra, also know that linear combination: $Y_1 + Y_2 = \begin{pmatrix}1&1\end{pmatrix}\begin{pmatrix} Y_1\\Y_2\\ \end{pmatrix} \notag$
• So $\mbox{var}\{Y_1+Y_2\} = \mbox{var}\left\{ \begin{pmatrix}1&1\end{pmatrix}\begin{pmatrix} Y_1\\Y_2\\ \end{pmatrix}\right\} \notag$
• Thus from eq.$$(2)$$: $\mbox{var}\left\{ \begin{pmatrix}1&1\end{pmatrix}\begin{pmatrix} Y_1\\Y_2\\ \end{pmatrix}\right\} =\begin{pmatrix}1&1\end{pmatrix} \mbox{var}(\mathbf{Y})\begin{pmatrix} 1\\1\\ \end{pmatrix} \notag$
• From eq.$$(1)$$: $\begin{pmatrix}1&1\end{pmatrix} \mbox{var}(\mathbf{Y})\begin{pmatrix} 1\\1\\ \end{pmatrix} =\begin{pmatrix}1&1\end{pmatrix} \sigma^2 \mathbf{I}\begin{pmatrix} 1\\1\\ \end{pmatrix}=2\sigma^2 \notag$
• Which finally simplifies to: $\mbox{var}\{Y_1+Y_2\} =2\sigma^2$

Standard error of least squares regression

• Recall that we determine the coefficients of the linear model by $$\beta = \mathbf{(X^\top X)^{-1}X^\top Y}$$
• Therefore $\mbox{var}(\beta) = \mbox{var}( \mathbf{(X^\top X)^{-1}X^\top Y} ) \notag$
• Through some matrix algebra rearrangement: $\mbox{var}(\beta) = \mbox{var}( \mathbf{(X^\top X)^{-1}X^\top Y} ) = \notag$ $\mathbf{(X^\top X)^{-1} X^\top} \mbox{var}(Y) (\mathbf{(X^\top X)^{-1} X^\top})^\top = \notag$ $\mathbf{(X^\top X)^{-1} X^\top} \sigma^2 \mathbf{I} (\mathbf{(X^\top X)^{-1} X^\top})^\top = \notag$ $\sigma^2 \mathbf{(X^\top X)^{-1} X^\top}\mathbf{X} \mathbf{(X^\top X)^{-1}} = \notag$ $\sigma^2\mathbf{(X^\top X)^{-1}} \notag$
• Thus, since standard error is the square root of variance: $\mbox{se}(\beta) = \sqrt{\sigma^2\mathbf{(X^\top X)^{-1}}}$

Estimating standard deviation

$\mathbf{r}\equiv\boldsymbol{\varepsilon} = \mathbf{Y}-\mathbf{X}\boldsymbol{\beta}$ $s^2 \equiv \hat{\sigma}^2 = \frac{1}{N-p}\mathbf{r}^\top\mathbf{r} = \frac{1}{N-p}\sum_{i=1}^N r_i^2$

• $$N$$ is the sample size
• $$p$$ is the number of columns in $$\mathbf{X}$$ (i.e. the number of coefficients in $$\beta$$)

library(UsingR)
x <- father.son$fheight y <- father.son$sheight
X <- model.matrix(~x) #model.matrix() creates a matrix out of vectors and includes a leading "ones" column to be multiplied by the B0 coefficient (intercept)
N <- nrow(X)
p <- ncol(X)
b <- solve(crossprod(X))%*%crossprod(X,y)
resid <- y - X%*%b
s.squared <- as.numeric(crossprod(resid)/(N-p))
se = sqrt(s.squared*solve(crossprod(X)))
diag(se)
## (Intercept)           x
##     1.83235     0.02705
summary(lm(y~x))$coefficients[,2] ## (Intercept) x ## 1.83235 0.02705 Linear combination of estimates • The standard error of a linear model’s coefficients is the diagonal of variance-covariance matrix square root sqrt(vcov(lm(y~x))) ## (Intercept) x ## (Intercept) 1.832 NaN ## x NaN 0.02705 data.frame(Std.Error = summary(lm(y~x))$coefficients[,2])
##             Std.Error
## (Intercept)   1.83235
## x             0.02705

Alternative way of calculating SE

• #Solve the for the coefficients of the linear model
X <- cbind(1,x) #First column of matrix is 1's because it will be multiplied by B0, the intercept
beta <- solve(t(X)%*%X) %*% t(X) %*% y
beta
##      [,1]
##   33.8866
## x  0.5141
#Calculate the predicted values based on the model
expected.linear <- beta[1] + beta[2]*x #Using a linear equation
expected.matrix <- t(beta) %*% t(X) #Using matrix algebra
all(expected.linear == expected.matrix) #Equivalency between the two methods
## [1] TRUE
#Calculate the residuals
resid <- y - expected.matrix
#Calculate the variance within the model (ANOVA strategy)
sum.sq.total <- sum((y-mean(y))^2) #Total sum of squares is the SSq of the response variable y
sum.sq.resid <- sum((resid-mean(resid))^2)
sum.sq.x <- sum.sq.total - sum.sq.resid
df.total <- length(x) - 1
df.x <- 1
df.resid <- df.total - df.x
mean.sq.x <- sum.sq.x / df.x
mean.sq.resid <- sum.sq.resid / df.resid
data.frame("variable" = c("x","Residual"),
"Df" = c(df.x, df.resid),
"SumSq" = c(sum.sq.x, sum.sq.resid),
"MeanSq" = c(mean.sq.x, mean.sq.resid))
##   variable   Df SumSq   MeanSq
## 1        x    1  2145 2144.580
## 2 Residual 1076  6388    5.937
anova(lm(y~x)) # could also do summary(aov(y~x))
## Analysis of Variance Table
##
## Response: y
##             Df Sum Sq Mean Sq F value Pr(>F)
## x            1   2145    2145     361 <2e-16 ***
## Residuals 1076   6388       6
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Calculate the variance-covariance matrix
var.beta <- mean.sq.resid * solve(t(X) %*% X)
var.beta
##                     x
##    3.35752 -0.0495222
## x -0.04952  0.0007316
all(round(var.beta,10) == round(vcov(lm(y~x)),10)) #Equivalency of r code vcov()
## [1] TRUE
sqrt(var.beta) #As above, standard error is diagnol of variance-covariance square root
##               x
##   1.832     NaN
## x   NaN 0.02705
summary(lm(y~x))\$coefficients[,2]
## (Intercept)           x
##     1.83235     0.02705