TB2 - SLR
2.1 - Introduction and Least Squares Estimates
In this chapter, we consider problems in which the relationship between two variables is modeled as a straight line. Note that a scatter plot should always be drawn before building a model, to get an idea of the relationship between the two variables.
2.1.1 - Simple Linear Regression Models
The regression of \(Y\) on \(X\) is linear if and only if

\[
E(Y \mid X = x) = \beta_0 + \beta_1 x,
\]

where the unknown parameters \(\beta_0\) and \(\beta_1\) determine the intercept and slope of a specific straight line.
Suppose that \(Y_1, Y_2, \dots, Y_n\) are independent realizations of the random variable \(Y\) that are observed at the values \(x_1, x_2, \dots, x_n\) of a random variable \(X\). If the regression of \(Y\) on \(X\) is linear, then for \(i = 1, 2, \dots, n\)

\[
Y_i = \beta_0 + \beta_1 x_i + \epsilon_i,
\]

where \(\epsilon_i\) is the random error in \(Y_i\) and is such that \(E(\epsilon \mid X) = 0\).
We will begin by assuming that

\[
\text{Var}(\epsilon \mid X) = \sigma^2,
\]

that is, that the random errors have constant variance.
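As a minimal illustration (not part of the original notes), the following R sketch simulates data from this model using hypothetical parameter values \(\beta_0 = 2\), \(\beta_1 = 0.5\), and \(\sigma = 1\), and draws the scatter plot recommended above:

```r
# Simulate from Y_i = beta0 + beta1 * x_i + e_i with e_i ~ N(0, sigma^2).
set.seed(42)                      # for reproducibility
n     <- 50
beta0 <- 2                        # hypothetical true intercept
beta1 <- 0.5                      # hypothetical true slope
sigma <- 1                        # hypothetical error standard deviation
x <- runif(n, 0, 10)
y <- beta0 + beta1 * x + rnorm(n, mean = 0, sd = sigma)
plot(x, y)                        # always look at the scatter plot first
```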
Estimating the Population Slope and Intercept
Since the population intercept \(\beta_0\) and slope \(\beta_1\) are unknown, we must find \(b_0\) and \(b_1\) such that \(\hat y_i = b_0 + b_1 x_i\) is as close as possible to \(y_i\).
Residuals
In practice, we want to minimize the difference between the actual value of \(y\) (\(y_i\)) and the predicted value of \(y\) (\(\hat y_i\)). This difference is called the residual \(\hat\epsilon_i\), that is,

\[
\hat\epsilon_i = y_i - \hat y_i = y_i - (b_0 + b_1 x_i).
\]
Least Squares Line of Best Fit
A very popular method of choosing \(b_0\) and \(b_1\) is called the method of least squares. With this method, \(b_0\) and \(b_1\) are chosen to minimize the sum of squared residuals,

\[
\text{RSS} = \sum_{i=1}^{n} \hat\epsilon_i^2 = \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2.
\]
We can solve the following normal equations (obtained by setting the partial derivatives of RSS with respect to \(b_0\) and \(b_1\) equal to zero),

\[
\sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0, \qquad \sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i) = 0,
\]

to get the least squares estimates of the intercept and the slope:

\[
b_0 = \bar y - b_1 \bar x, \qquad
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2} = \frac{S_{xy}}{S_{xx}}.
\]
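As a sketch (reusing the hypothetical simulated data from above), the estimates computed directly from these formulas can be checked against R's built-in lm():

```r
set.seed(42)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)

# Least squares estimates from the closed-form solutions.
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # Sxy / Sxx
b0 <- mean(y) - b1 * mean(x)
c(b0 = b0, b1 = b1)

coef(lm(y ~ x))   # should match (Intercept) and x above
```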
Estimating the Variance of the Random Error Term
Consider the linear regression model with constant variance,

\[
Y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad i = 1, 2, \dots, n,
\]

where the random error \(\epsilon_i\) has mean \(0\) and variance \(\sigma^2\). We wish to estimate \(\sigma^2 = \text{Var}(\epsilon)\). Notice that if \(\beta_0\) and \(\beta_1\) were known, each error could be recovered as

\[
\epsilon_i = Y_i - (\beta_0 + \beta_1 x_i).
\]
Since \(\beta_0\) and \(\beta_1\) are unknown, the residuals can be used in their place to estimate \(\sigma^2\). It can be shown that

\[
S^2 = \frac{\text{RSS}}{n-2} = \frac{1}{n-2} \sum_{i=1}^{n} \hat\epsilon_i^2
\]

is an unbiased estimate of \(\sigma^2\). Note:
- \(\bar{\hat\epsilon} = 0\), since the residuals sum to zero by the first normal equation.
- The divisor in \(S^2\) is \(n-2\) because we have estimated two parameters (\(\beta_0\) and \(\beta_1\)), which costs two degrees of freedom.
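The following sketch (again on the hypothetical simulated data) computes \(S^2\) from the residuals with the \(n-2\) divisor and compares \(S\) with the residual standard error that R reports:

```r
set.seed(42)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)
fit <- lm(y ~ x)

n  <- length(y)
s2 <- sum(resid(fit)^2) / (n - 2)   # S^2 = RSS / (n - 2)
sqrt(s2)                            # S, computed by hand
sigma(fit)                          # residual standard error reported by R
mean(resid(fit))                    # essentially zero, as noted above
```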
2.2 - Inferences About the Slope and Intercept
2.2.1 - Assumptions Necessary to Make Inferences About the Regression Model
1. \(Y\) is related to \(x\) by the simple linear regression model.
2. The errors \(\epsilon_1, \dots, \epsilon_n\) are independent of each other.
3. The errors \(\epsilon_1, \dots, \epsilon_n\) have a common variance \(\sigma^2\).
4. The errors are normally distributed with a mean of 0 and variance \(\sigma^2\).
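As an aside (not in the original notes), R's default diagnostic plots for a fitted lm object give quick visual checks of these assumptions; the data here are a hypothetical simulation:

```r
set.seed(42)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)
fit <- lm(y ~ x)

par(mfrow = c(2, 2))
plot(fit)   # residuals vs fitted, normal Q-Q, scale-location, leverage
```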
2.2.2 - Inferences About the Slope of the Regression Line
To construct a hypothesis test for \(\beta_1\), we can use the following \(t\)-distribution:

\[
T = \frac{\hat\beta_1 - \beta_1}{\text{se}(\hat\beta_1)} \sim t_{n-2}, \qquad \text{where } \text{se}(\hat\beta_1) = \frac{S}{\sqrt{S_{xx}}}.
\]
Note that \(\text{se}(\hat\beta_1)\) is given directly by R.
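For example (a sketch on the hypothetical simulated data), the coefficient table from summary() contains \(\hat\beta_1\), \(\text{se}(\hat\beta_1)\), the \(t\)-statistic for \(H_0 : \beta_1 = 0\), and its \(p\)-value:

```r
set.seed(42)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)
fit <- lm(y ~ x)

coef(summary(fit))   # columns: Estimate, Std. Error, t value, Pr(>|t|)
```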
2.2.3 - Inferences About the Intercept of the Regression Line
To construct a hypothesis test and confidence interval for \(\beta_0\), we can use the following \(t\)-distribution:

\[
T = \frac{\hat\beta_0 - \beta_0}{\text{se}(\hat\beta_0)} \sim t_{n-2}, \qquad \text{where } \text{se}(\hat\beta_0) = S \sqrt{\frac{1}{n} + \frac{\bar x^2}{S_{xx}}}.
\]
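A sketch (again on hypothetical simulated data): confint() produces \(t\)-based confidence intervals for both \(\beta_0\) and \(\beta_1\) from the standard errors above:

```r
set.seed(42)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)
fit <- lm(y ~ x)

confint(fit, level = 0.95)   # 95% CIs for (Intercept) and x
```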
2.3 - Confidence Intervals for the Population Regression Line
The population regression line at \(X = x^*\) is given by

\[
E(Y \mid X = x^*) = \beta_0 + \beta_1 x^*.
\]

An estimator of this unknown quantity is the value of the estimated regression equation at \(X = x^*\), namely,

\[
\hat y^* = \hat\beta_0 + \hat\beta_1 x^*.
\]

We can use the following \(t\)-distribution to perform hypothesis tests and create confidence intervals:

\[
T = \frac{\hat y^* - (\beta_0 + \beta_1 x^*)}{\text{se}(\hat y^*)} \sim t_{n-2}, \qquad \text{where } \text{se}(\hat y^*) = S \sqrt{\frac{1}{n} + \frac{(x^* - \bar x)^2}{S_{xx}}}.
\]
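As a sketch (with a hypothetical new value \(x^* = 5\) and the simulated data from above), predict() with interval = "confidence" computes this interval:

```r
set.seed(42)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)
fit <- lm(y ~ x)

# CI for the mean response E(Y | X = 5).
predict(fit, newdata = data.frame(x = 5), interval = "confidence")
```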
2.4 - Prediction Intervals for the Actual Values of \(Y\)
A prediction interval for the actual value of a single new observation \(Y^*\) at \(X = x^*\) is based on the following \(t\)-distribution:

\[
T = \frac{Y^* - \hat y^*}{\text{se}(Y^* - \hat y^*)} \sim t_{n-2}, \qquad \text{where } \text{se}(Y^* - \hat y^*) = S \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar x)^2}{S_{xx}}}.
\]

Note that the prediction interval is noticeably wider than the confidence interval, since it must capture a single new value \(Y^*\) and therefore accounts for the variability of that observation itself (the extra \(1\) under the square root) in addition to the uncertainty in the estimated line.
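The same predict() call with interval = "prediction" makes the extra width visible (a sketch on the same hypothetical setup):

```r
set.seed(42)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)
fit <- lm(y ~ x)

predict(fit, newdata = data.frame(x = 5), interval = "confidence")
predict(fit, newdata = data.frame(x = 5), interval = "prediction")  # wider
```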
2.5 - Analysis of Variance (ANOVA)
To test whether there is a linear relationship between \(Y\) and \(X\), we have to test \(H_0 : \beta_1 = 0\) against \(H_A : \beta_1 \ne 0\).
We can perform this test using the following \(t\)-statistic:

\[
T = \frac{\hat\beta_1}{\text{se}(\hat\beta_1)} \sim t_{n-2} \quad \text{under } H_0.
\]
An alternative approach, based on the analysis of variance, is useful because it generalizes to the case of more than one predictor variable (multiple regression). It starts from the corrected sum of squares of the \(Y\)'s, the total sum of squares:

\[
\text{SST} = \sum_{i=1}^{n} (y_i - \bar y)^2.
\]
The regression sum of squares is given by the following:

\[
\text{SSreg} = \sum_{i=1}^{n} (\hat y_i - \bar y)^2.
\]
It can be shown that the total sum of squares decomposes as

\[
\text{SST} = \text{SSreg} + \text{RSS}.
\]
To test \(H_0 : \beta_1 = 0\) against \(H_A : \beta_1 \ne 0\), we can use the test statistic

\[
F = \frac{\text{SSreg}/1}{\text{RSS}/(n-2)} \sim F_{1,\,n-2} \quad \text{under } H_0,
\]

which, for simple linear regression, satisfies \(F = T^2\), where \(T\) is the \(t\)-statistic for the slope.
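A closing sketch (hypothetical simulated data): anova() reports SSreg, RSS, and the \(F\)-statistic, and the relationship \(F = T^2\) can be verified directly:

```r
set.seed(42)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)
fit <- lm(y ~ x)

anova(fit)                              # SSreg, RSS, and the F-statistic
coef(summary(fit))["x", "t value"]^2    # equals the F-statistic above
```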