Reading and interpreting a regression table

In statistics, regression is a technique that can be used to analyze the relationship between predictor variables and a response variable.

If you do a regression analysis with software (like R, SAS, SPSS, etc.) the output is a regression table that summarizes the results of the regression. It is important to know how to read this table so that you can understand the results of regression analysis.

This tutorial walks you through an example of a regression analysis and provides a detailed explanation of how to read and interpret the output of a regression table.

A regression example

Suppose we have the following dataset showing the total number of hours studied, total number of preparatory exams taken, and final exam score for 12 different students:

To analyze the relationship between hours studied, prep exams taken, and a student's final exam score, we perform a multiple linear regression with hours studied and prep exams taken as the predictor variables and final exam score as the response variable.

We get the following output:

Examine the performance of the model

The first section shows various numbers that measure the fit of the regression model.

To interpret each of the numbers in this section:

Multiple R

This is the correlation coefficient. It measures the strength of the linear relationship between the predictor variables and the response variable. A multiple R of 1 indicates a perfect linear relationship, while a multiple R of 0 indicates no linear relationship at all. The multiple R is the square root of the R-square (see below).

In this example the multiple R is 0.72855, which indicates a fairly strong linear relationship between the predictors (hours studied and prep exams taken) and the response variable (final exam score).

R-square

This is often written as r2 and is also known as the coefficient of determination. It is the proportion of the variance in the response variable that can be explained by the predictor variables.

The value for the R-square can range from 0 to 1. A value of 0 indicates that the predictor variables cannot explain the response at all. A value of 1 indicates that the predictor variables can perfectly explain the response.

In this example the R-square is 0.5307, which indicates that 53.07% of the variance in the final exam scores can be explained by the number of hours studied and the number of prep exams taken.
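As a quick check, R-square can be recomputed from the regression and residual sums of squares that this article quotes in its mean squares section. The sketch below assumes only those two numbers; the variable names are illustrative and not part of any package's output.

```python
import math

# Sums of squares quoted in this article's mean squares section.
regression_ss = 546.53308
residual_ss = 483.1335

# Total variation is the explained part plus the unexplained part.
total_ss = regression_ss + residual_ss

# R-square: share of the total variation explained by the model.
r_square = regression_ss / total_ss

# Multiple R is the square root of R-square.
multiple_r = math.sqrt(r_square)

print(f"R-square   = {r_square:.4f}")
print(f"Multiple R = {multiple_r:.5f}")
```

The results agree with the table values of 0.5307 and 0.72855 up to rounding in the last digit.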

Adjusted R-square

This is a modified version of the R-square that has been adjusted for the number of predictors in the model. It is never higher than the R-square. The adjusted R-square can be useful for comparing the fit of different regression models that use different numbers of predictors.

In this example the adjusted R-square is 0.4265.
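The article does not spell out the adjustment, so the sketch below assumes the usual textbook formula, 1 - (1 - R-square) * (n - 1) / (n - k - 1), where n is the number of observations and k the number of predictors. The function name is hypothetical.

```python
def adjusted_r_square(r_square: float, n_observations: int, n_predictors: int) -> float:
    """Adjusted R-square using the standard textbook formula (an assumption here)."""
    # The penalty grows as predictors are added relative to the sample size.
    penalty = (n_observations - 1) / (n_observations - n_predictors - 1)
    return 1.0 - (1.0 - r_square) * penalty


# R-square rebuilt from the sums of squares quoted in this article.
r_square = 546.53308 / (546.53308 + 483.1335)

adj_r_square = adjusted_r_square(r_square, n_observations=12, n_predictors=2)
print(f"Adjusted R-square = {adj_r_square:.4f}")
```

With 12 observations and 2 predictors the penalty factor is 11 / 9, which is why the adjusted value sits noticeably below the plain R-square.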

Standard error of regression

The standard error of the regression is, roughly, the average distance that the observed values fall from the regression line. In this example the observed values fall an average of 7.3267 units from the regression line.
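One common way to obtain this number, assumed here rather than stated in the article, is to take the square root of the residual mean square (residual sum of squares divided by residual degrees of freedom). A minimal sketch using the values quoted later in this article:

```python
import math

# Residual sum of squares and residual degrees of freedom from this example.
residual_ss = 483.1335
residual_df = 9

# Residual mean square: unexplained variation per residual degree of freedom.
residual_ms = residual_ss / residual_df

# Standard error of the regression: square root of the residual mean square.
standard_error = math.sqrt(residual_ms)

print(f"Standard error of the regression = {standard_error:.4f}")
```

The result matches the table value of 7.3267 up to rounding in the last digit.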

Observations

This is simply the number of observations in our dataset. In this example the total number of observations is 12.

Testing the overall significance of the regression model

The next section shows the degrees of freedom, the sum of squares, the mean squares, the F-statistic, and the overall significance of the regression model.

To interpret each of the numbers in this section:

Regression degrees of freedom

This number equals: the number of regression coefficients - 1. In this example we have one intercept term and two predictor variables, so we have three regression coefficients in total, which means that the degrees of freedom of regression are 3 - 1 = 2.

Total degrees of freedom

This number is equal to: the number of observations - 1. In this example we have 12 observations, so the total degrees of freedom are 12 - 1 = 11.

Residual degrees of freedom

This number is equal to: total df - regression df. In this example, the residual degrees of freedom are 11 - 2 = 9.
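The three degrees-of-freedom values follow from just two inputs, the number of observations and the number of predictors. A small sketch of that bookkeeping, with illustrative variable names:

```python
# Inputs for this example: 12 students, 2 predictors (hours studied, prep exams).
n_observations = 12
n_predictors = 2

# One coefficient per predictor, plus the intercept.
n_coefficients = n_predictors + 1

regression_df = n_coefficients - 1        # 3 - 1
total_df = n_observations - 1             # 12 - 1
residual_df = total_df - regression_df    # what is left for the residuals

print(regression_df, total_df, residual_df)
```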

Mean squares

The regression mean square is calculated as regression SS / regression df. In this example the regression MS = 546.53308 / 2 = 273.2665.

The residual mean square is calculated as residual SS / residual df. In this example the residual MS = 483.1335 / 9 = 53.68151.

F statistic

The F statistic is calculated as regression MS / residual MS. This statistic indicates whether the regression model fits the data better than a model that contains no independent variables.

Essentially, it is testing whether the regression model is useful overall. In general, if none of the predictor variables in the model are statistically significant, then the overall F statistic is also not statistically significant.

In this example the F statistic is 273.2665 / 53.68151 = 5.09.
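The chain from sums of squares to the F statistic can be reproduced in a few lines. This sketch uses only the numbers quoted in this article; the variable names are illustrative.

```python
# Sums of squares and degrees of freedom from this example.
regression_ss = 546.53308
residual_ss = 483.1335
regression_df = 2
residual_df = 9

# Mean square = sum of squares divided by its degrees of freedom.
regression_ms = regression_ss / regression_df
residual_ms = residual_ss / residual_df

# F statistic = explained variation per df over unexplained variation per df.
f_statistic = regression_ms / residual_ms

print(f"Regression MS = {regression_ms:.4f}")
print(f"Residual MS   = {residual_ms:.4f}")
print(f"F statistic   = {f_statistic:.2f}")
```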

Significance F (p-value)

The last value in the table is the p-value associated with the F statistic. To see if the overall regression model is significant, you can compare the p-value to a significance level. Common choices are .01, .05, and .10.

If the p-value is below the significance level, there is enough evidence to conclude that the regression model fits the data better than the model with no predictor variables. This finding is good because it means that the predictor variables in the model actually improve the fit of the model.

In this example the p-value is 0.033, which is less than the common significance level of 0.05. This indicates that the overall regression model is statistically significant, that is, the model fits the data better than the model with no predictor variables.
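Statistical software normally looks this p-value up from the F distribution. When the regression degrees of freedom are exactly 2, as in this example, the right-tail probability has a simple closed form, (1 + 2F / residual df) raised to the power -(residual df) / 2, so it can be sketched without any statistics library. The function name below is hypothetical, and the shortcut does not apply to models with a different number of predictors.

```python
def f_test_p_value_two_numerator_df(f_statistic: float, residual_df: int) -> float:
    """Right-tail p-value of an F statistic when the numerator df is exactly 2."""
    return (1.0 + 2.0 * f_statistic / residual_df) ** (-residual_df / 2.0)


# F statistic from this example: regression MS / residual MS.
f_statistic = 273.2665 / 53.68151

p_value = f_test_p_value_two_numerator_df(f_statistic, residual_df=9)

# Compare against a chosen significance level.
alpha = 0.05
model_is_significant = p_value < alpha

print(f"p-value = {p_value:.3f}, significant at {alpha}: {model_is_significant}")
```

For models with other numbers of predictors, a library routine for the F distribution would be the more general choice.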

Testing the significance of the individual terms in the model

The last section shows the coefficient estimates, the standard error of the estimates, the t-stat, p-values, and confidence intervals for each term in the regression model.

To interpret each of the numbers in this section:

Coefficients

The coefficients tell us the numbers needed to write the estimated regression equation:

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$

In this example, the estimated regression equation is:

final exam score = 66.99 + 1.299 (Study Hours) + 1.117 (Prep Exams)

Each individual coefficient is interpreted as the average increase in the response for each one-unit increase in a given predictor variable, assuming that all other predictor variables are held constant. For example, for each additional hour studied, the average expected increase in the final exam score is 1.299 points, provided that the number of prep exams taken is held constant.

The intercept is interpreted as the expected average final exam score for a student who studies for zero hours and takes no prep exams. In this example, a student is expected to score 66.99 if they study for zero hours and take no prep exams. However, be careful when interpreting the intercept of a regression output, as it doesn't always make sense.

For example, in some cases the intercept may turn out to be a negative number, which often has no obvious interpretation. This does not mean that the model is wrong, just that the intercept itself should not be interpreted as meaning anything.
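To make the interpretation concrete, the estimated equation can be wrapped in a small function. The coefficients come from this example; the function name and the sample student (10 hours, 2 prep exams) are hypothetical inputs chosen only for illustration.

```python
# Coefficient estimates from this example's regression table.
INTERCEPT = 66.99
COEF_STUDY_HOURS = 1.299
COEF_PREP_EXAMS = 1.117


def predict_final_exam_score(study_hours: float, prep_exams: float) -> float:
    """Plug predictor values into the estimated regression equation."""
    return INTERCEPT + COEF_STUDY_HOURS * study_hours + COEF_PREP_EXAMS * prep_exams


# With both predictors at zero, the prediction is just the intercept.
baseline = predict_final_exam_score(0, 0)

# Hypothetical student: 10 hours studied, 2 prep exams taken.
example = predict_final_exam_score(10, 2)

# One extra study hour, prep exams held constant, adds the study-hours coefficient.
one_more_hour = predict_final_exam_score(11, 2) - example

print(f"baseline = {baseline:.2f}, example = {example:.3f}, extra hour = {one_more_hour:.3f}")
```

Predictions for inputs far outside the range of the original data should be treated with caution, for the same reason the intercept can be hard to interpret.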

Standard errors, t-stats, and p-values

The standard error is a measure of the uncertainty about the estimate of the coefficient for each variable.

The t-stat is simply the coefficient divided by the standard error. For example, the t-stat for hours studied is reported as 3.117, which is the coefficient 1.299 divided by its standard error 0.417 (dividing these rounded values gives about 3.115).

The next column shows the p-value associated with the t-stat. This number indicates whether a particular predictor variable is statistically significant in the model. In this example we see that the p-value for study hours is 0.012 and the p-value for prep exams is 0.304. This indicates that hours studied is a significant predictor of final exam score, whereas prep exams is not.
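The sketch below recomputes the t-stat for hours studied from the rounded table values and then applies the usual decision rule to the two reported p-values. The 0.05 significance level and the dictionary layout are assumptions made for illustration.

```python
# Rounded table values for hours studied.
coef_study_hours = 1.299
se_study_hours = 0.417

# t-stat = coefficient / standard error.
t_stat_study_hours = coef_study_hours / se_study_hours

# p-values reported in the table for each predictor.
p_values = {"Study Hours": 0.012, "Prep Exams": 0.304}

# A predictor is flagged as significant when its p-value is below alpha.
alpha = 0.05
is_significant = {name: p < alpha for name, p in p_values.items()}

print(f"t-stat (Study Hours) = {t_stat_study_hours:.3f}")
for name, flag in is_significant.items():
    print(f"{name}: p = {p_values[name]:.3f}, significant = {flag}")
```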

Confidence interval for coefficient estimates

The last two columns in the table contain the lower and upper bounds for a 95% confidence interval for the coefficient estimates.

For example, the coefficient estimate for hours studied is 1.299, but there is some uncertainty about this estimate. We can never know for sure if this is the exact coefficient. So a 95% confidence interval gives us a range of likely values for the true coefficient.

In this case, the 95% confidence interval for hours studied is (0.356, 2.24). Note that this confidence interval does not contain the number “0”. This means we can be fairly confident that the true coefficient for hours studied is not zero and, since the whole interval lies above zero, that it is positive.

In contrast, the 95% confidence interval for prep exams is (-1.201, 3.436). Note that this confidence interval does contain the number "0", which means that the true value for the coefficient of prep exams could be zero, i.e. prep exams may not be a significant predictor of final exam scores.
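These intervals are typically built as coefficient ± critical t value × standard error. The sketch below rebuilds the interval for hours studied, assuming the two-sided 95% critical value of about 2.262 for 9 residual degrees of freedom (taken from a standard t table, not from this article), and then checks both reported intervals for zero. The helper name is hypothetical.

```python
# Table values for hours studied.
coef_study_hours = 1.299
se_study_hours = 0.417

# Assumed critical value: two-sided 95% t value for 9 degrees of freedom.
T_CRITICAL_95_DF9 = 2.262

margin = T_CRITICAL_95_DF9 * se_study_hours
lower = coef_study_hours - margin
upper = coef_study_hours + margin


def contains_zero(interval: tuple[float, float]) -> bool:
    """True when zero lies inside the interval, i.e. the coefficient may be zero."""
    low, high = interval
    return low <= 0.0 <= high


# Intervals as reported in the regression table.
reported = {"Study Hours": (0.356, 2.24), "Prep Exams": (-1.201, 3.436)}
zero_inside = {name: contains_zero(ci) for name, ci in reported.items()}

print(f"Rebuilt interval for Study Hours: ({lower:.3f}, {upper:.3f})")
print(zero_inside)
```

The interval check leads to the same conclusion as the p-values: at the 5% level, only hours studied is distinguishable from zero.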