What is SSM in statistics?
Before reading on, though, make sure you are not mistaking regression for correlation. If you've got that checked, we can get straight into the action.
A quick side-note: Want to learn more about linear regression? Check out our explainer videos "The Linear Regression Model. Geometrical Representation" and "The Simple Linear Regression Model".
There are three terms we must define: the sum of squares total, the sum of squares regression, and the sum of squares error.
The sum of squares total, denoted SST, is the sum of the squared differences between the observed dependent variable and its mean. You can think of this as the dispersion of the observed values around the mean – much like the variance in descriptive statistics.
It is a measure of the total variability of the dataset.
Side note: There is another notation for the SST. It is TSS or total sum of squares.
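In symbols, if $y_i$ are the observed values and $\bar{y}$ is their mean:

$$\mathrm{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$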
The second term is the sum of squares due to regression, or SSR. It is the sum of the squared differences between the predicted value and the mean of the dependent variable. Think of it as a measure that describes how well our line fits the data.
If this value of SSR is equal to the sum of squares total, it means our regression model captures all the observed variability and is perfect. Once again, we have to mention that another common notation is ESS or explained sum of squares.
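In symbols, with $\hat{y}_i$ denoting the predicted values:

$$\mathrm{SSR} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$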
The last term is the sum of squares error, or SSE. The error is the difference between the observed value and the predicted value, and SSE adds up the squares of these errors.
We usually want to minimize the error. The smaller the error, the better the estimation power of the regression. Finally, I should add that it is also known as RSS or residual sum of squares. Residual as in: remaining or unexplained.
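In symbols:

$$\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$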
It becomes really confusing because some people denote it as SSR. This makes it unclear whether we are talking about the sum of squares due to regression or sum of squared residuals.
In any case, neither of these are universally adopted, so the confusion remains and we’ll have to live with it.
Simply remember that the two sets of notation are SST, SSR, SSE and TSS, ESS, RSS.
There’s a conflict regarding the abbreviations, but not about the concept and its application. So, let’s focus on that.
Mathematically, SST = SSR + SSE.
The rationale is the following: the total variability of the data set is equal to the variability explained by the regression line plus the unexplained variability, known as error.
Given a constant total variability, a lower error means a more powerful regression, while a higher error means a less powerful one. And that's what you must remember, no matter the notation.
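To make the decomposition concrete, here is a minimal sketch in Python. It uses numpy and a small made-up dataset (both are illustrative assumptions, not part of the original text), fits a simple linear regression, and verifies that SST = SSR + SSE:

```python
import numpy as np

# Illustrative data, made up for this sketch
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 9.9, 12.1])

# Fit a simple linear regression y = b0 + b1 * x via least squares
# np.polyfit returns coefficients from highest degree to lowest
b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x
y_bar = y.mean()

# The three sums of squares
sst = np.sum((y - y_bar) ** 2)      # total variability
ssr = np.sum((y_hat - y_bar) ** 2)  # variability explained by the line
sse = np.sum((y - y_hat) ** 2)      # unexplained (residual) variability

print(f"SST       = {sst:.4f}")
print(f"SSR       = {ssr:.4f}")
print(f"SSE       = {sse:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}")  # matches SST up to rounding
```

One detail worth noting: the identity SST = SSR + SSE holds exactly for ordinary least squares fitted with an intercept, which is what the line-fitting step above does.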
Well, if you are not sure why we need all those sums of squares, we have just the right tool for you: the R-squared. Care to learn more? Just dive into the linked tutorial, where you will understand how it measures the explanatory power of a linear regression!
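As a preview, R-squared is built directly from the quantities we just defined:

$$R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}$$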
The model sum of squares, or SSM, is a measure of the variation explained by our model. For each observation, we take the squared difference between the predicted value and the overall mean response, and sum these across the dataset. This is the variation that we attribute to the relationship between X and Y.
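Comparing this definition with the one above, SSM is simply another name for the sum of squares due to regression:

$$\mathrm{SSM} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = \mathrm{SSR} = \mathrm{ESS}$$

So whenever you encounter SSM in a textbook or in software output, read it as the explained portion of the total variability.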