This article explains the difference between standardized and unstandardized coefficients, with examples.
In one of my predictive models, I found a variable whose unstandardized regression coefficient (aka beta or estimate) was close to zero (.0003) but statistically significant (p-value < .05). If a variable is significant, its coefficient is significantly different from zero. The question arises: "Why is the coefficient close to zero if the variable is significant?" The answer lies in the difference between unstandardized and standardized coefficients.
If an independent variable is measured in large units, such as hundreds of thousands or millions of dollars (e.g., $656,765), its unstandardized estimate can be close to zero. To make the coefficient more interpretable, we can rescale the variable by dividing it by 1,000 or 100,000 (depending on its magnitude) and then rerun the regression with the transformed variable. The new coefficient is larger than the old one by exactly the rescaling factor, while its t-statistic and p-value stay the same.
The unstandardized coefficient should not be used to drop or rank predictors (aka independent variables) because it still carries the unit of measurement.
But if a standardized beta is close to zero, it's a REAL PROBLEM: the predictor has little association with the dependent variable once the other predictors are accounted for.
The concept of standardization, or standardized coefficients, comes into the picture when predictors (aka independent variables) are expressed in different units. Suppose you have 3 independent variables: age, height and weight. Age is expressed in years, height in cm and weight in kg. If we rank these predictors by their unstandardized coefficients, the comparison is not fair because the variables are not measured in the same unit.
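The effect of rescaling can be checked with a small simulation. The sketch below uses made-up data (the variable names `income` and `income_k`, the seed and all numbers are assumptions for illustration only) and compares the slope and its t-statistic before and after dividing the predictor by 1,000.

```r
# Simulated data (hypothetical): a predictor measured in dollars
set.seed(42)
n <- 200
income <- runif(n, min = 100000, max = 1000000)
y <- 5 + 0.0003 * income + rnorm(n, sd = 20)

# Fit on the raw predictor
fit_raw <- lm(y ~ income)

# Rescale to thousands of dollars and refit
income_k <- income / 1000
fit_scaled <- lm(y ~ income_k)

b_raw    <- coef(fit_raw)["income"]
b_scaled <- coef(fit_scaled)["income_k"]

# t-statistics of the slope in each fit
t_raw    <- summary(fit_raw)$coefficients["income", "t value"]
t_scaled <- summary(fit_scaled)$coefficients["income_k", "t value"]

print(c(b_raw = unname(b_raw), b_scaled = unname(b_scaled)))
print(c(t_raw = t_raw, t_scaled = t_scaled))
```

The rescaled slope should be 1,000 times the raw slope, with the same t-statistic, which is why the size of an unstandardized coefficient says nothing on its own about importance.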
Practical Use of Standardized Coefficient
Standardized coefficients are mainly used to rank predictors (or independent or explanatory variables) because they eliminate the units of measurement of the independent and dependent variables. We can rank independent variables by the absolute value of their standardized coefficients. The most important variable has the largest absolute standardized coefficient.
Interpretation in Linear Regression
Next, we will discuss the interpretation of unstandardized and standardized coefficients in linear regression.
Linear Regression : Unstandardized Coefficient
It represents the amount by which the dependent variable changes if we change the independent variable by one unit, keeping the other independent variables constant.
Linear Regression : Standardized Coefficient
The standardized coefficient is measured in units of standard deviation. A beta value of 1.25 indicates that a change of one standard deviation in the independent variable results in a 1.25 standard deviation increase in the dependent variable, keeping the other independent variables constant.
Calculation of Standardized Coefficient for Linear Regression
Standardize both the dependent and independent variables, then use the standardized variables in the regression model to get standardized estimates. By 'standardize', I mean subtract the mean from each observation and divide the result by the standard deviation. The result is also called a z-score. It gives each variable a mean of 0 and a standard deviation of 1.
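The z-score approach can be sketched with base R's `scale()`, which centers and scales each column. The example below is only an illustration: it uses the built-in `mtcars` data and the predictors `wt` and `hp`, which are my choices and not part of this article's analysis. It cross-checks the result against the unstandardized coefficient multiplied by sd(x) / sd(y).

```r
# Built-in mtcars data, used purely for illustration
vars <- mtcars[, c("mpg", "wt", "hp")]

# z-score every column: subtract the mean, divide by the standard deviation
z <- as.data.frame(scale(vars))

# Regression on standardized variables gives standardized coefficients
fit_z <- lm(mpg ~ wt + hp, data = z)
beta_scaled <- coef(fit_z)[-1]   # drop the intercept (essentially zero)

# Cross-check: unstandardized coefficient times sd(x) / sd(y)
fit_raw <- lm(mpg ~ wt + hp, data = vars)
beta_formula <- coef(fit_raw)[-1] * sapply(vars[c("wt", "hp")], sd) / sd(vars$mpg)

print(round(cbind(beta_scaled, beta_formula), 4))
```

Both columns of the printed table should agree, so either route can be used when all variables are numeric.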
Another Approach
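The standardized coefficient is found by multiplying the unstandardized coefficient by the ratio of the standard deviation of the independent variable to the standard deviation of the dependent variable:

$$\beta_j = b_j \cdot \frac{s_{x_j}}{s_y}$$

Here $b_j$ is the unstandardized coefficient of predictor $x_j$, $s_{x_j}$ is its standard deviation and $s_y$ is the standard deviation of the dependent variable.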
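A worked example shows how a tiny unstandardized coefficient can still correspond to a sizeable standardized one. It takes the coefficient .0003 from the introduction and pairs it with made-up standard deviations of 500,000 for the predictor and 300 for the dependent variable. These two values are assumptions chosen for easy arithmetic, not figures from any real model.

$$\beta = b \cdot \frac{s_x}{s_y} = 0.0003 \times \frac{500{,}000}{300} = 0.5$$

Under these assumed values, a one standard deviation change in the predictor goes with half a standard deviation change in the dependent variable, even though the raw coefficient looks negligible.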
Interpretation in Logistic Regression
Logistic Regression : Unstandardized Coefficient
If X increases by one unit, the log-odds of Y increase by k units, where k is the unstandardized coefficient of X, given that the other variables in the model are held constant.
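Because the coefficient acts on the log-odds scale, exponentiating it gives the multiplicative change in the odds. A short worked example, with a hypothetical coefficient k = 0.5 and an intercept written as a (both are illustrative, not taken from any model in this article):

$$\log\frac{p}{1-p} = a + kX \;\Rightarrow\; \frac{\text{odds}(X+1)}{\text{odds}(X)} = e^{k}, \qquad e^{0.5} \approx 1.65$$

So in this example a one-unit increase in X multiplies the odds of Y by roughly 1.65.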
Logistic Regression : Standardized Coefficient
A standardized coefficient of 2.5 means that a one standard deviation increase in the independent variable is associated, on average, with a 2.5 standard deviation increase in the log odds of the dependent variable.
Calculation of Standardized Coefficient for Logistic Regression
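The standardized coefficient is found by multiplying the unstandardized coefficient by the standard deviation of the independent variable and dividing by π/√3, which is the standard deviation of the standard logistic distribution:

$$\beta_j = b_j \cdot \frac{\sqrt{3}}{\pi} \cdot s_{x_j}$$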
Calculate Standardized Coefficient for Linear Regression in R
Let's start by building a linear regression model.
In the program below, we use the Boston dataset, which contains housing values in the suburbs of Boston.
library(MASS)
data(Boston)
str(Boston)
> str(Boston)
'data.frame': 506 obs. of 14 variables:
 $ crim   : num 0.00632 0.02731 0.02729 0.03237 0.06905 ...
 $ zn     : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
 $ indus  : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
 $ chas   : int 0 0 0 0 0 0 0 0 0 0 ...
 $ nox    : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
 $ rm     : num 6.58 6.42 7.18 7 7.15 ...
 $ age    : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
 $ dis    : num 4.09 4.97 4.97 6.06 6.06 ...
 $ rad    : int 1 2 2 3 3 3 5 5 5 5 ...
 $ tax    : num 296 242 242 222 222 222 311 311 311 311 ...
 $ ptratio: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
 $ black  : num 397 397 393 395 397 ...
 $ lstat  : num 4.98 9.14 4.03 2.94 5.33 ...
 $ medv   : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
crim – per capita crime rate by town.
zn – proportion of residential land zoned for lots over 25,000 sq. ft.
indus – proportion of non-retail business acres per town.
chas – Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
nox – nitrogen oxides concentration (parts per 10 million).
rm – average number of rooms per dwelling.
age – proportion of owner-occupied units built prior to 1940.
dis – weighted mean of distances to five Boston employment centers.
rad – index of accessibility to radial highways.
tax – full-value property-tax rate per $10,000.
ptratio – pupil-teacher ratio by town.
black – 1000(Bk – 0.63)^2, where Bk is the proportion of blacks by town.
lstat – lower status of the population (percent).
medv – median value of owner-occupied homes in $1000s.
Standardized Coefficient using QuantPsyc Package
reg.model <- lm(medv ~ ., data = Boston)
# Standardised coefficients
library(QuantPsyc)
lm.beta(reg.model)
> lm.beta(reg.model)
        crim           zn        indus         chas          nox           rm
-0.101017076  0.117715201  0.015335200  0.074198832 -0.223848028  0.291056465
         age          dis          rad          tax      ptratio        black
 0.002118638 -0.337836347  0.289749053 -0.226031680 -0.224271231  0.092432232
       lstat
-0.407446933
R Function : Standardized Coefficients in Linear Regression
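We can compute standardized coefficients in R without using any package. See the function below.
stdz.coff <- function(regmodel) {
  # Unstandardized slopes (drop the intercept)
  b <- summary(regmodel)$coef[-1, 1]
  # Standard deviations of the predictors and of the response
  # (assumes every predictor is numeric)
  sx <- sapply(regmodel$model[-1], sd)
  sy <- sapply(regmodel$model[1], sd)
  # Rescale each slope by sd(x) / sd(y)
  beta <- b * sx / sy
  return(beta)
}
stdz.coff(reg.model)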
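Once standardized coefficients are available, ranking the predictors by absolute value takes one more line. The sketch below is self-contained: it refits the same Boston model and recomputes the coefficients as b * sd(x) / sd(y). The object names `fit`, `beta` and `ranked` are mine.

```r
library(MASS)   # provides the Boston data set

fit <- lm(medv ~ ., data = Boston)

# Standardized coefficients: b * sd(x) / sd(y)
b    <- coef(fit)[-1]                  # drop the intercept
sx   <- sapply(Boston[names(b)], sd)   # sd of each predictor
sy   <- sd(Boston$medv)                # sd of the response
beta <- b * sx / sy

# Rank predictors by absolute standardized coefficient, largest first
ranked <- sort(abs(beta), decreasing = TRUE)
print(round(ranked, 3))
```

Going by the lm.beta output shown earlier, lstat should come out on top and age at the bottom of this ranking.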
Standardized Coefficient for Logistic Regression in R
data("Titanic")
set.seed(123)  # added for reproducibility; any seed works
Y = data.frame(Titanic)["Survived"]
X = runif(32)  # random predictor, for demonstration only
mydata = data.frame(X, Y)
# Logistic regression model
model <- glm(Survived ~ X, family = binomial(link = 'logit'), data = mydata)
# R Function : Standardized Coefficients
stdz.coff <- function(regmodel) {
  b <- summary(regmodel)$coef[-1, 1]
  sx <- sapply(regmodel$model[-1], sd)
  beta <- (3^(1/2)) / pi * sx * b
  return(beta)
}
# Standardized Estimate
stdz.coff(model)
# Unstandardized Estimate
model$coefficients[-1]
In SAS, you can include the STB option to get standardized estimates.
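One way to sanity-check the (√3/π) · sd(x) · b rescaling is to fit the model again on a z-scored predictor and divide that coefficient by π/√3. The sketch below does this on simulated data. The data, the seed and the names `x`, `xz`, `fit_raw` and `fit_z` are all assumptions made for illustration.

```r
# Simulated data (hypothetical), for illustration only
set.seed(1)
n <- 500
x <- rnorm(n, mean = 50, sd = 12)
p <- plogis(-5 + 0.1 * x)
y <- rbinom(n, size = 1, prob = p)

# Route 1: fit on raw x, then rescale the coefficient
fit_raw <- glm(y ~ x, family = binomial(link = "logit"))
beta_formula <- sqrt(3) / pi * sd(x) * coef(fit_raw)["x"]

# Route 2: fit on z-scored x, then divide by the logistic SD (pi / sqrt(3))
xz <- as.numeric(scale(x))
fit_z <- glm(y ~ xz, family = binomial(link = "logit"))
beta_scaled <- coef(fit_z)["xz"] / (pi / sqrt(3))

print(c(formula = unname(beta_formula), scaled = unname(beta_scaled)))
```

The two numbers should match up to numerical precision, which shows that the formula only standardizes the predictor and then expresses the result in units of the logistic distribution's standard deviation.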
proc logistic data = training descending;
class rank (ref ='1');
model admit = gre gpa rank / stb;
run;
You give a formula for standardizing independent and dependent variables. Can't the R scale() function be used to do the same thing?
The higher the standardised coefficient the greater the significance?
yes
Very nice post. It is useful to see the use in R. Thanks for the post.
Can you provide the derivation of the formula mentioned for calculating the standardized coefficient in logistic regression - 3^(1/2)/pi*... one? Any link would be of help too!
-8.243E-6 is which mean in regression?
Please reply me !
In my dta, unstandardized Regression and Standardized coefficients have (large) differences in terms of statistical significance, why ?
what if the dependent variable is continuous and the independent variables contain a mix of categorical and continuous variables? How do you calculate the standardized coefficients?
May I know what does a negative standardized beta mean?
Hi how do you know if your regression results are already standardized?