Standardized vs Unstandardized Regression Coefficients

Deepanshu Bhalla

This article explains the difference between standardized and unstandardized coefficients, with examples.

In one of my predictive models, I found a variable whose unstandardized regression coefficient (aka beta or estimate) was close to zero (.0003) even though it was statistically significant (p-value < .05). If a variable is significant, its coefficient is significantly different from zero. So the question arises: "Why is the coefficient close to zero if the variable is significant?" The answer lies in the difference between unstandardized and standardized coefficients.

If an independent variable takes large values, such as amounts in dollars (e.g., $656,765), its unstandardized estimate can be close to zero. To make the coefficient easier to interpret, we can rescale the variable by dividing it by 1,000 or 100,000 (depending on its magnitude). After rescaling, run the regression again with the transformed variable. The new coefficient is larger than the old one by exactly the rescaling factor. Its standard error scales by the same factor, so the t-statistic and p-value are unchanged.
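The effect of rescaling can be sketched in R with simulated data. The variable `income` and the true slope of 0.0003 are assumptions made up for this illustration, not values from the model mentioned above.

```r
# Simulated data: 'income' is a hypothetical predictor measured in dollars
set.seed(42)
n <- 200
income <- runif(n, 100000, 900000)
y <- 5 + 0.0003 * income + rnorm(n, sd = 20)

# Model on the original scale
fit_raw <- lm(y ~ income)

# Rescale the predictor to thousands of dollars and refit
income_k <- income / 1000
fit_k <- lm(y ~ income_k)

b_raw <- coef(fit_raw)["income"]
b_k   <- coef(fit_k)["income_k"]
p_raw <- summary(fit_raw)$coefficients["income", "Pr(>|t|)"]
p_k   <- summary(fit_k)$coefficients["income_k", "Pr(>|t|)"]

# The rescaled coefficient is 1000 times the original one
print(c(raw = unname(b_raw), rescaled = unname(b_k)))
# The p-value is the same in both models
print(c(p_raw = p_raw, p_rescaled = p_k))
```

Only the unit of the predictor changes between the two fits, so the model and its fit are identical; the coefficient simply absorbs the factor of 1000.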

Key takeaway:
Unstandardized coefficients should not be used to drop or rank predictors (aka independent variables), because their magnitude depends on the unit of measurement.

But if a standardized beta is close to zero, it's a REAL PROBLEM.

The concept of standardized coefficients comes into the picture when predictors (aka independent variables) are expressed in different units. Suppose you have 3 independent variables: age, height and weight. Age is expressed in years, height in cm and weight in kg. Ranking these predictors by their unstandardized coefficients would not be a fair comparison, as the units of these variables are not the same.
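A small sketch of why units matter, using the built-in `mtcars` data instead of the age/height/weight example (that swap is an assumption for illustration). The helper `std_beta` is a hypothetical name. It applies the b * sd(x) / sd(y) calculation described later in this article.

```r
# Hypothetical helper: standardized coefficients from a fitted lm
std_beta <- function(fit) {
  mf <- model.frame(fit)                      # response first, then predictors
  coef(fit)[-1] * sapply(mf[-1], sd) / sd(mf[[1]])
}

d <- mtcars
d$wt_lb <- d$wt * 1000                        # wt is in 1000 lbs; wt_lb is in lbs

fit_a <- lm(mpg ~ wt + hp, data = d)          # weight in 1000 lbs
fit_b <- lm(mpg ~ wt_lb + hp, data = d)       # same weight, in lbs

# Unstandardized coefficients of weight differ by a factor of 1000
print(coef(fit_a)[-1])
print(coef(fit_b)[-1])

# Standardized coefficients are identical in both models
print(std_beta(fit_a))
print(std_beta(fit_b))
```

The unstandardized coefficient of weight changes with the unit, so its size relative to `hp` depends on an arbitrary choice. The standardized values do not change.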


Practical Use of Standardized Coefficient

Standardized coefficients are mainly used to rank predictors (independent or explanatory variables), as they eliminate the units of measurement of the independent and dependent variables. We can rank independent variables by the absolute value of their standardized coefficients: the most important variable has the largest absolute value.

Interpretation in Linear Regression

In the next section, we will discuss the interpretation of unstandardized and standardized coefficient in linear regression.

Linear Regression : Unstandardized Coefficient

It represents the amount by which the dependent variable changes when the independent variable increases by one unit, keeping the other independent variables constant.
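This interpretation can be checked directly, here with a model on the built-in `mtcars` data (an assumption for illustration; the values 3, 4 and 110 are arbitrary). Two predictions that differ by one unit of `wt`, with `hp` held fixed, differ by exactly the coefficient of `wt`.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Two hypothetical cars: same hp, wt differs by one unit
base <- data.frame(wt = 3, hp = 110)
plus <- data.frame(wt = 4, hp = 110)

change <- predict(fit, plus) - predict(fit, base)

# The change in the prediction equals the unstandardized coefficient of wt
print(unname(change))
print(unname(coef(fit)["wt"]))
```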

Linear Regression : Standardized Coefficient

The standardized coefficient is measured in units of standard deviation. A beta value of 1.25 indicates that an increase of one standard deviation in the independent variable results in a 1.25 standard deviation increase in the dependent variable, keeping the other independent variables constant.

Calculation of Standardized Coefficient for Linear Regression

Standardize both the dependent and independent variables and use the standardized variables in the regression model to get standardized estimates. By 'standardize', I mean subtract the mean from each observation and divide the result by the standard deviation. The result is also called a z-score. It has mean 0 and standard deviation 1.
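A minimal sketch of this calculation, assuming the built-in `mtcars` data for illustration. Base R's `scale()` function performs the same centering and scaling as the manual z-score.

```r
x <- mtcars$wt

# z-score by hand: subtract the mean, divide by the standard deviation
z_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix; as.numeric() drops its attributes
z_scale <- as.numeric(scale(x))

print(all.equal(z_manual, z_scale))
print(c(mean = mean(z_manual), sd = sd(z_manual)))

# Standardize the dependent and independent variables, then fit the model
z <- as.data.frame(scale(mtcars[, c("mpg", "wt", "hp")]))
fit_z <- lm(mpg ~ wt + hp, data = z)

# The slopes are the standardized coefficients; the intercept is 0 up to rounding
print(coef(fit_z))
```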


Another Approach
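The standardized coefficient is found by multiplying the unstandardized coefficient by the ratio of the standard deviations of the independent variable and the dependent variable:

$$\beta_j^{\text{std}} = b_j \cdot \frac{s_{x_j}}{s_y}$$

where $b_j$ is the unstandardized coefficient of predictor $x_j$, $s_{x_j}$ is the standard deviation of $x_j$ and $s_y$ is the standard deviation of the dependent variable.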
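The two approaches give the same numbers. The sketch below compares them on the built-in `mtcars` data (an assumption for illustration): the unstandardized coefficient times sd(x) / sd(y), against a refit on z-scored variables.

```r
# Approach 1: unstandardized coefficient times sd(x) / sd(y)
fit <- lm(mpg ~ wt + hp, data = mtcars)
b  <- coef(fit)[-1]                              # drop the intercept
sx <- sapply(mtcars[, c("wt", "hp")], sd)
sy <- sd(mtcars$mpg)
beta_formula <- b * sx / sy

# Approach 2: refit the model on standardized variables
z <- as.data.frame(scale(mtcars[, c("mpg", "wt", "hp")]))
beta_refit <- coef(lm(mpg ~ wt + hp, data = z))[-1]

print(beta_formula)
print(beta_refit)
print(all.equal(beta_formula, beta_refit))
```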

Interpretation in Logistic Regression

Logistic Regression : Unstandardized Coefficient

If X increases by one unit, the log-odds of Y increase by k units (where k is the unstandardized coefficient of X), given that the other variables in the model are held constant.
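As a sketch of this interpretation, the code below fits a logistic model to simulated data (the sample size and the true coefficients -0.5 and 0.8 are assumptions for illustration). It then compares the predicted log-odds at two values of X that are one unit apart.

```r
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 0.8 * x))    # simulated binary outcome

fit <- glm(y ~ x, family = binomial(link = "logit"))
k <- coef(fit)["x"]

# Predicted log-odds (type = "link") at x = 0 and x = 1
lo0 <- predict(fit, data.frame(x = 0), type = "link")
lo1 <- predict(fit, data.frame(x = 1), type = "link")

# The difference in log-odds equals the unstandardized coefficient k
print(unname(lo1 - lo0))
print(unname(k))

# exp(k) is the multiplicative change in the odds per one-unit increase in x
print(unname(exp(k)))
```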

Logistic Regression : Standardized Coefficient

A standardized coefficient of 2.5 means that a one standard deviation increase in the independent variable is associated, on average, with a 2.5 standard deviation increase in the log-odds of the dependent variable.

Calculation of Standardized Coefficient for Logistic Regression

$$\beta^{\text{std}} = \frac{\sqrt{3}}{\pi} \cdot s_x \cdot b$$

where $b$ is the unstandardized coefficient and $s_x$ is the standard deviation of the independent variable.
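The factor sqrt(3)/pi used in the R function for logistic regression later in this article is the reciprocal of pi/sqrt(3), the standard deviation of the standard logistic distribution. The sketch below checks that value numerically with base R's `integrate()` and `dlogis()`. It is a sanity check only, not a derivation.

```r
# Variance of the standard logistic distribution (its mean is 0),
# obtained by integrating z^2 times the density
var_logis <- integrate(function(z) z^2 * dlogis(z), -Inf, Inf)$value
sd_logis  <- sqrt(var_logis)

# Compare with the closed form pi / sqrt(3)
print(c(numerical = sd_logis, closed_form = pi / sqrt(3)))
```

Dividing b * s_x by this standard deviation is the same as multiplying it by sqrt(3)/pi.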

Calculate Standardized Coefficient for Linear Regression in R

Let's start by building a linear regression model.
In the program below, we use the Boston dataset, which contains housing values in suburbs of Boston.
library(MASS)
data(Boston)
str(Boston)
> str(Boston)
'data.frame': 506 obs. of  14 variables:
 $ crim   : num  0.00632 0.02731 0.02729 0.03237 0.06905 ...
 $ zn     : num  18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
 $ indus  : num  2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
 $ chas   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ nox    : num  0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
 $ rm     : num  6.58 6.42 7.18 7 7.15 ...
 $ age    : num  65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
 $ dis    : num  4.09 4.97 4.97 6.06 6.06 ...
 $ rad    : int  1 2 2 3 3 3 5 5 5 5 ...
 $ tax    : num  296 242 242 222 222 222 311 311 311 311 ...
 $ ptratio: num  15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
 $ black  : num  397 397 393 395 397 ...
 $ lstat  : num  4.98 9.14 4.03 2.94 5.33 ...
 $ medv   : num  24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
Data Description
crim – per capita crime rate by town.
zn – proportion of residential land zoned for lots over 25,000 sq. ft.
indus – proportion of non-retail business acres per town.
chas - Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
nox – nitrogen oxides concentration (parts per million).
rm – average number of rooms per dwelling.
age – proportion of owner-occupied units built prior to 1940.
dis – weighted mean of distances to five Boston employment centers.
rad – index of accessibility to radial highways
tax – full-value property-tax rate per $10,000
ptratio – pupil-teacher ratio by town
black - 1000(Bk – 0.63)^2, where Bk is the proportion of blacks by town.
lstat – lower status of the population (percent).
medv – median value of owner-occupied homes in $1000s.

Standardized Coefficient using QuantPsyc Package
reg.model<-lm(medv ~ ., data=Boston)
#Standardised coefficients
library(QuantPsyc)
lm.beta(reg.model)
> lm.beta(reg.model)
        crim           zn        indus         chas          nox           rm 
-0.101017076  0.117715201  0.015335200  0.074198832 -0.223848028  0.291056465 
         age          dis          rad          tax      ptratio        black 
 0.002118638 -0.337836347  0.289749053 -0.226031680 -0.224271231  0.092432232 
       lstat 
-0.407446933
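The same standardized coefficients can be obtained with base R by z-scoring every column of `Boston` and refitting, and then sorted by absolute value to rank the predictors. This is a sketch that assumes the `MASS` package is installed; `ranked` is a name introduced here for illustration.

```r
library(MASS)

# Standardize every column (dependent and independent variables) and refit
z <- as.data.frame(scale(Boston))
fit_z <- lm(medv ~ ., data = z)

# Drop the intercept and order by absolute value, largest first
beta <- coef(fit_z)[-1]
ranked <- beta[order(abs(beta), decreasing = TRUE)]
print(round(ranked, 3))
```

In line with the output above, `lstat` has the largest absolute standardized coefficient and `age` the smallest.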

R Function : Standardized Coefficients in Linear Regression

We can compute standardized coefficient in R without using any package. See the function below-
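# Standardized coefficient = b * sd(x) / sd(y)
stdz.coff <- function(regmodel) {
  b  <- summary(regmodel)$coefficients[-1, 1]   # estimates without the intercept
  sx <- sapply(regmodel$model[-1], sd)          # SD of each independent variable
  sy <- sd(regmodel$model[[1]])                 # SD of the dependent variable
  beta <- b * sx / sy
  return(beta)
}
stdz.coff(reg.model)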

Standardized Coefficient for Logistic Regression in R
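data("Titanic")
# Dependent variable: the Survived factor (No/Yes) from the 32-row Titanic table
Y = data.frame(Titanic)["Survived"]
# X is a random predictor, used only to illustrate the calculation
set.seed(123)
X = runif(32)
mydata = data.frame(X, Y)
#Logistic regression model
model <- glm(Survived ~ X, family = binomial(link = 'logit'), data = mydata)
#R Function : Standardized Coefficients
stdz.coff <- function(regmodel) {
  b  <- summary(regmodel)$coefficients[-1, 1]   # estimates without the intercept
  sx <- sapply(regmodel$model[-1], sd)          # SD of each independent variable
  beta <- (3^(1/2)) / pi * sx * b               # sqrt(3)/pi * sd(x) * b
  return(beta)
}
#Standardized Estimate
stdz.coff(model)
#Unstandardized Estimate
model$coefficients[-1]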
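As a cross-check of the function above, the same standardized estimate can be obtained by z-scoring X, refitting the logistic model and multiplying the new coefficient by sqrt(3)/pi. The sketch below uses simulated data; the sample size, the seed and the true coefficients are assumptions for illustration.

```r
set.seed(7)
n <- 400
x <- rnorm(n, mean = 50, sd = 12)
y <- rbinom(n, 1, plogis(-4 + 0.08 * x))      # simulated binary outcome

# Approach 1: formula applied to the unstandardized coefficient
fit <- glm(y ~ x, family = binomial(link = "logit"))
beta_formula <- sqrt(3) / pi * sd(x) * coef(fit)["x"]

# Approach 2: refit on the z-scored predictor, then rescale by sqrt(3)/pi
x_z <- as.numeric(scale(x))
fit_z <- glm(y ~ x_z, family = binomial(link = "logit"))
beta_refit <- sqrt(3) / pi * coef(fit_z)["x_z"]

print(c(formula = unname(beta_formula), refit = unname(beta_refit)))
```

The two values agree up to the convergence tolerance of `glm()`.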

In SAS, you can include the STB option to get standardized estimates.
proc logistic data = training descending;
class rank (ref ='1');
model admit = gre gpa rank /  stb;
run;
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

11 Responses to "Standardized vs Unstandardized Regression Coefficients"
  1. You give a formula for standardizing independent and dependent variables. Can't the R scale() function be used to do the same thing?

  2. The higher the standardised coefficient the greater the significance?

  3. Very nice post. It is useful to see the use in R. Thanks for the post.

  4. Can you provide the derivation of the formula mentioned for calculating the standardized coefficient in logistic regression - 3^(1/2)/pi*... one? Any link would be of help too!

  5. -8.243E-6 is which mean in regression?

  6. In my dta, unstandardized Regression and Standardized coefficients have (large) differences in terms of statistical significance, why ?

  7. what if the dependent variable is continuous and the independent variables contain a mix of categorical and continuous variables? How do you calculate the standardized coefficients?

  8. May I know what does a negative standardized beta mean?

  9. Hi how do you know if your regression results are already standardized?