This post provides a detailed explanation of Relative Weight Analysis (RWA) along with its implementation in R, Python, SPSS and SAS.
RWA is quite popular in the survey analytics world, where it is mainly used for driver (impact) analysis. For example, which human resource driver makes employees stay at or leave an organisation? Is 'pay' a more important driver than 'work-life balance'? RWA is also called Relative Importance Analysis.
Relative Weight Analysis is a useful technique for calculating the relative importance of predictors (independent variables) when the independent variables are correlated with each other. It is an alternative to multiple regression that addresses the multicollinearity problem and ranks variables by their importance. It answers the question "Which variable is the most important?" by ranking variables based on their contribution to R-square.
When independent variables are correlated, it is difficult to determine the true predictive power of each variable, and hence difficult to rank them, because the coefficients cannot be estimated reliably. Statistically, multicollinearity inflates the standard errors of the coefficient estimates and makes the estimates very sensitive to minor changes in the model or the data. As a result, the coefficients are unstable and difficult to interpret.
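To see this instability concretely, here is a minimal Python sketch with made-up data (not from the post): two nearly identical predictors can trade off against each other, so dropping a few rows moves the individual coefficients while their combined effect stays put.

```python
import numpy as np

# Hypothetical data: x2 is almost a copy of x1, so the predictors are
# highly collinear and individual coefficients are poorly determined.
rng = np.random.default_rng(42)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)
y = x1 + rng.normal(scale=0.5, size=100)      # only x1 truly drives y

X = np.column_stack([x1, x2])
b_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# Refit after dropping five rows: the individual coefficients can shift
# noticeably, but their sum (the combined effect of the collinear pair)
# stays close to the true value of 1.
b_sub, *_ = np.linalg.lstsq(X[5:], y[5:], rcond=None)
print("full sample:", b_full, "sum:", b_full.sum())
print("subsample:  ", b_sub, "sum:", b_sub.sum())
```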
RWA creates a set of new independent variables that are maximally related to the original independent variables but uncorrelated with each other. Because these transformed variables are uncorrelated, the dependent variable can be regressed onto them to produce a series of standardized regression coefficients that are not distorted by multicollinearity.
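This orthogonalization can be illustrated with a short Python sketch (made-up data and my own variable names): multiplying the standardized predictors by the inverse of the symmetric square root of their correlation matrix yields variables whose correlation matrix is exactly the identity.

```python
import numpy as np

# Made-up correlated predictors
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
X[:, 1] += 0.7 * X[:, 0]
X[:, 2] += 0.4 * X[:, 0]

Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize
Rxx = np.corrcoef(X, rowvar=False)          # predictor correlation matrix
evals, evecs = np.linalg.eigh(Rxx)
lam = evecs @ np.diag(np.sqrt(evals)) @ evecs.T   # symmetric square root of Rxx

# New variables: maximally related to the originals, mutually uncorrelated
Z = Xs @ np.linalg.inv(lam)
print(np.round(np.corrcoef(Z, rowvar=False), 6))  # identity matrix
```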
How to calculate Relative Weight Analysis?
Below are the steps to calculate Relative Weight Analysis (RWA):

1. Compute the correlation matrix of the independent variables.
2. Calculate the eigenvectors and eigenvalues of that correlation matrix.
3. Form the diagonal matrix of eigenvalues and take its square root.
4. Multiply the eigenvector matrix, the matrix from step 3, and the transpose of the eigenvector matrix (call the result lambda).
5. Square lambda element-wise.
6. To calculate the partial effect of each independent variable on the dependent variable, multiply the inverse of lambda (step 4) by the vector of correlations between the dependent variable and each independent variable (a p x 1 vector, where p is the number of predictors).
7. R-square is the sum of the squared elements of the step 6 vector.
8. Raw relative weights are the product of the step 5 matrix and the element-wise square of the step 6 vector.
9. To express raw relative weights as a percentage of R-square, divide each raw weight by R-square and multiply by 100.
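The steps above can be condensed into a short NumPy sketch (a minimal illustration with my own variable names; the full R, Python, SAS and SPSS programs follow below):

```python
import numpy as np

def relative_weights(X, y):
    """Relative Weight Analysis: X is an (n, p) predictor matrix, y an (n,) target."""
    p = X.shape[1]
    # Step 1: correlation matrix of the predictors, plus predictor-target correlations
    Rxx = np.corrcoef(X, rowvar=False)
    Rxy = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
    # Steps 2-4: eigendecomposition, then lambda = V * sqrt(D) * V'
    evals, evecs = np.linalg.eigh(Rxx)
    lam = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
    # Step 5: element-wise square of lambda
    lam_sq = lam ** 2
    # Step 6: partial effects, beta = inverse(lambda) * Rxy
    beta = np.linalg.solve(lam, Rxy)
    # Step 7: R-square is the sum of squared betas
    rsquare = np.sum(beta ** 2)
    # Step 8: raw relative weights
    raw = lam_sq @ beta ** 2
    # Step 9: weights as a percentage of R-square
    pct = 100 * raw / rsquare
    return raw, pct, rsquare
```

By construction the raw weights sum to R-square, so the percentages sum to 100.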
The next sections include programs to run the RWA analysis. Before running it, make sure there are no missing values in either the independent or the dependent variables; if there are, impute or remove them first. Also make sure you pass only numeric variables to the target and predictors arguments in the programs below.
Calculate Relative Weight Analysis with Python, R, SAS and SPSS
SPSS Code
*Specify a path where INPUT DATA FILE is saved.
FILE HANDLE Datafile /NAME='C:\Documents and Settings\Deepanshu\My Documents\Downloads\examples\RWA Data.sav'.
*Specify a path where you wish OUTPUT files to be saved.
FILE HANDLE Directory /NAME='C:\Documents and Settings\Deepanshu\My Documents\Downloads\examples'.
*Define Independent Variable names.
DEFINE Ivars ( ) var1 var2 var3 !ENDDEFINE.
*Define Dependent Variable name.
DEFINE Target ( ) Churn !ENDDEFINE.
*Define VARIABLE LABELING for Independent variables.
*The order of labels must match the order of the independent variables.
*Do not use spaces in labels; separate words with "_".
DEFINE LABELING ( ) Interest_Rate Renewed_PCT Account_No !ENDDEFINE.
GET FILE = 'DataFile'.
CORRELATIONS /VARIABLES= Target Ivars /MISSING=PAIRWISE /MATRIX = OUT ('Corr.sav').
OMS /SELECT TABLES /IF COMMANDS=['Regression'] SUBTYPES=['Coefficients']
 /DESTINATION FORMAT=SAV OUTFILE='Betas.sav' /COLUMNS SEQUENCE=[RALL CALL LALL].
REGRESSION /DEPENDENT Target /METHOD= ENTER Ivars.
OMSEND.
GET FILE='Betas.sav'.
FLIP ALL.
COMPUTE var6=INDEX(CASE_LBL, '_Beta').
RECODE var6 (1 THRU HIGHEST=1) (ELSE=0).
SELECT IF var6=1.
EXECUTE.
DELETE VARIABLES CASE_LBL VAR6.
SAVE OUTFILE = 'Directory\Coefficients.sav' / RENAME = (var001=Coefficients).
GET FILE = 'Corr.sav'.
SELECT IF rowtype_ = 'CORR'.
EXECUTE.
MATRIX.
MGET / FILE = 'Corr.sav' / TYPE = CORR.
COMPUTE R = CR.
COMPUTE N = NCOL(R).
COMPUTE RXX = R(2:N,2:N).
COMPUTE RXY = R(2:N,1).
CALL EIGEN(RXX,EVEC,EV).
COMPUTE D = MDIAG(EV).
COMPUTE DELTA = SQRT(D).
COMPUTE LAMBDA = EVEC * DELTA * T(EVEC).
COMPUTE LAMBDASQ = LAMBDA &**2.
COMPUTE BETA1 = INV(LAMBDA) * RXY.
COMPUTE RSQUARE = CSSQ(BETA1).
COMPUTE RAWWGT = LAMBDASQ * BETA1 &**2.
COMPUTE IMPORT = (RAWWGT &/ RSQUARE) * 100.
PRINT RSQUARE /FORMAT=F8.8.
PRINT RAWWGT /FORMAT=F8.8 /TITLE = "Raw Relative Weights".
PRINT IMPORT /FORMAT=PCT8.8 /TITLE = "Relative Weights as Percentage of R-square".
SAVE RSQUARE /OUTFILE='RSQ.sav'.
SAVE RAWWGT /OUTFILE='Raw.sav'.
SAVE IMPORT /OUTFILE='Relative.sav'.
END MATRIX.
INPUT PROGRAM.
NUMERIC LABELING (F25).
LOOP #=1 TO 1.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
FLIP.
SAVE OUTFILE = 'Labeling.sav' / DROP VAR001 / RENAME (CASE_LBL=Categories).
MATCH FILES FILE ='Labeling.sav'
 / FILE = 'Raw.sav' / RENAME = (COL1 = RAW_RELATIVE)
 / FILE = 'Relative.sav' / RENAME = (COL1 = PERCENT_RSQUARE)
 / FILE = 'Directory\Coefficients.sav'
 / FILE = 'RSQ.sav' / RENAME = (COL1 = RSQUARE).
FORMATS RAW_RELATIVE TO RSQUARE (F8.6).
SAVE TRANSLATE OUTFILE='Directory\Final_Output.xls' /TYPE=XLS /VERSION=8 /REPLACE /FIELDNAMES /CELLS=VALUES.
EXECUTE.
R Code
library(dplyr)

rwa <- function(df, target, predictors) {
  df2 <- df %>%
    select(all_of(c(target, predictors))) %>%
    filter(!is.na(.data[[target]]))
  if (sum(colSums(is.na(df2)) > 0)) stop("Treat missing values in predictors")
  corAll <- cor(df2, use = "pairwise.complete.obs")
  corX <- corAll[2:ncol(corAll), 2:ncol(corAll)]
  corY <- corAll[2:ncol(corAll), 1]
  eigenX <- eigen(corX)
  D <- diag(eigenX$values)
  delta <- sqrt(D)
  lambda <- eigenX$vectors %*% delta %*% t(eigenX$vectors)
  lambdasq <- lambda ^ 2
  beta <- solve(lambda) %*% corY
  rsquare <- sum(beta ^ 2)
  rawWgt <- lambdasq %*% beta ^ 2
  importance <- (rawWgt / rsquare) * 100
  tbl <- data.frame(predictors,
                    `Raw Relative Weights` = rawWgt,
                    Importance = importance,
                    Beta = beta) %>%
    arrange(desc(Importance))
  return(list(`Importance Scores` = tbl, Rsquare = rsquare))
}

mtcars <- read.csv("https://raw.githubusercontent.com/deepanshu88/Datasets/master/UploadedFiles/mtcars.csv")
rwa(df = mtcars, target = "mpg", predictors = c("cyl", "disp", "hp", "gear"))
SAS Code
FILENAME PROBLY TEMP;
PROC HTTP URL="https://raw.githubusercontent.com/deepanshu88/Datasets/master/UploadedFiles/mtcars.csv"
  METHOD="GET" OUT=PROBLY;
RUN;

OPTIONS VALIDVARNAME=ANY;
PROC IMPORT FILE=PROBLY OUT=WORK.MYDATA REPLACE DBMS=CSV;
RUN;

%macro rwa(df=, target=, predictors=, output=);
data temp;
  set &df (keep = &target &predictors);
  where not missing(&target);
run;

proc corr data=temp out=corX(where=(_type_ = 'CORR')) NOMISS noprint;
  var &predictors;
run;

proc corr data=temp out=corY(where=(_type_ = 'CORR' and _name_ = "&target") drop=&target) NOMISS noprint;
  var &target &predictors;
run;

proc iml;
  use corX;
  read all var _NUM_ into M;
  close;
  eigenVal = eigval(M);
  eigenVec = eigvec(M);
  D = diag(eigenVal);
  delta = sqrt(D);
  lambda = eigenVec * delta * t(eigenVec);
  lambdasq = lambda ## 2;
  use corY;
  read all var _NUM_ into M2;
  close;
  beta = inv(lambda) * t(M2);
  rsquare = sum(beta ## 2);
  rawRelWgt = lambdasq * beta ## 2;
  importance = (rawRelWgt / rsquare) * 100;
  VarName = {&predictors};
  print rsquare;
  create &output var {VarName rawRelWgt importance beta};
  append;
  close &output;
quit;
%mend;

%rwa(df=MYDATA, target=mpg, predictors=cyl disp hp gear, output=importanceTbl);
Python Code
import pandas as pd
import numpy as np
import scipy.linalg

def rwa(df, target, predictors):
    # Combine target and predictors
    allVars = predictors.copy()
    allVars.insert(0, target)
    # Keep rows with non-missing values in the target variable
    df2 = df.loc[df[target].notnull(), allVars]
    corX = df2.corr().loc[predictors, predictors]
    corY = df2.corr().loc[predictors, target]
    # eigh: the correlation matrix is symmetric, so eigenvalues are real
    w, v = scipy.linalg.eigh(corX)
    D = np.diag(w)
    delta = np.sqrt(D)
    lam = v @ delta @ v.T
    lambdasq = lam ** 2
    beta = np.matmul(np.linalg.inv(lam), corY)
    rsquare = np.sum(np.power(beta, 2))
    rawWgt = np.matmul(lambdasq, beta ** 2)
    importance = (rawWgt / rsquare) * 100
    importanceTbl = pd.DataFrame({'Variables': predictors,
                                  'RawRelativeWeights': rawWgt,
                                  'ImportanceScores': importance,
                                  'Beta': beta})
    return importanceTbl, rsquare

mtcars = pd.read_csv("https://raw.githubusercontent.com/deepanshu88/Datasets/master/UploadedFiles/mtcars.csv")
result, rsq = rwa(df = mtcars, target = "mpg", predictors = ["cyl", "disp", "hp", "gear"])
print(result)
print(rsq)
Sample output (from the R program):

$`Importance Scores`
  predictors Raw.Relative.Weights Importance       Beta
1         hp            0.2321744   29.79691 -0.4795836
2        cyl            0.2284797   29.32274 -0.4904939
3       disp            0.2221469   28.50999 -0.4835607
4       gear            0.0963886   12.37037  0.2734483

$Rsquare
[1] 0.7791896
The sign of each beta indicates whether the predictor impacts the target variable positively or negatively: a negative sign denotes a negative relationship, a positive sign a positive relationship.