In linear regression, the Box-Cox transformation is widely used to transform the target variable so that the linearity and normality assumptions can be met. However, the Box-Cox transformation can only be applied to strictly positive target values. If your target (dependent) variable contains negative values, neither the Box-Cox nor the log transformation can be used.
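To see the restriction concretely, here is a minimal sketch in base R. The helper name box_cox and the sample values are made up for illustration; it simply applies the textbook Box-Cox formula, (y^lambda - 1) / lambda, with log(y) at lambda = 0.

```r
# Hypothetical helper: textbook Box-Cox formula, not a library function
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

y <- c(-2, 0.5, 3)  # illustrative values, one of them negative

# The negative entry cannot be transformed and comes back as NaN
bc_half <- suppressWarnings(box_cox(y, 0.5))
bc_log  <- suppressWarnings(box_cox(y, 0))

print(bc_half)
print(bc_log)
```

The positive entries transform normally, while the negative entry is undefined under both the power and the log form.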
How to handle negative data values
1. Cube Root (Power 1/3)
The cube root can be used to transform negative, zero and positive data values. The best part about this transformation is that the back transformation, which recovers the original values, is very easy to perform.
Back Transformation : Cube of the transformed value
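As a rough sketch of the cube root transformation and its back transformation in base R (the function name cube_root and the sample values are illustrative assumptions): note that R returns NaN for a negative number raised to the power 1/3, so the sign is handled separately here.

```r
# Signed cube root: defined for negative, zero and positive inputs.
# In R, (-8)^(1/3) evaluates to NaN, hence the sign()/abs() split.
cube_root <- function(y) sign(y) * abs(y)^(1/3)

y <- c(-27, -8, 0, 1, 64)  # illustrative values

y_t    <- cube_root(y)     # transformed values keep the sign of y
y_back <- y_t^3            # back transformation: cube of the transformed value

print(y_t)
print(isTRUE(all.equal(y_back, y)))
```

Negative inputs stay negative after the transformation, and cubing the result recovers the original values up to floating-point error.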
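2. Yeo-Johnson Power Transformations
The Yeo-Johnson transformation is an extension of the Box-Cox transformation that also allows negative values.
R Code :
library(car)
# Profile the log-likelihood over lambda for the Yeo-Johnson family
lambda.fm1 <- boxCox(mydata$y ~ mydata$x1 + mydata$x2, family = "yjPower")
# Pick the lambda that maximizes the profile log-likelihood
lambda.max <- lambda.fm1$x[which.max(lambda.fm1$y)]
# Transform the target with the selected lambda
mydata$y <- yjPower(mydata$y, lambda = lambda.max, jacobian.adjusted = FALSE)
It can also be implemented manually. Its log-type branches (lambda = 0 for Y >= 0 and lambda = 2 for Y < 0) are :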
For Y < 0 ===> - log( -y + 1)
For Y >= 0 ===> log( y + 1)
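A manual version for a general lambda could look roughly like the sketch below. It uses base R only; the function name yeo_johnson and the tolerance eps are assumptions made for illustration, and the input is assumed to contain no missing values. The two log expressions above appear as the special cases lambda = 0 (for y >= 0) and lambda = 2 (for y < 0).

```r
# Hypothetical manual Yeo-Johnson transform (assumes y has no NA values)
yeo_johnson <- function(y, lambda) {
  out <- numeric(length(y))
  pos <- y >= 0
  eps <- 1e-8  # tolerance for treating lambda as exactly 0 or 2

  # y >= 0: Box-Cox of (y + 1) with power lambda
  if (abs(lambda) < eps) {
    out[pos] <- log(y[pos] + 1)
  } else {
    out[pos] <- ((y[pos] + 1)^lambda - 1) / lambda
  }

  # y < 0: minus the Box-Cox of (-y + 1) with power 2 - lambda
  if (abs(lambda - 2) < eps) {
    out[!pos] <- -log(-y[!pos] + 1)
  } else {
    out[!pos] <- -((-y[!pos] + 1)^(2 - lambda) - 1) / (2 - lambda)
  }
  out
}

y <- c(-3, -1, 0, 1, 3)  # illustrative values
print(yeo_johnson(y, 0.5))
```

With lambda = 1 both branches reduce to the identity, which is a quick sanity check on any manual implementation.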
[Figure: Yeo-Johnson Power Transformation]
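If y is strictly positive, the Yeo-Johnson transformation is the same as the Box-Cox power transformation of (y + 1). If y is strictly negative, it is minus the Box-Cox power transformation of (-y + 1), but with power 2 - lambda. With both negative and positive values, the transformation is a mixture of these two, so different powers are used for positive and negative values. In this latter case, interpretation of the transformation parameter is difficult, as it has a different meaning for y < 0 and y >= 0.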
3. Adjusted Log Transformation
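= log(1+Y-min(Y))
Note : log here is the natural log (base e). Log to base 10 can also be used, in which case the back transformation uses 10^(transformed value) in place of exp(transformed value).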
Back Transformation : = exp(transformed value) -1+ min(Y)
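The adjusted log transformation and its back transformation can be sketched in base R as follows. The sample values and variable names are illustrative assumptions; the key practical point is that min(Y) must be stored, since it is needed to undo the shift.

```r
y <- c(-5, -1, 0, 2, 10)          # illustrative values

y_min <- min(y)                   # keep this to undo the shift later
y_t   <- log(1 + y - y_min)       # natural log; the smallest value maps to log(1) = 0

y_back <- exp(y_t) - 1 + y_min    # back transformation

# Base-10 variant: back-transform with 10^ instead of exp
y_t10    <- log10(1 + y - y_min)
y_back10 <- 10^y_t10 - 1 + y_min

print(y_t)
print(isTRUE(all.equal(y_back, y)))
```

Because the shift depends on the minimum of the data used to fit the model, new observations below that minimum would need separate handling.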
Isn't the cube root of a negative number another negative number too?
Yeah, the point is to try and squish the numbers together in reversible ways to make them "look" as normal as possible. The center of that normal distribution can be wherever on the number line.