Understanding Multicollinearity and Confounding Variables in Regression


When two or more predictors are correlated with each other, the phenomenon is called multicollinearity. It makes the estimated coefficients unstable and masks the individual contribution of each correlated variable, because the model cannot cleanly separate their effects. This is one reason model weights should not be read directly as feature importance.
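To make the masking effect concrete, here is a minimal simulated sketch (the data, seed, and variable names are made up for illustration). Two predictors are nearly copies of each other, so ordinary least squares can recover their combined effect but has very little information about how to split it between them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# x2 is x1 plus a tiny amount of noise, so the two are almost perfectly correlated
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)

# Simulated outcome with known weights: 2.0 on x1 and 1.0 on x2
y = 2.0 * x1 + 1.0 * x2 + rng.normal(scale=0.5, size=n)

# Ordinary least squares with an intercept column
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

corr = np.corrcoef(x1, x2)[0, 1]
coef_sum = coef[1] + coef[2]

print("correlation between x1 and x2:", corr)
print("estimated weights for x1, x2:", coef[1], coef[2])
print("sum of the two weights:", coef_sum)
```

The sum of the two weights should land close to 3.0, while the individual weights can drift far from 2.0 and 1.0 and change noticeably with a different seed.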

Ways to deal with multicollinearity

  • Looking at the Variance Inflation Factor (VIF), which measures how much the variance of an estimated coefficient is inflated by multicollinearity (values above roughly 5 to 10 are commonly treated as a warning sign)
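import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

# the independent variables from the dataframe (df is assumed to exist already);
# an intercept column is added so each VIF comes from a regression with a constant
X = add_constant(df[['col1', 'col2', 'col3']])

# VIF dataframe, labelled with the column each value belongs to
vif_df = pd.DataFrame()
vif_df["feature"] = X.columns

# calculating VIF for each feature (the 'const' row can be ignored)
vif_df["vif"] = [variance_inflation_factor(X.values, i)
                 for i in range(len(X.columns))]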

  • Removing one variable from each highly correlated pair
  • Using PCA to replace the correlated predictors with uncorrelated components
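The three approaches listed above can be sketched together on simulated data. This is an illustrative sketch only: the `vif` helper is a hypothetical hand-rolled function that applies the definition VIF = 1 / (1 - R²), PCA is done through a plain SVD rather than a library class, and the columns `a`, `b`, `c` are invented.

```python
import numpy as np

def vif(X):
    """VIF of each column: 1 / (1 - R^2) from regressing it on the other columns."""
    n, p = X.shape
    values = []
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - resid.var() / target.var()
        values.append(1.0 / (1.0 - r2))
    return np.array(values)

rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
b = a + rng.normal(scale=0.1, size=n)  # strongly correlated with a
c = rng.normal(size=n)                 # unrelated to a and b
X = np.column_stack([a, b, c])

# 1. Inspect VIF: a and b should stand out, c should not
vif_all = vif(X)

# 2. Remove one of the correlated variables (here b) and check again
vif_dropped = vif(X[:, [0, 2]])

# 3. PCA via SVD of the centered data: the projected components are uncorrelated
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Xc @ Vt.T
comp_corr = np.corrcoef(components, rowvar=False)

print("VIF with all columns:", vif_all)
print("VIF after dropping b:", vif_dropped)
print("correlation matrix of PCA components:\n", comp_corr.round(6))
```

Dropping a column keeps the remaining features interpretable, whereas PCA keeps all the information but produces components that are mixtures of the original variables, so which one fits better depends on whether interpretability matters.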

Confounding Variables

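A confounding variable is one that affects both the dependent variable and an independent variable. Like multicollinearity, it involves correlated variables, but here the shared cause can produce a misleading (spurious) relationship between a predictor and the outcome. For example: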

Higher consumption of ice cream -> Higher likelihood of sunburn

The conclusion above is misleading. A third variable, higher temperature, plausibly drives both: hot weather leads to higher ice cream consumption and also to a higher likelihood of sunburn.
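The ice cream example can be reproduced with a small simulation. Everything here is assumed for illustration (the effect sizes, the noise levels, and the hypothetical `ols` helper): temperature drives both variables, and ice cream has no direct effect on sunburn at all.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000

# Temperature is the confounder: it drives both ice cream and sunburn
temperature = rng.normal(loc=25, scale=5, size=n)
ice_cream = 0.5 * temperature + rng.normal(scale=1.0, size=n)
sunburn = 0.3 * temperature + rng.normal(scale=1.0, size=n)  # no ice cream term

def ols(y, *cols):
    """Least-squares coefficients: intercept first, then one per predictor."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Model 1: sunburn ~ ice_cream (confounder left out)
naive = ols(sunburn, ice_cream)

# Model 2: sunburn ~ ice_cream + temperature (confounder included)
adjusted = ols(sunburn, ice_cream, temperature)

print("ice cream weight without temperature:", naive[1])
print("ice cream weight with temperature:   ", adjusted[1])
print("temperature weight:                  ", adjusted[2])
```

In the first model ice cream appears to have a clearly positive effect on sunburn; once temperature is included, the ice cream weight should shrink toward zero.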

Common Reasons for confounding variables to occur

  • Selection bias – the data is biased by the way it was collected, e.g. class imbalance
  • Omitted variable bias – important variables are left out, resulting in a regression model that is biased and inconsistent
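Omitted variable bias can be written down explicitly for the simplest case of one included predictor x and one omitted variable z. This is the standard textbook result; the symbols are notation introduced here for illustration, and the error term ε is assumed to be uncorrelated with both x and z.

```latex
\begin{aligned}
\text{True model:}\quad & y = \beta_0 + \beta_1 x + \beta_2 z + \varepsilon \\
\text{Fitted model (z omitted):}\quad & y = \alpha_0 + \alpha_1 x + u \\
\text{Large-sample estimate:}\quad & \hat{\alpha}_1 \;\to\; \beta_1 + \beta_2 \,\frac{\operatorname{Cov}(x, z)}{\operatorname{Var}(x)}
\end{aligned}
```

The extra term vanishes only when z has no effect on y (β₂ = 0) or z is uncorrelated with x, which is why a variable that drives both the predictor and the outcome distorts the estimate.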

Ways to deal with confounding variables

  • Stratification – balance the dataset in such a way that the confounding variables do not vary much within each group
  • Chi-square test of independence – this determines whether there is a statistically significant relationship between two categorical variables
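Both ideas can be combined in one sketch: run a chi-square test of independence on the full dataset, then repeat it within each stratum of the suspected confounder. The data is simulated and the probabilities are arbitrary assumptions; `contingency` is a hypothetical helper, while `chi2_contingency` is the SciPy function for this test.

```python
import numpy as np
from scipy.stats import chi2_contingency

def contingency(a, b):
    """2x2 table of counts for two boolean arrays."""
    return np.array([[np.sum(~a & ~b), np.sum(~a & b)],
                     [np.sum(a & ~b), np.sum(a & b)]])

rng = np.random.default_rng(1)
n = 4000

# Confounder: whether the day was hot. It raises the chance of both outcomes,
# but ice cream and sunburn are generated independently given the weather.
hot = rng.random(n) < 0.5
ice_cream = rng.random(n) < np.where(hot, 0.8, 0.2)
sunburn = rng.random(n) < np.where(hot, 0.6, 0.1)

# Test on the whole dataset: ice cream and sunburn look strongly related
chi2_all, p_all, dof, _ = chi2_contingency(contingency(ice_cream, sunburn))

# Stratify by the confounder and test within each stratum
strata = {}
for label, mask in (("hot", hot), ("not hot", ~hot)):
    chi2_s, p_s, _, _ = chi2_contingency(
        contingency(ice_cream[mask], sunburn[mask]))
    strata[label] = (chi2_s, p_s)

print("all days: chi2 =", chi2_all, "p =", p_all)
for label, (chi2_s, p_s) in strata.items():
    print(label, ": chi2 =", chi2_s, "p =", p_s)
```

If the association is strong overall but much weaker inside each stratum, that pattern suggests the stratifying variable is acting as a confounder.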

Image source: Unsplash
