# Understanding Multicollinearity and Confounding Variables in Regression

When two or more predictors are correlated, the phenomenon is called multicollinearity. It distorts the fitted coefficients by masking the underlying individual weights of the correlated variables, which is one reason model weights are not the same thing as feature importance. Ways to deal with multicollinearity include looking at the Variance Inflation Factor (VIF), which measures the … Continue reading Understanding Multicollinearity and Confounding Variables in Regression
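The excerpt is cut off before the VIF details, but the idea can be sketched directly: each predictor is regressed on all the others, and VIF = 1 / (1 − R²), so a predictor that is nearly a linear combination of the rest gets a large VIF. A minimal sketch (the `vif` helper and column names are illustrative, not from the post):

```python
import numpy as np
import pandas as pd

def vif(df):
    """Variance Inflation Factor for each numeric column.

    Each column is regressed on all the others via least squares;
    VIF = 1 / (1 - R^2), so correlated predictors get large VIFs.
    """
    out = {}
    for col in df.columns:
        y = df[col].to_numpy(dtype=float)
        X = df.drop(columns=col).to_numpy(dtype=float)
        X = np.column_stack([np.ones(len(X)), X])  # intercept term
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        r2 = 1.0 - resid.var() / y.var()
        out[col] = 1.0 / (1.0 - r2)
    return pd.Series(out)
```

A common rule of thumb is to investigate predictors with VIF above 5 or 10; on synthetic data where one column is a noisy copy of another, both copies show very large VIFs while an independent column stays near 1.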

# Category: Data Science

# Unnest (explode) a column of list in Pandas

In Python, when you have a list of lists and convert it directly to a pandas DataFrame, you get columns of lists. This may seem overwhelming, but fear not! Pandas comes to our rescue once again - use pandas.DataFrame.explode() import pandas as pd df = pd.DataFrame({'col1': [[0, 1, 2], 'foo', [], [3, 4]], 'col2': 1, … Continue reading Unnest (explode) a column of list in Pandas
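The excerpt's example can be completed into a runnable sketch (the DataFrame literal is taken from the excerpt; the rest of the post is truncated, so what follows is the standard `explode` behaviour rather than the author's full walkthrough):

```python
import pandas as pd

# The frame from the excerpt: 'col1' mixes lists, a scalar, and an
# empty list; the scalar 'col2' is broadcast to every row.
df = pd.DataFrame({'col1': [[0, 1, 2], 'foo', [], [3, 4]], 'col2': 1})

# Unnest 'col1': each list element becomes its own row, scalars pass
# through unchanged, and an empty list becomes a single NaN row.
exploded = df.explode('col1')
```

Note that `explode` repeats the original index for rows it unnests, so a `reset_index(drop=True)` afterwards is often useful.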

# RStudio in Docker – now share your R code effortlessly!

If you are a full-time data science practitioner and have passed through the stages of starting out with the Titanic dataset and working through the various exercises on Kaggle, you will know by now that we wish real-world data problems were that simple, but they are not! This post is about just one … Continue reading RStudio in Docker – now share your R code effortlessly!
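The post itself is truncated, but the usual starting point for RStudio in Docker is the Rocker project's `rocker/rstudio` image; a minimal CLI sketch (the password and container name here are placeholders, not from the post):

```shell
# Run RStudio Server from the Rocker project's image; it listens on
# port 8787 inside the container, mapped to 8787 on the host.
docker run -d -p 8787:8787 -e PASSWORD=yourpassword \
    --name rstudio rocker/rstudio

# Then open http://localhost:8787 and log in as user "rstudio"
# with the password set above.
```

Pinning a specific image tag (e.g. an R version) is what makes the environment reproducible when you share it.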

# 2 minute refresher to Logistic Regression

Here's a 2 minute refresher on logistic regression for you: Logistic regression is used to model the outcomes of a categorical target variable. Input features are weighted just as with linear regression; however, the result is fed as an input to the logistic function. In linear regression, coefficients are found by minimizing the sum of squared … Continue reading 2 minute refresher to Logistic Regression
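The "weighted sum fed into the logistic function" step described above can be sketched in a few lines (function names here are illustrative, not from the post):

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, weights, bias):
    # Same weighted sum as linear regression (X @ weights + bias),
    # then squashed through the logistic function so the output can
    # be read as a class probability.
    return sigmoid(X @ weights + bias)
```

A weighted sum of exactly 0 maps to a probability of 0.5, the usual decision boundary between the two classes.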