2.2. Pandas - neat tricks
2.2.1. Setting Plotly as Pandas plotting backend
How to make Pandas shake hands with Plotly and deeply insult Matplotlib.
Or in other words, let’s break up the dream duo:
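A minimal sketch of the switch, assuming Plotly is installed alongside Pandas. The DataFrame and its column names are made up for illustration, and the `find_spec` guard is only there so the snippet does not crash on a machine without Plotly.

```python
import importlib.util

import pandas as pd

# Hypothetical toy data, not from the original post
df = pd.DataFrame({"month": [1, 2, 3, 4], "sales": [10, 20, 15, 30]})

# The backend switch only works when plotly is installed, so check first
if importlib.util.find_spec("plotly") is not None:
    pd.options.plotting.backend = "plotly"
    # With the plotly backend, .plot() returns a plotly Figure
    # instead of a Matplotlib Axes
    fig = df.plot(x="month", y="sales", kind="line")
    # fig.show()  # uncomment to render the interactive chart
```

To go back to the default, set `pd.options.plotting.backend = "matplotlib"`.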
2.2.2. Styling Pandas DataFrames
Did you know you could make your Pandas DataFrames much fancier with a bit of styling?
As Jupyter has deep roots in HTML and CSS, you have full control over the look of df outputs.
It is one of those Pandas features that amaze you at first; then you try it a couple of times and forget it exists after a while.
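Here is a rough sketch of what styling can look like. The data and the helper name `style_scores` are invented for this example, and the `.style` accessor needs the jinja2 package installed.

```python
import pandas as pd

# Hypothetical model scores, just for illustration
df = pd.DataFrame(
    {
        "model": ["a", "b", "c"],
        "accuracy": [0.8123, 0.9051, 0.7644],
        "loss": [0.41, 0.22, 0.57],
    }
)


def style_scores(frame: pd.DataFrame):
    """Return a Styler that formats the numbers and highlights the best scores."""
    return (
        frame.style
        # Show accuracy as a percentage and loss with two decimals
        .format({"accuracy": "{:.2%}", "loss": "{:.2f}"})
        # Best accuracy is the maximum, best loss is the minimum
        .highlight_max(subset=["accuracy"], color="lightgreen")
        .highlight_min(subset=["loss"], color="lightgreen")
    )
```

In a notebook, putting `style_scores(df)` on the last line of a cell renders the styled table.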
2.2.3. Quickly choose columns based on data type with select_dtypes
Have you ever used for loops to filter Pandas columns based on data type? Well, those days are over.
From now on, use the select_dtypes method of DataFrames👇
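A small sketch with a made-up DataFrame (the column names are assumptions for the example):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "age": [25, 32, 47],
        "height": [1.75, 1.62, 1.80],
        "name": ["Ann", "Bob", "Cy"],
        "member": [True, False, True],
    }
)

# Keep only integer and float columns
numeric = df.select_dtypes(include="number")

# Or drop them and keep everything else
non_numeric = df.select_dtypes(exclude="number")

print(numeric.columns.tolist())
print(non_numeric.columns.tolist())
```

Both `include` and `exclude` also accept a list of dtypes, for example `include=["number", "bool"]`.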
2.2.4. Set numeric display precision in Pandas
It is very annoying when Pandas shows long floats in scientific notation. I usually struggle with approximating close-to-zero floats.
To prevent this, you can change the display option of Pandas to limit the floating point precision👇
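Two display options are worth knowing here, sketched below on toy values. The example uses `pd.option_context` so the change is temporary and does not leak into the rest of the session.

```python
import pandas as pd

s = pd.Series([0.000012345, 1234.56789, 3.14159265])

# Option 1: fixed-point formatting with four decimals,
# which also avoids scientific notation
with pd.option_context("display.float_format", "{:.4f}".format):
    fixed = s.to_string()

# Option 2: cap the number of decimals shown (the default is 6)
with pd.option_context("display.precision", 2):
    short = pd.Series([1.23456, 2.34567]).to_string()

print(fixed)
print(short)
```

To apply a setting for the whole session instead, call `pd.set_option("display.float_format", "{:.4f}".format)` once at the top of the notebook. These options only change how numbers are displayed; the stored values keep their full precision.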
2.2.5. Pandas explode
What do you do if a dataframe cell contains a list of values? Well, you explode💣 them!
Pandas’ explode method takes a column and expands it vertically, so that any cell that contains more than one value is stretched across multiple rows.
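A quick sketch on a made-up DataFrame with a list-valued column:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "tags": [["a", "b"], ["c"], []]})

# Each list element gets its own row; the other columns are repeated.
# An empty list becomes a single row with NaN.
exploded = df.explode("tags")

print(exploded)
```

The original index labels are repeated for rows that came from the same cell. Pass `ignore_index=True` if you would rather get a fresh 0..n-1 index.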
2.2.6. Pandas pipe
Pandas has a “pipeline” feature similar to the one in Sklearn. By chaining multiple “pipe” calls together, you can apply several preprocessing functions in a single expression, which makes your code more readable and easier to debug.
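As a sketch, here are two hypothetical preprocessing steps (`drop_missing` and `add_total` are names invented for this example) chained with pipe:

```python
import pandas as pd

raw = pd.DataFrame({"price": [100.0, None, 250.0], "qty": [2, 5, 10]})


def drop_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Remove rows that contain any missing value."""
    return df.dropna()


def add_total(df: pd.DataFrame, price_col: str, qty_col: str) -> pd.DataFrame:
    """Add a 'total' column as price times quantity."""
    return df.assign(total=df[price_col] * df[qty_col])


# Each step receives the DataFrame returned by the previous one;
# extra arguments are forwarded to the function
clean = (
    raw.pipe(drop_missing)
       .pipe(add_total, price_col="price", qty_col="qty")
)

print(clean)
```

Because every step is a plain function, you can test each one on its own or comment out a single line of the chain while debugging.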
2.2.7. Encoding categorical features with pd.factorize
You don’t need to import Sklearn to encode categorical features if you are just data cleaning. Pandas will take care of you, as always!
Using the “factorize” function, you can encode categorical features as integers and get back a numeric array of codes along with the unique values. This works for ordinal features (categories with an ordering) too, but keep in mind that the codes follow the order of first appearance rather than the ordering of the categories.
Missing values get encoded as -1 and aren’t considered a new category. However, don’t use this function after you’ve split the data into training and test sets. The encoding of categories happens on a “meet-first” basis, so the same category can be assigned a different number in the test set than in the training set, depending on where it first appears.
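The sketch below shows both points on toy data: the -1 code for a missing value and the “meet-first” pitfall when two splits are encoded separately. The series contents are made up for the example.

```python
import pandas as pd

sizes = pd.Series(["small", "large", None, "medium", "small"])

# codes: one integer per row, uniques: categories in order of first appearance
codes, uniques = pd.factorize(sizes)
print(codes)          # the missing value is encoded as -1
print(list(uniques))

# Pitfall: encoding two splits separately gives inconsistent codes
train_codes, train_uniques = pd.factorize(pd.Series(["red", "blue", "red"]))
test_codes, test_uniques = pd.factorize(pd.Series(["blue", "red"]))

# "blue" is 1 in the training split but 0 in the test split
print(list(train_uniques), train_codes)
print(list(test_uniques), test_codes)
```

So if you need the codes to match across splits, factorize the full column before splitting.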