3.1. Matplotlib

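3.1.1. Twin axes in Matplotlib

Did you know that you can have twin axes in Matplotlib? Adding a second x-axis or y-axis with “twiny()” or “twinx()” lets you plot two series with different scales on the same chart, which can take your time series plots to a whole new level.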
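Here is a minimal sketch of a twin y-axis. The data (a made-up temperature series and a made-up sales series) and the variable names are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: two series on very different scales
days = np.arange(30)
temperature = 20 + 5 * np.sin(days / 5)
sales = 1000 + 300 * np.cos(days / 5)

fig, ax_left = plt.subplots(figsize=(8, 4))
ax_left.plot(days, temperature, color="tab:red")
ax_left.set_xlabel("Day")
ax_left.set_ylabel("Temperature", color="tab:red")

# twinx() creates a second Axes that shares the x-axis
# but gets its own, independent y-axis on the right
ax_right = ax_left.twinx()
ax_right.plot(days, sales, color="tab:blue")
ax_right.set_ylabel("Sales", color="tab:blue")
```

Coloring each y-label like its line is a small design choice that helps readers match a series to its axis.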

3.1.2. Scatterplot masterclass

Matplotlib’s default styles suck. The look of the left plot is especially revolting.

But you can take some steps to transform it completely.

Here they are:

  1. Lower opacity - alpha

  2. Lower marker size - “markersize” in the “plot” function (or “s” in “scatter”)

Here, you will see that the values were rounded to integers, so the data is grouped into rows and columns. So, steps 3 and 4 would be:

  3. Jittering the heights and weights.

  4. Zooming into the center of the plot.

Jittering is a nifty little trick: add a small amount of random noise to the data to prevent overplotting. I learned this scatterplot example from one of DataCamp’s excellent courses.
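The steps above can be sketched roughly like this. The height and weight values are synthetic stand-ins generated with NumPy, not the course dataset, and the alpha, marker size, jitter scale and axis limits are arbitrary choices you would tune for your own data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Synthetic data rounded to integers, which causes the rows-and-columns look
height = np.round(rng.normal(170, 8, n))
weight = np.round(height - 100 + rng.normal(0, 10, n))

# Jitter: add a little random noise to break up the integer grid
height_jitter = height + rng.normal(0, 0.5, n)
weight_jitter = weight + rng.normal(0, 0.5, n)

fig, ax = plt.subplots(figsize=(6, 6))
# Low opacity and a small marker size reduce overplotting
(line,) = ax.plot(height_jitter, weight_jitter, "o", markersize=1, alpha=0.1)

# Zoom into the center of the plot, where most of the points are
ax.set_xlim(150, 190)
ax.set_ylim(40, 100)
ax.set_xlabel("Height (cm)")
ax.set_ylabel("Weight (kg)")
```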

3.1.3. Labelling the heights of bars in bar charts

Most of the details on a bar chart are clutter.

Bar charts only need one of the axis labels, an informative title and the height of the bars. The rest goes in the bin. Here is a trick to label the heights of bars in Matplotlib. The rest should be easy.
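One way to do this, assuming Matplotlib 3.4 or newer, is “bar_label()”. The sketch below uses made-up categories and values, and may not be the exact trick from the original image.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Made-up data for illustration
categories = ["A", "B", "C"]
values = [12, 30, 21]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(categories, values)

# bar_label() writes each bar's height just above the bar
labels = ax.bar_label(bars, padding=3)

# Throw the clutter in the bin: no y-axis, no extra spines
ax.set_title("Units sold per category")
ax.yaxis.set_visible(False)
for side in ("top", "right", "left"):
    ax.spines[side].set_visible(False)
```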

3.1.4. autofmt_xdate() to automatically format dates in Matplotlib

Have you ever visualized a time series only to find the dates on the x-axis smooshed together and illegible? You can avoid that by calling the “autofmt_xdate()” method on the figure object, which rotates and right-aligns the date labels so they don’t overlap.
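A quick sketch of the call, using a made-up daily series (the dates and values are assumptions for illustration):

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Made-up daily time series
rng = np.random.default_rng(1)
dates = [dt.date(2022, 1, 1) + dt.timedelta(days=i) for i in range(60)]
values = np.cumsum(rng.normal(size=60))

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(dates, values)

# Rotate the date labels (30 degrees by default) and right-align them
fig.autofmt_xdate()
```

If the default angle is not enough, the method also accepts “rotation” and “ha” arguments.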

3.1.5. How to choose correct DPI and figure size in Matplotlib

How do you choose the right DPI and figure size in Matplotlib so you don’t lose quality when zooming in?

Matplotlib sets figure size in inches - figsize of (12, 6) is 12 inches wide and 6 inches tall.

The DPI represents dots or pixels per inch. The default DPI of 100 means for a figsize of (12, 6), the image resolution will be 1200x600 pixels.

Now, there is also the size of the points, lines or other elements in a plot. Those are measured in points - there are 72 points in an inch. So, at a DPI of 72, one point is exactly one pixel.

At 144 DPI, one point is two pixels, so a 1-point line would be two pixels thick. So, DPI is like a magnifying glass - a higher DPI scales all elements in a plot.

So, to not lose image quality when zooming in, increase DPI while keeping the figsize constant.
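The arithmetic above can be checked with a few lines. The helper “points_to_pixels” is a hypothetical name made up for this sketch, not a Matplotlib function.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# 12 x 6 inches at 100 dots per inch
fig = plt.figure(figsize=(12, 6), dpi=100)
size_at_100 = fig.canvas.get_width_height()  # (1200, 600) pixels

# Same figsize, higher DPI: more pixels for the same layout
fig.set_dpi(200)
size_at_200 = fig.canvas.get_width_height()  # (2400, 1200) pixels


def points_to_pixels(points, dpi):
    """Hypothetical helper: there are 72 points in an inch."""
    return points * dpi / 72


one_point_at_72 = points_to_pixels(1, 72)    # 1.0 pixel
one_point_at_144 = points_to_pixels(1, 144)  # 2.0 pixels
```

When saving, you can get the same effect without touching the figure by passing “dpi” to “savefig()”.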

Image and content credit: the StackOverflow thread linked below.

StackOverflow thread on the topic: https://bit.ly/3IrsLjY

3.1.6. Visualize all trees of RandomForest

It would be freakishly cool to visualize all the trees in a Random Forest. But how?

Last time, I showed how you can draw a single Decision Tree as a Sankey diagram using the Pybaobabdt package. To visualize multiple trees of a RandomForest, we can use Matplotlib subplots, drawing one tree per subplot.

Just remember to set a high DPI and a large figure size before saving.

Image credit: Pybaobabdt docs. The code to create the plot is linked below.

Pybaobabdt docs: https://bit.ly/3unYtJc

Code to generate the plot: https://bit.ly/3yT9CUV
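If you just want to see the one-tree-per-subplot idea without installing Pybaobabdt, here is a rough sketch that swaps in scikit-learn’s own “plot_tree()” instead of Sankey diagrams. The tiny forest (4 shallow trees on the iris dataset) and the figure settings are assumptions chosen to keep the example small.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import plot_tree

# A deliberately tiny forest so every tree fits on one figure
iris = load_iris()
forest = RandomForestClassifier(n_estimators=4, max_depth=2, random_state=0)
forest.fit(iris.data, iris.target)

# Large figure size and high DPI so the trees stay readable when zoomed
fig, axes = plt.subplots(2, 2, figsize=(12, 10), dpi=150)

# Each fitted tree lives in forest.estimators_; draw one per subplot
for idx, (tree, ax) in enumerate(zip(forest.estimators_, axes.ravel())):
    plot_tree(tree, feature_names=iris.feature_names, filled=True, ax=ax)
    ax.set_title(f"Tree {idx + 1}")
```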

3.1.7. Venn diagrams in Python

Drawing Venn diagrams in Matplotlib!

Matplotlib is built from small building-block classes called Artists. Everything is an Artist in Matplotlib - each dot, circle, line, text, spine, etc. They all inherit from a base class called Artist.

If you use these Artists correctly, you can draw practically anything in Matplotlib (even the Mandelbrot set). matplotlib_venn is a library that takes advantage of this feature and allows you to plot Venn diagrams.
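To get a feel for the Artist idea, here is a hand-rolled two-set diagram built only from plain Matplotlib Artists (two “Circle” patches and three text labels). It is a sketch of the concept, not the matplotlib_venn API, and the positions and labels are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.artist import Artist
from matplotlib.patches import Circle

fig, ax = plt.subplots(figsize=(5, 4))

# Two overlapping, semi-transparent circles: each one is an Artist
left = Circle((-0.5, 0), radius=1.0, alpha=0.4, color="tab:blue")
right = Circle((0.5, 0), radius=1.0, alpha=0.4, color="tab:orange")
for circle in (left, right):
    ax.add_patch(circle)

# Text labels are Artists too
label_a = ax.text(-1.0, 0, "A only", ha="center", va="center")
label_ab = ax.text(0.0, 0, "A and B", ha="center", va="center")
label_b = ax.text(1.0, 0, "B only", ha="center", va="center")

# Patches do not autoscale the view reliably, so set the limits by hand
ax.set_xlim(-2, 2)
ax.set_ylim(-1.5, 1.5)
ax.set_aspect("equal")
ax.axis("off")

# Even the Axes and the Figure inherit from Artist
everything = (left, right, label_a, label_ab, label_b, ax, fig)
all_artists = all(isinstance(obj, Artist) for obj in everything)
```

For real Venn diagrams, the library handles the sizing and placement of the regions for you.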

The library: https://github.com/konstantint/matplotlib-venn

3.1.8. Anatomy of Matplotlib

A plot that is worth a thousand plots.

Source: https://bit.ly/3P6gq6H