Distribution’s 5th Symphony


I would like to share a small tool I discovered this year that has proved very useful: the violin plot!

The Intro

Violin Plot

Violin plots are a nice tool that you can find in many different visualisation libraries. What do they do? They show and compare distributions. It is, in my opinion, a better way to compare distributions. Here is a quick example:

Seaborn Violin plot (Source: seaborn.pydata.org)

As you can see, you can easily compare the values across the four categories (Thur, Fri, Sat, Sun).

More precisely, the violin plot is a mix of a box plot and a kernel density plot.
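To try this yourself, here is a minimal sketch of a Seaborn violin plot. The data is synthetic and made up for illustration (it is not the dataset behind the Seaborn figure); the column names `day` and `total_bill` are only chosen to resemble it.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic data, invented for this sketch: 50 values per category.
rng = np.random.default_rng(0)
days = ["Thur", "Fri", "Sat", "Sun"]
df = pd.DataFrame({
    "day": np.repeat(days, 50),
    "total_bill": np.concatenate([
        rng.normal(loc=mean, scale=spread, size=50)
        for mean, spread in [(17, 5), (18, 6), (21, 9), (22, 8)]
    ]),
})

# One violin per category; inner="box" draws the small box plot inside each violin.
ax = sns.violinplot(data=df, x="day", y="total_bill", inner="box")
ax.set_title("total_bill per day (synthetic data)")
plt.tight_layout()
```

Call `plt.show()` in an interactive session, or `plt.savefig(...)` in a script, to see the figure.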

The Set list

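  1. The thin bar represents the whiskers of the box plot. In Seaborn these reach the most extreme points within 1.5 times the interquartile range of the box. It is often described as a 95% confidence interval, but that is not what Seaborn draws.
  2. The thick bar represents the interquartile range, from the first to the third quartile (the middle 50% of the values).
  3. The white dot represents the median (the value in the middle).
  4. The top and bottom tips roughly mark your min and max. In Seaborn the density is, by default, extended a little past the extreme values; pass cut=0 to stop exactly at them.
  5. The coloured area represents the density, i.e. the distribution of the population.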
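The numbers behind the inner box can be computed by hand. The sketch below uses only the Python standard library; the function name `violin_summary` and the sample values are hypothetical, and the 1.5 × IQR whisker rule is the usual box plot convention, assumed here rather than taken from any specific figure.

```python
import statistics

def violin_summary(values):
    """Return the numbers behind the inner box of a violin plot."""
    data = sorted(values)
    # Quartiles with linear interpolation between data points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Whiskers: the most extreme data points within 1.5 * IQR of the box.
    low_whisker = min(v for v in data if v >= q1 - 1.5 * iqr)
    high_whisker = max(v for v in data if v <= q3 + 1.5 * iqr)
    return {
        "min": data[0],
        "low_whisker": low_whisker,
        "q1": q1,
        "median": median,
        "q3": q3,
        "high_whisker": high_whisker,
        "max": data[-1],
    }

# Made-up sample with one obvious outlier (30).
summary = violin_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

In this sample the upper whisker stops at 9 while the max is 30, which is why the thin bar and the tip of the violin do not always coincide.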

 

 

The Masterpiece

I have used them for hyper-parameter tuning. When you do hyper-parameter tuning using grid search from scikit-learn, you get an array of results. If you simply look at the absolute minimum, you can easily end up with an anomaly or a corner case. But by looking at the trend, you can get a much better result:

  1. Values will tend to be better in general.
  2. The spread between good and bad results will be smaller.

 

In the above picture it is clear that if you pick a value above 3, you end up with highly variable results. Even if one of the values > 3 can potentially give you a better result, you get no stability, and the chance of getting something bad is high.

In this case the value 3 is very good for two reasons. First, it gives a better score than 1 and 2. Second, its distribution is a lot more compact, so you can expect a small variance in your results.
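Here is a rough sketch of how the per-fold scores of a scikit-learn grid search can be reshaped for this kind of comparison. The dataset (iris), the estimator (a decision tree), and the tuned parameter (`max_depth`) are assumptions chosen to keep the example small; they are not the setup behind the picture above. The default score here is accuracy, so higher is better.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 10-fold cross-validation gives 10 scores per candidate value.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [1, 2, 3, 4, 5, 6]},
    cv=10,
)
search.fit(X, y)

# cv_results_ holds one column per fold: split0_test_score, split1_test_score, ...
results = pd.DataFrame(search.cv_results_)
fold_columns = [
    c for c in results.columns
    if c.startswith("split") and c.endswith("_test_score")
]

# Long format: one row per (parameter value, fold) pair.
long_scores = results.melt(
    id_vars="param_max_depth",
    value_vars=fold_columns,
    var_name="fold",
    value_name="score",
)

# Compare the centre and the spread, not just the single best score.
spread = long_scores.groupby("param_max_depth")["score"].agg(
    ["median", "std", "min", "max"]
)
print(spread)
```

Passing `long_scores` to `sns.violinplot(data=long_scores, x="param_max_depth", y="score")` then draws one violin per candidate value, and the `spread` table gives the same information in numbers.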

 

Where to buy tickets

You can get violin plots in most of the popular visualization libraries. Here is a short list:

  • Seaborn: The one I have used in this article.
  • Plotly: Another very popular lib.
  • Matplotlib: I do not like this one, personal taste.
  • ggplot2: For R lovers 😉

The critics

In conclusion, this is a very nice tool when you need to compare distributions. I honestly think this tool is undervalued and underused.

Happy plotting!
