Data science is getting very popular, and many people are trying to jump on the bandwagon, which is GREAT. But many assume that data science, machine learning, or whatever other buzzword you want to plug in here, just means plugging data into some scikit-learn libraries. Here is what the actual job looks like.

To give you some context, what follows happens after the data has been collected. Don’t get me wrong, I don’t think data collection should be considered a simple step, but I would like to focus on data pre-processing and normalization.

Continue Reading "This is what I really do as a Data Scientist"

Lately, at work, we had to do a lot of unsupervised classification. We basically had to distinguish N classes from a sample population. We had a rough idea of how many classes were present, but nothing was certain. We found the Kolmogorov–Smirnov test to be a very efficient way to determine whether two samples are significantly different from each other. I will give you a bit of context on the Kolmogorov–Smirnov test and walk you through one problem we solved with it.

A bit of theory

Rejecting the null hypothesis. That sounds…

Continue Reading “Kolmogorov–Smirnov test”
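To make the idea of comparing two samples concrete, here is a minimal sketch of a two-sample Kolmogorov–Smirnov test in plain Python. It is not the code from the post: the function names `ks_statistic` and `ks_pvalue`, the synthetic Gaussian samples, and the 0.05 threshold are all assumptions chosen for illustration, and the p-value uses the standard asymptotic approximation, which is only reasonable for fairly large samples. In practice you would more likely call `scipy.stats.ks_2samp`.

```python
import math
import random


def ks_statistic(sample_a, sample_b):
    """Return D, the largest vertical gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step past every value equal to x in both samples (handles ties).
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        # i / n and j / m are the empirical CDFs evaluated at x.
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic two-sided p-value (Kolmogorov distribution tail)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The series converges poorly here and the true tail is ~1 anyway.
        return 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


# Hypothetical data: two Gaussian populations whose means differ by 1.
rng = random.Random(0)
population_1 = [rng.gauss(0.0, 1.0) for _ in range(200)]
population_2 = [rng.gauss(1.0, 1.0) for _ in range(200)]

d = ks_statistic(population_1, population_2)
p = ks_pvalue(d, len(population_1), len(population_2))

# Null hypothesis: both samples come from the same distribution.
if p < 0.05:  # assumed significance level
    print(f"D={d:.3f}, p={p:.3g}: reject the null hypothesis")
else:
    print(f"D={d:.3f}, p={p:.3g}: cannot reject the null hypothesis")
```

The test is non-parametric: it only compares empirical CDFs, so it makes no assumption about the shape of the underlying distributions, which is what makes it handy for checking whether two candidate classes really differ.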