My review of the latest Spark and AI Summit, held in San Francisco on April 24th and 25th, 2019. The latest edition of the Spark conference took place last week. It was my first time attending. Here is a breakdown of the different aspects of the conference. The big news Databricks,…Continue Reading “Spark & AI Summit 2019”
A more technical post about how I ended up efficiently joining two datasets with regex using a custom UDF in Spark. Context: For the past couple of months I have been struggling with this small problem. I have a list of regex patterns and I want to know which Wikipedia articles contain them. What I…Continue Reading “Spark JOIN using REGEX”
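To make the problem concrete, here is a minimal sketch of the matching logic in plain Python. It is not the post's actual implementation, which the excerpt does not show. The sample patterns, the sample articles, and the names `matching_patterns` and `regex_join` are all hypothetical. In Spark, a per-row function like `matching_patterns` is the kind of thing one might wrap in a UDF that returns an array and then explode into one row per match.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical sample data; the post's real patterns and articles are not shown.
PATTERNS: List[str] = [r"\bspark\b", r"data\s+science", r"\d{4}"]
ARTICLES: Dict[str, str] = {
    "Apache Spark": "Spark is an engine for large-scale data processing.",
    "Data science": "Data science emerged as a field around 2001.",
    "Maslow": "A theory of motivation in psychology.",
}

# Compile once, outside the per-row function, so each row only pays for matching.
COMPILED = [(p, re.compile(p, re.IGNORECASE)) for p in PATTERNS]


def matching_patterns(text: str) -> List[str]:
    """Return every pattern that occurs somewhere in `text`."""
    return [p for p, rx in COMPILED if rx.search(text)]


def regex_join(articles: Dict[str, str]) -> List[Tuple[str, str]]:
    """Emit (title, pattern) pairs, i.e. the rows a regex join would produce."""
    return [
        (title, p)
        for title, text in articles.items()
        for p in matching_patterns(text)
    ]


pairs = regex_join(ARTICLES)
for title, pattern in pairs:
    print(title, "matches", pattern)
```

Compiling the patterns once and reusing them for every article avoids recompiling per row, which is usually the first thing to get right before worrying about how the work is distributed.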
Let’s not start with data science this time. Let’s start with psychology. I have no real competence in this domain, but I remember being presented with Maslow’s hierarchy of needs in high school. The best I can describe it is as the different stages humans must go through to find happiness. To get better…Continue Reading “The data science pyramid”
Data science is getting very popular and many people are trying to jump on the bandwagon, and this is GREAT. But many assume that data science, machine learning, or whatever other buzzword you want to plug in here, just means plugging data into some scikit-learn libraries. Here is what the actual job is.
To give you some context, what follows happens after the data has been collected. Don’t get me wrong, I don’t think collection should be considered a simple step, but I would like to focus on data pre-processing and normalization. Continue Reading "This is what I really do as a Data Scientist"
A more efficient loss function for Siamese