How I am using basic data work to make sure I get a good price on my next trip. It has been four years since my wife and I took a vacation in a sunny place. Last time, for our honeymoon, we spent some quality time in Mexico: we enjoyed 10 days at a very nice all-inclusive resort in the Riviera Maya. Since then: a house, two kids, a new job and many other things. After some reflection, we decided it was time to go back…Continue Reading “Using Data Science to save money on my next trip to Mexico”

My review of the latest Spark + AI Summit, hosted in San Francisco on April 24th and 25th, 2019. Last week the latest edition of the Spark conference took place, and it was my first time attending. Here is a breakdown of the different aspects of the conference. The big news: Databricks, organizer of the conference and the main contributor to Spark, announced a couple of items. Koalas: they announced a new project called Koalas, a native “pandas” API for Spark. You can…Continue Reading “Spark & AI Summit 2019”

A more technical post about how I ended up efficiently JOINing two datasets on REGEX matches using a custom UDF in Spark. Context: for the past couple of months I had been struggling with this small problem. I have a list of REGEX patterns, and I want to know which Wikipedia articles contain them. What I wanted to end up with was a table with the following columns: Wikipedia Article ID, Wikipedia Article Text, Matching Pattern (or null if no pattern was triggered). ID Text Pattern 1…Continue Reading “Spark JOIN using REGEX”
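The core of the matching logic the post hints at can be sketched in plain Python (the article rows and patterns below are hypothetical stand-ins, not from the post). In Spark, a function like `first_match` would be wrapped as a UDF and applied to the article text column; returning `None` when no pattern fires is what produces the null in the third column.

```python
import re

# Hypothetical sample data standing in for the Wikipedia articles and the pattern list
articles = [
    (1, "The quick brown fox jumps over the lazy dog"),
    (2, "Lorem ipsum dolor sit amet"),
]
patterns = [r"fox", r"cat"]

# Compile each pattern once up front, keeping the raw string to report back
compiled = [(raw, re.compile(raw)) for raw in patterns]

def first_match(text):
    """Return the first pattern that matches `text`, or None (Spark's null)."""
    for raw, rx in compiled:
        if rx.search(text):
            return raw
    return None

# Emulates the desired output table: (ID, Text, Matching Pattern)
result = [(aid, text, first_match(text)) for aid, text in articles]
```

Here `result` would be `[(1, "The quick brown fox jumps over the lazy dog", "fox"), (2, "Lorem ipsum dolor sit amet", None)]` — article 2 has no matching pattern, hence the null.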

Let’s not start with data science this time. Let’s start with psychology. I am far from having any competence in this domain, but I remember being introduced to Maslow’s hierarchy of needs in high school. The best way I can describe it is as the different stages humans must go through to find happiness. To get a better understanding of it, you can look here. Here is the famous pyramid. The key thing to take away is that you need to go through all the steps; you…Continue Reading “The data science pyramid”