How I am using basic data work to make sure I get a good price on my trip. It has been four years since my wife and I took a vacation in a sunny place. Last time, for our honeymoon, we spent some quality time in Mexico, enjoying 10 days at a very nice all-inclusive resort in the Riviera Maya. Since then: a house, two kids, a new job, and many other things. After some reflection, we decided it was time to go back…Continue Reading “Using Data Science to save money on my next trip to Mexico”
A more technical post about how I ended up efficiently JOINING two datasets with REGEX using a custom UDF in Spark. Context: for the past couple of months I have been struggling with this small problem. I have a list of REGEX patterns, and I want to know which Wikipedia articles contain them. What I wanted to end up with was a table with the following columns: Wikipedia Article ID, Wikipedia Article Text, and Matching Pattern (or null if no pattern was triggered). ID Text Pattern 1…Continue Reading “Spark JOIN using REGEX”
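To make the idea concrete, here is a minimal plain-Python sketch of the matching logic such a UDF would wrap. The article rows and patterns below are made-up illustrations, not the post's real data; in Spark the function would be registered as a UDF and applied to the article-text column.

```python
import re

# Hypothetical (article_id, article_text) rows standing in for Wikipedia articles.
articles = [
    (1, "The cat sat on the mat."),
    (2, "Spark makes big data processing easier."),
    (3, "Nothing interesting here."),
]

# Hypothetical list of REGEX patterns to look for.
patterns = [r"\bcat\b", r"\b[Ss]park\b"]
compiled = [re.compile(p) for p in patterns]

def first_match(text):
    """Return the first pattern that matches the text, or None.

    This is the core of what the custom UDF would compute per row.
    """
    for rx in compiled:
        if rx.search(text):
            return rx.pattern
    return None

# Build the desired (ID, Text, Pattern-or-None) rows.
result = [(article_id, text, first_match(text)) for article_id, text in articles]
```

Scanning every pattern per row is quadratic in the worst case, which is exactly why doing this efficiently at Wikipedia scale is the interesting part of the post.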
Data science is getting very popular and many people are trying to jump on the bandwagon, and this is GREAT. But many assume that data science, machine learning, or any other buzzword you care to plug in, is just feeding data into some scikit-learn libraries. Here is what the actual job is.
To give you some context, the following happens after the data has been collected. Don’t get me wrong, I don’t think collection should be considered a simple step, but here I would like to focus on data pre-processing and normalization.
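As a taste of what "normalization" means in practice, here is a minimal sketch of one common step, min-max scaling, which rescales a numeric column into the [0, 1] range. The values are made up for illustration.

```python
# Hypothetical raw column values to be normalized.
values = [3.0, 7.0, 12.0, 5.0]

# Min-max scaling: map the smallest value to 0 and the largest to 1.
lo, hi = min(values), max(values)
normalized = [(v - lo) / (hi - lo) for v in values]
```

After scaling, every value sits between 0 and 1, which keeps features with large raw ranges from dominating distance-based models; the real work is deciding which of many such transformations each column actually needs.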