My review of the latest Spark and AI Summit, hosted in San Francisco on April 24th and 25th, 2019. The latest edition of the Spark conference took place last week, and it was my first time attending. Here is a breakdown of the different aspects of the conference. The big news: Databricks, the organizer of the conference and the main contributor to Spark, announced a couple of items. Koalas: they announced a new project called Koalas, a native “Pandas” interpreter for Spark. You can… Continue Reading “Spark & AI Summit 2019”

A more technical post about how I ended up efficiently joining two datasets with regex using a custom UDF in Spark. Context: for the past couple of months I have been struggling with this small problem. I have a list of regex patterns and I want to know which Wikipedia articles contain them. What I wanted to end up with was a table with the following columns: Wikipedia article ID, Wikipedia article text, and matching pattern (or null if no pattern got triggered). ID Text Pattern 1… Continue Reading “Spark JOIN using REGEX”
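
To make the target table from the regex JOIN excerpt concrete, here is a minimal sketch of the matching logic in plain Python. It is not the code from the full post: the names `first_matching_pattern` and `as_spark_udf`, the sample patterns, and the sample articles are all hypothetical, and returning only the first matching pattern is an assumption about the desired behavior.

```python
import re

def first_matching_pattern(text, compiled_patterns):
    """Return the source of the first pattern found in text, or None."""
    if text is None:
        return None
    for pattern in compiled_patterns:
        if pattern.search(text):
            return pattern.pattern
    return None

def as_spark_udf(patterns):
    """Wrap the matcher as a Spark UDF. Requires pyspark; not executed here."""
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType
    compiled = [re.compile(p) for p in patterns]  # compile once, reuse per row
    return udf(lambda text: first_matching_pattern(text, compiled), StringType())

# Hypothetical sample data standing in for the pattern list and the articles.
PATTERNS = [r"\bSpark\b", r"\bpandas\b"]
ARTICLES = [
    (1, "Apache Spark is a cluster computing engine."),
    (2, "The giant panda lives in China."),
]

compiled = [re.compile(p) for p in PATTERNS]
# One row per article: (ID, Text, Pattern), with None when nothing matched.
rows = [(aid, text, first_matching_pattern(text, compiled)) for aid, text in ARTICLES]
```

Assuming a DataFrame `df` with a `text` column, the wrapper could be applied as `df.withColumn("pattern", as_spark_udf(PATTERNS)(df["text"]))`. Shipping the compiled pattern list inside the UDF avoids a cross join between articles and patterns, which is only reasonable while the list is small enough to send to every executor.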