A2B fly with me, Graph Processing at scale with Spark GraphFrames

Hey Raúl,

It’s been a while since the last time you stopped by… June 4th 2021 to be precise! A lot of things have happened since then and, among the many I should have posted about, this one deserves special attention: it is memorable and worth sharing with a broader audience.

As part of the “Modern Data Architectures for Big Data II” course, my students work on a group assignment where they put everything they’ve learned to good use… I usually get superb material and I enjoy going over the results a lot; most of my students are impressed by what they’re capable of doing after so little time, considering that most of them come from a non-technical background.

Anyway, it happens that some of the projects are more than simple MVPs and are ready to move into something more serious… some of them have been the seed of startups that might change the way we do things in the future, who knows. That’s not the case for the one I’m sharing today, but you never know.

Amishee Choksi, Jude Shammout, Grégoire Gratzmuller, Satoko Murata and Pedro Sugaya are the members of the group behind this excellent project; unfortunately, only Amishee, Jude and Grégoire managed to make the video, as we all have crazy agendas these days.

I’d like to thank them all for letting me share this material with the rest of the world; the students of the current intake, going through the same tough moments they went through some months back, will appreciate seeing what they’ll be able to achieve in a matter of a few months 🙂 THANKS AGAIN!

About my guests

These are some words from Amishee:

“I completed my bachelor’s degree with a focus on business and accounting, post which I worked with a consulting company. While working on projects related to digital transformation, I realised the importance of integrating technology into everyday business, and why it was critical to use data to drive decision making. The dual degree of an International MBA and Masters in Big Data and Business Analytics at IE fit perfectly for me, to be able to hone my business skills, as well as learn the more technical aspects of applying and integrating technology with business.”

This is Jude introducing herself:

“I’m currently pursuing a Master’s in Finance and Master’s in Big Data and Business Analytics. Prior to joining IE, I founded a data driven market research and advertising agency where I saw firsthand the power of data to inform decisions. I chose these two degrees because I’m a firm believer that Big Data is, and will continue to be, a transformative force in finance and I’m excited to be part of this transformation.”

And last but not least, Grégoire:

“After completing my studies in Business and Management I worked for a couple of years in Data Consulting, primarily on the project management side, orchestrating the team workflow and engaging with clients. My wish to be more involved in the technical side of data analysis fueled my motivation to pursue a Master’s degree in Business Analytics and Big Data at IE University.”

What they did

As the title suggests, it was a graph processing exercise with Spark… but that’s not the only thing we study in this course; they managed to build an end-to-end solution with Big Data technologies:

  1. Data was manually ingested into HDFS, a distributed file system.
  2. This raw data was processed and normalized to build the data structures needed by the GraphFrames Spark library.
  3. Several graph processing algorithms were used to identify business opportunities.
  4. Results were stored in a relational database, MariaDB, for serving purposes.
  5. And those insights were visualized in Superset, a powerful open source visualization tool.
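To give a flavour of steps 2 and 3, here is a minimal sketch in plain Python rather than Spark, so it runs anywhere. It is not the students’ code: the route records, the field names `origin` and `destination`, and the helper functions are all invented for illustration, since this post does not describe the actual dataset. What it does mirror is the shape GraphFrames expects, a vertex table with an `id` column and an edge table with `src` and `dst` columns, followed by one very simple graph metric.

```python
from collections import Counter

# Hypothetical raw records, standing in for the data ingested into HDFS.
raw_routes = [
    {"origin": " mad ", "destination": "LHR"},
    {"origin": "MAD", "destination": "cdg"},
    {"origin": "LHR", "destination": "MAD"},
    {"origin": "CDG", "destination": "LHR"},
    {"origin": "MAD", "destination": "LHR"},  # duplicate once normalized
]


def normalize(code):
    """Trim whitespace and upper-case a node identifier."""
    return code.strip().upper()


def build_graph(records):
    """Return (vertices, edges) as lists of dicts.

    Vertices carry an "id" key and edges carry "src" and "dst" keys,
    which are the column names GraphFrames requires.
    """
    unique_edges = sorted(
        {(normalize(r["origin"]), normalize(r["destination"])) for r in records}
    )
    unique_nodes = sorted({node for edge in unique_edges for node in edge})
    vertices = [{"id": node} for node in unique_nodes]
    edges = [{"src": src, "dst": dst} for src, dst in unique_edges]
    return vertices, edges


def in_degrees(edges):
    """Count incoming edges per vertex."""
    return Counter(edge["dst"] for edge in edges)


vertices, edges = build_graph(raw_routes)
in_deg = in_degrees(edges)
print(in_deg.most_common(1))  # [('LHR', 2)]
```

In Spark, the two tables would be DataFrames passed to `GraphFrame(vertices, edges)`, and the count above corresponds to its `inDegrees` attribute; the resulting DataFrame could then be written to MariaDB for Superset to read.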

You might be wondering how the hell they’ve managed to make all these technologies work together… easy answer 🙂 they used OSBDET, the Open Source Big Data Educational Toolkit I started some years back… it all runs on a single machine, which means you cannot work with actual Big Data BUT you can use Big Data frameworks and tools with Small Data.

The following video might be a better choice to see all those things in action:

Looking forward to coming back to this post in the future and enjoying what I’m doing at the present time 🙂

