INVESTie, the crypto trading service leveraging Big Data technologies

Hello hello, myself

Two blog posts in one month, unbelievable! I managed to bring new content thanks to my great students, who will share a top-notch group assignment they worked on recently.

Javier Nieves, Dareen Shaheen, Mohamad Rizk, Amanda Moles, Ian Zaqueu and Connie Kim, students of the part-time Master I’ve been teaching for a while now, agreed to share with the rest of the world their group assignment for the Modern Data Architectures for Big Data II course, which they finished last November. I feel super proud of what they came up with, and I hope it motivates future students to go the extra mile and build something as awesome as they did.

A BIG THANK YOU for making this possible, it means a lot to me and I’m sure it’ll benefit a lot of people out there.

About my guests

Some words about Javier:

“My name is Javier Nieves, originally from Madrid, Spain. I’m a Chemical Engineer by education, both Bachelor and Master.
Then I went to the US and came back. I’ve been working in the chemical manufacturing industry for 15 years, but a couple of years ago I decided to complement that more traditional professional life with data analytics.
My intention is to apply analytics as much as I can in the industry I know, chemical manufacturing, but also in finance and investment, the other area that I really like.”

This is Dareen in her own words:

“Hi! My name is Dareen, I’m originally from Egypt and based in the Netherlands. I have a background in Computer Science and have been working in cyber security for about 18 years. I’m doing this Master to be able to apply data analytics to cyber security one day.”

Let’s see what Mohamad has to say about himself:

“My name is Mohamad Rizk, I’m Lebanese, living and working in Qatar, where I’m Head of HR in a group of companies.
I’ve been in HR for more than 18 years now and I’m doing the Master to apply analytics to people (People Analytics) and to prepare myself for the future.”

It’s now the turn of Amanda:

“I’m Amanda, I’m Italian-Brazilian and I have a Bachelor’s and a Master’s degree in Industrial Engineering.
I started my career in investment banking but quickly moved to data, specifically data migration for retail. For the last 5 years I have been working as a data engineer in the tourism sector. I also got a Master’s in technologies applied to tourism, so I love data.”

Some words from Ian:

“My name is Ian Zaqueu, I’m from Mozambique, raised predominantly in Europe and in a couple of other places.
My background is a bit different: I’ve been an entrepreneur for almost 10 years now, based in Mozambique and currently working in investments.
We had a crowdfunding company, and now we are heading more into investment management. So I’m doing this data analytics Master mostly to apply it in various entrepreneurial avenues in the future.”

Last but not least, Connie:

“My name is Connie Kim, born and raised in California, from a classic American immigration story: my grandmother fled during the Korean war on the US cargo ship SS Meredith Victory. I am lucky to work in Silicon Valley today, leading a program management team for a consumer tech company that looks after localization quality in 50+ languages. We constantly use data to drive all of our business decisions, which is why I chose this Big Data program. I also wanted to be in a grad school program that was globally diverse, hence IE. Unfortunately, my connection failed during this recording, so I was not able to participate in this video, but I’ve enjoyed every aspect of learning these technologies in Raul’s class and working on this interesting project with these talented colleagues.”

What they did

Well, nothing fancy, just a trading solution dealing with historical and real-time data 🙂 I’m kidding of course, what they did is truly exceptional and they went the extra mile once again.

Before going into details, let me bring a BIG DISCLAIMER: what you’ll see here is just an academic exercise to put the technologies we learned in class into practice. By no means is this trading advice or a recommendation on how to invest in crypto assets 🙂

The following steps summarize the Big Data pipeline they’ve built for the assignment:

  1. Data is automatically ingested using NiFi and a bunch of scripts pulling data out of several REST APIs (Coin Metrics, Coinbase, Binance, Twitter, …)
  2. Because the data is both batch and real-time, they persisted it in HDFS (batch storage) and Kafka (streaming storage) to enable different processing paradigms.
  3. They’ve leveraged Spark’s Structured API and Spark’s MLlib to train some models with the historical (batch) data available in HDFS.
  4. They’ve leveraged Spark Streaming to process the real-time data.
  5. The insights from (3) and (4) are stored in MariaDB, the serving layer, for further visualization.
  6. Last but not least, the results are shown in real-time visualizations with Superset.
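
If the batch/streaming split in steps (2) to (5) sounds abstract, here is a toy sketch of the idea in plain Python. To be clear, this is not the students' code and it does not use any of the real tools: a list, a deque and a dict are hypothetical stand-ins for HDFS, a Kafka topic and MariaDB, the function names are mine, and the prices are made up.

```python
from collections import deque
from statistics import mean

# Hypothetical stand-ins (NOT the students' code, NOT real client libraries):
#   batch_store   -> plays the role of HDFS (historical data)
#   stream_topic  -> plays the role of a Kafka topic (real-time ticks)
#   serving_layer -> plays the role of the MariaDB tables behind the dashboard
batch_store = [100.0, 102.0, 101.0, 105.0]   # made-up historical prices
stream_topic = deque([106.0, 104.0, 110.0])  # made-up live ticks
serving_layer = {}


def batch_job(prices):
    """Batch path: one pass over all historical data (here, a plain average)."""
    return {"historical_mean": mean(prices)}


def streaming_job(topic, window_size=2):
    """Streaming path: consume ticks one by one, keeping a sliding-window average."""
    window = deque(maxlen=window_size)  # old ticks fall out automatically
    averages = []
    while topic:
        window.append(topic.popleft())  # "consume" the next tick
        averages.append(mean(window))
    return {"moving_averages": averages}


# Both paths write their insights to the same serving layer.
serving_layer.update(batch_job(batch_store))
serving_layer.update(streaming_job(stream_topic))
print(serving_layer)
```

The point is only the shape: two processing paths with different rhythms, both landing in one place that the visualization layer can query.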

What really surprises me is that they managed to make it all work in OSBDET, the environment I use in my courses so that students can focus 100% on learning without wasting time getting the underlying infrastructure to work.

And, as usual, here you have the complementary video we made going over the whole exercise:

Enjoy the past while living the present 😉

