It’s 2025 and OSBDET 2025 Release 1 is out! 😅

Happy New Year, future self! It feels like only yesterday that I wrote my last blog post, but a whole year has gone by. It’s been a busy one indeed!

At this point, I can confirm that building a new course environment during my Christmas break has become a tradition 🙂. Spending time on personal projects alongside enjoying quality time with friends and family is a great way to unwind during the holidays.

Long story short, OSBDET 2025 Release 1 is now available for use by new intakes later this year. I’m finding that building new releases has become increasingly streamlined, allowing me to focus on refining the content and exploring new technologies… more on this in a bit ;).

For newcomers to the project, let me recap some key points from my previous posts. Here are the essential details:

  • OSBDET is a set of scripts designed to help you download and configure Big Data Open Source frameworks and tools on a single machine.
  • I use it to build a course environment (virtual machine) for my students in a matter of minutes.
  • It grew out of the many days I once spent building a course environment (Building an analytics and multi data-set OVA for learners), and the realization that I’d have to spend the same amount of time whenever I wanted to build a new one.
  • Not all my students have a strong technical background, so relying on complex commercial products is not an option; OSBDET relies on the same or similar technologies used by commercial products, but in a much simplified form that makes them easier to study.

The OSBDET GitHub repo provides access to those scripts, which are primarily intended for advanced users. While they offer a convenient way to build course environments, I’ve also taken an additional step to simplify things further: building two Debian GNU/Linux virtual machines that come pre-configured with all the technologies used in my courses:

  • OSBDET’25r1.utm.zip (9.66GB) – a UTM-compatible virtual machine ready to be used on Apple Silicon laptops (macOS only); it’s been tested with UTM 4.6.4 on an Apple Silicon (M3) laptop running macOS Sequoia.
  • OSBDET’25r1(amd64).ova (10.28GB) – a VirtualBox-compatible virtual machine ready to be used on Intel-based laptops (Windows and macOS mainly); it’s been tested with VirtualBox 7.1.4 on an Intel-based laptop running macOS Monterey and on a Windows 10 machine.
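
If you are unsure which of the two images applies to your laptop, the rule above can be sketched in a few lines of Python. This is only an illustration: the `pick_image` helper is hypothetical, and treating any `x86_64`/`amd64` host as a candidate for the OVA is an assumption (the image has only been tested on the Intel machines listed above).

```python
import platform
from typing import Optional

# File names as listed in this post.
UTM_IMAGE = "OSBDET’25r1.utm.zip"      # UTM, Apple Silicon, macOS only
OVA_IMAGE = "OSBDET’25r1(amd64).ova"   # VirtualBox, Intel-based hosts


def pick_image(system: str, machine: str) -> Optional[str]:
    """Map an OS name and CPU architecture to one of the two images.

    Hypothetical helper: returns None when neither image is listed
    for the given host.
    """
    arch = machine.lower()
    if system == "Darwin" and arch == "arm64":
        return UTM_IMAGE
    # Assumption: any 64-bit x86 host can try the VirtualBox image.
    if arch in {"x86_64", "amd64"}:
        return OVA_IMAGE
    return None


if __name__ == "__main__":
    choice = pick_image(platform.system(), platform.machine())
    print(choice or "No pre-built image listed for this host")
```

Keep in mind that `platform.machine()` reports what the Python interpreter sees, so an Intel build of Python running under Rosetta on an Apple Silicon Mac would likely report `x86_64`.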

This release remains compatible with OSBDET recipes, although I use them primarily for personal experimentation and testing, rather than for student labs and projects. The versatility of the OSBDET concept has inspired me to leverage it as my go-to environment for day-to-day exploration and validation of new technologies.

Another notable improvement was the introduction of a user-friendly web interface in the previous release, which has greatly reduced the need for manual tool management. This has not only saved a significant amount of time during labs but also enabled us to focus on what matters most: bridging theory and practice. As in previous releases, the web interface can be accessed by navigating to http://localhost:2025 in your web browser.
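
If the page doesn’t load, a quick way to tell whether anything is answering on that port is a small check like the one below. It’s a minimal sketch using only the Python standard library; the `is_reachable` helper is hypothetical and not part of OSBDET, and it assumes you run it on the machine where you would open the browser.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP request to `url` gets any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status (4xx/5xx).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, name resolution failure, etc.
        return False


if __name__ == "__main__":
    url = "http://localhost:2025"  # address of the web interface given in this post
    print(f"{url} reachable: {is_reachable(url)}")
```

A `False` result usually means the virtual machine isn’t running yet or the port isn’t being forwarded to your host, which is worth checking before digging any deeper.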

Among the new elements you can expect in this release you’ll find:

  1. Existing frameworks have been updated to their latest stable versions, ensuring compatibility with the labs used in my courses (e.g., Kafka 3.9.0, NiFi 2.0.0, Spark 3.5.4).
  2. The update to NiFi 2.0.0 requires attention, as it introduces breaking changes that affect compatibility with previous versions. For instance, templates are no longer supported, and existing templates will need to be migrated to the new flow export/import mechanisms. For more information on these critical changes, refer to Pierre Villard’s article ‘Getting ready for Apache NiFi 2.0’.
  3. Apache Airflow has been replaced by Kestra.io, which seems lighter, easier to use and better suited to our needs. I tried Airflow a couple of times, but the default installation didn’t seem to work and I was unable to use it in the end; Kestra looks very promising and I’m excited to explore it further.
  4. OpenMetadata has been added as well, and I plan to use it in my Data Governance Master Classes moving forward. It’s a bit heavier than the rest of the frameworks shipped with the course environment, but I believe it’ll be worthwhile, as it seems to cover the key elements needed to explore Data Governance, Data Quality and Data Observability.

As in previous releases, I’ve created a brief video walkthrough to guide you through installing and getting started with OSBDET 2025 Release 1:

Enjoy it, my friend!

Raúl
