Portable Data Lake workshop: build Cloud Vendor-Free solutions


Portable Data Lake Workshop

Live on Friday, 8 Nov, at 4 PM CET.

In this hands-on course, you'll learn how to create a basic yet functional portable data lake that sidesteps traditional cloud vendor lock-in.

With open-source technologies like Iceberg, Delta, and DuckDB at the forefront, we'll explore the power of portable data runtimes, embedded catalogs and cloud-agnostic compute solutions.

We’ll evaluate the alternatives, discuss existing industry limitations, and explain why we chose the solution we implement.

We will then walk you through building a portable data lake from scratch and examine the trade-offs of using open-source tools in real-world scenarios.

Who is this workshop for?

Data Platform Engineers

Python-first Data Engineers

Data lake(house) & Python users

And of course, anyone with an interest in data engineering and Python is welcome!

What's covered?

In this workshop, you'll get hands-on experience with the open-source tools you need to build your own data lake.

You'll learn about the current state of the industry and how to sidestep its limitations.

We will compare our options: building with Iceberg, with Delta, or with a different stack altogether.

We will then choose a stack that is not currently vendor-locked and build a functional portable data lake.

With dlt, Parquet, and DuckDB, we will manage our data loading and storage.

We will use Ibis as an embedded catalog and look at the benefits of this approach.

We will see how Polars fits into this stack to accelerate data exploration.

Finally, we will explore how to make this data accessible to other compute engines.

FAQs

What if I can't make one of the sessions?

If you cannot make one of the scheduled sessions, sign up anyway: you will get the recording!