Dwarves
Memo
This is an internal demo workshop showcasing DuckDB. Its goal is to evaluate DuckDB for data science and analytics use cases at our company.
| Event Date | Discord Channel | Type |
|---|---|---|
| 23/11/2023 | 🎙・labs-stage | Internal |

DuckDB is an emerging in-process OLAP database for analytical workloads. Many analytics use cases involve only around 1 GB to 10 GB of data. That is too little to take advantage of a full data warehouse. Analyzing data at this smaller scale is a large enough niche to spawn startups such as MotherDuck.

Agenda:

DuckDB is a great starting point for modern stacks that don't need to worry about scalability but want to get meaningful analytics out of the data they already have. This meeting includes a three-part demo of DuckDB (15 minutes total):

• Consolidating NYC taxi data in parallel (map-reduce style) over HTTPS
• Aggregating IBM AML (Anti-Money Laundering) transactions
• Examples of how we use it to consolidate and run basic analytics on note.d and log.console.so

Along the way, we hope to show use cases and possibilities, as well as the costs and benefits of using DuckDB for both introductory and advanced analytics.
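At its core, the AML demo is a grouped aggregation. The stdlib-only sketch below roughly illustrates that kind of query. It totals amounts per sending account and counts flagged transactions. The record layout (`Txn` with `from_account`, `amount`, `is_laundering`) is a hypothetical simplification and is not the actual IBM AML schema. In DuckDB the same step would be a single `GROUP BY` query.

```python
from collections import defaultdict
from typing import NamedTuple


class Txn(NamedTuple):
    # Hypothetical, simplified transaction record (not the real IBM AML schema).
    from_account: str
    amount: float
    is_laundering: bool


def aggregate_by_account(txns):
    """Roughly equivalent to:
    SELECT from_account, SUM(amount), COUNT(*), SUM(is_laundering::INT)
    FROM txns GROUP BY from_account
    """
    totals = defaultdict(lambda: {"total": 0.0, "count": 0, "flagged": 0})
    for t in txns:
        row = totals[t.from_account]
        row["total"] += t.amount
        row["count"] += 1
        row["flagged"] += int(t.is_laundering)
    return dict(totals)


# Hypothetical sample data.
txns = [
    Txn("A", 100.0, False),
    Txn("A", 250.0, True),
    Txn("B", 75.5, False),
]
summary = aggregate_by_account(txns)
print(summary)
```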

Output:

• Evaluate whether DuckDB is appropriate for our current projects and for future ones
• Establish a format for demoing other new tech going forward


Workshop Screenshots

This demo comes from our pilot project, note.d.foundation. In that project we aggregate metadata from Markdown notes imported from Notion into a Parquet file, and then query that file.

![[November Forward Engineering 2023-20231130164855761.webp]]
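The note.d pipeline is described only at a high level. The first step is extracting each note's metadata into flat rows. Below is a minimal sketch of that step. It assumes each note carries simple `key: value` front matter between `---` lines, which is an assumption because the real notes' format is not specified here. Writing the rows to Parquet and querying them would then happen in DuckDB (for example with `COPY ... TO 'notes.parquet' (FORMAT PARQUET)`). This stdlib-only sketch omits that part.

```python
import re
from collections import Counter

# Assumed format: a front-matter block delimited by '---' lines at the top of a note.
FRONT_MATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)


def parse_front_matter(text):
    """Return a dict of simple 'key: value' pairs from a note's front matter."""
    match = FRONT_MATTER.match(text)
    if not match:
        return {}
    meta = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def collect_rows(notes):
    """Flatten {path: markdown} into one metadata row per note."""
    rows = []
    for path, text in notes.items():
        row = {"path": path}
        row.update(parse_front_matter(text))
        rows.append(row)
    return rows


# Hypothetical sample notes.
notes = {
    "a.md": "---\ntitle: DuckDB intro\ntags: data\n---\nBody text",
    "b.md": "---\ntitle: Go tips\ntags: golang\n---\nMore text",
    "c.md": "No front matter here",
}
rows = collect_rows(notes)
# A tiny "query" over the rows: how many notes per tag.
tag_counts = Counter(row.get("tags", "untagged") for row in rows)
print(tag_counts)
```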

This demo is taken from Modal’s example of using DuckDB to analyze NYC taxi data in parallel over HTTPS, via DuckDB's httpfs extension. It shows the power of DuckDB as a database engine. It also shows the flexibility of a local, embedded tool compared with a server-based database.

![[November Forward Engineering 2023-20231130165019170.webp]]
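Modal's example fans the work out across many workers, but the underlying map-reduce pattern can be sketched locally. In the stdlib-only sketch below, each "map" task computes a partial aggregate for one monthly file, and the "reduce" step merges the partials. In the real demo, each map task would run a DuckDB query over a remote Parquet file via httpfs. Here, `MONTHLY_FARES` and `fetch_partial` are hypothetical stand-ins so that the pattern runs offline.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for per-month trip data. In the real demo, each
# entry would be a remote Parquet file read by DuckDB over HTTPS.
MONTHLY_FARES = {
    "2023-01": [12.5, 8.0, 30.25],
    "2023-02": [15.0, 9.5],
    "2023-03": [22.0, 7.75, 11.0, 5.25],
}


def fetch_partial(month):
    """Map step: compute a partial aggregate (trip count, fare sum) for one month."""
    fares = MONTHLY_FARES[month]
    return len(fares), sum(fares)


def reduce_partials(partials):
    """Reduce step: merge partial aggregates into overall totals."""
    trips = sum(count for count, _ in partials)
    total = sum(fare_sum for _, fare_sum in partials)
    return trips, total


# Run the map step concurrently, one task per month.
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(fetch_partial, MONTHLY_FARES))

trips, total = reduce_partials(partials)
print(trips, total, total / trips)
```

The map step returns `(count, sum)` rather than a per-month average. Averaging the monthly averages would weight small and large months equally, whereas counts and sums merge exactly.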