MINI LAKEHOUSE ON DUCKDB + LOOKER STUDIO: A DATA ENGINEERING PIPELINE
Saniya Shafi Ahmed Shaikh
Small organizations often rely on messy, inconsistent spreadsheets, which limits the quality of their analytics. This paper presents a Mini Lakehouse architecture built with DuckDB, Parquet, and Python, with Looker Studio dashboards as the reporting layer. The workflow ingests Excel data, performs structured cleaning, builds a star schema, enforces data quality checks, and stores the curated data in Parquet files and DuckDB. This fully local, low-cost pipeline provides reproducible, auditable analytics without cloud infrastructure. The result is a governed, BI-ready system suitable for small teams and academic environments.
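As a rough illustration of the workflow summarized above (clean, build a star schema, run data quality checks), the following minimal sketch may help. It is not the paper's implementation. The sample rows, the table names (staging, dim_customer, dim_product, fact_sales), and the specific checks are all hypothetical. To keep the sketch free of third-party dependencies, Python's built-in sqlite3 module stands in for DuckDB, and an in-memory list stands in for the Excel input.

```python
import sqlite3

# Hypothetical raw rows, standing in for a messy Excel export.
RAW_ROWS = [
    (" Alice ", "widget", "2024-01-05", "10.50"),
    ("alice", "Widget", "2024-01-06", "4.00"),
    ("Bob", "gadget", "2024-01-06", "7.25"),
    ("Bob", "gadget", "2024-01-07", ""),  # missing amount
]


def clean(rows):
    """Trim and normalize text fields; drop rows whose amount cannot be parsed."""
    cleaned = []
    for customer, product, sale_date, amount in rows:
        try:
            value = float(amount)
        except ValueError:
            continue  # reject the row instead of guessing a value
        cleaned.append(
            (customer.strip().title(), product.strip().lower(), sale_date, value)
        )
    return cleaned


def build_star_schema(con, rows):
    """Load cleaned rows into staging, then derive two dimensions and one fact table."""
    con.execute(
        "CREATE TABLE staging "
        "(customer TEXT, product TEXT, sale_date TEXT, amount REAL)"
    )
    con.executemany("INSERT INTO staging VALUES (?, ?, ?, ?)", rows)
    # Surrogate keys are assigned with ROW_NUMBER() over the distinct values.
    con.execute("""
        CREATE TABLE dim_customer AS
        SELECT ROW_NUMBER() OVER (ORDER BY customer) AS customer_key, customer
        FROM (SELECT DISTINCT customer FROM staging) AS c
    """)
    con.execute("""
        CREATE TABLE dim_product AS
        SELECT ROW_NUMBER() OVER (ORDER BY product) AS product_key, product
        FROM (SELECT DISTINCT product FROM staging) AS p
    """)
    con.execute("""
        CREATE TABLE fact_sales AS
        SELECT c.customer_key, p.product_key, s.sale_date, s.amount
        FROM staging AS s
        JOIN dim_customer AS c ON c.customer = s.customer
        JOIN dim_product AS p ON p.product = s.product
    """)


def quality_checks(con):
    """Run a few illustrative checks and return a dict of check name -> bool."""
    def scalar(sql):
        return con.execute(sql).fetchone()[0]

    fact_rows = scalar("SELECT COUNT(*) FROM fact_sales")
    return {
        "fact_not_empty": fact_rows > 0,
        "no_null_keys": scalar(
            "SELECT COUNT(*) FROM fact_sales "
            "WHERE customer_key IS NULL OR product_key IS NULL"
        ) == 0,
        "no_negative_amounts": scalar(
            "SELECT COUNT(*) FROM fact_sales WHERE amount < 0"
        ) == 0,
        "row_count_matches_staging": fact_rows
        == scalar("SELECT COUNT(*) FROM staging"),
    }


con = sqlite3.connect(":memory:")  # stand-in for a DuckDB connection
build_star_schema(con, clean(RAW_ROWS))
results = quality_checks(con)
print(results)
```

The SQL here is deliberately generic, so the same statements should carry over to a DuckDB connection with little or no change, although that has not been verified in this sketch. In DuckDB, the curated tables could then be written out with a statement such as COPY fact_sales TO 'fact_sales.parquet' (FORMAT PARQUET), where the output file name is again only a placeholder.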

