The Evolution of Big Data Table Management

2025/07/02

This is just a summary of my understanding of how to organize big data in tables, so the ordering of the levels is somewhat arbitrary.

Level 1

No tables, just a bunch of files in a directory. Read a single file or the whole directory to process the data.
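As a rough illustration of this level, here is a minimal sketch that reads either one file or every file in a directory. It assumes the files are CSV with a header row, and the name read_rows is made up for this example.

```python
import csv
from pathlib import Path


def read_rows(path):
    """Yield rows as dicts from one CSV file, or from every *.csv file in a directory."""
    path = Path(path)
    # A directory means "process everything in it"; a file means just that file.
    files = sorted(path.glob("*.csv")) if path.is_dir() else [path]
    for file in files:
        with file.open(newline="") as handle:
            yield from csv.DictReader(handle)
```

There is no schema and no index here: every query has to open every file it is pointed at.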

Level 1.5

Hive-style partitioned/bucketed directories.
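The directory convention can be sketched as follows: partition values are encoded as key=value path segments, and bucketing assigns a row to one of N files by hashing a key. The helper names are invented for this example, and zlib.crc32 is only a stand-in for the hash function Hive actually uses.

```python
import zlib
from pathlib import PurePosixPath


def partition_path(root, partitions, filename):
    """Build a path like root/dt=2025-07-02/country=US/filename."""
    parts = [f"{key}={value}" for key, value in partitions.items()]
    return str(PurePosixPath(root, *parts, filename))


def parse_partitions(path):
    """Recover partition values from the key=value segments of a path."""
    result = {}
    for segment in PurePosixPath(path).parts:
        key, sep, value = segment.partition("=")
        if sep and key:
            result[key] = value
    return result


def bucket_for(key, num_buckets):
    """Pick a bucket by hashing the key (crc32 is an assumption, not Hive's hash)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets
```

A reader that only wants one date can skip every directory whose dt= segment does not match, without opening any file.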

Level 2

Use a Hive (or Glue, ...) metastore.
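To show what a metastore adds on top of directories, here is a toy in-memory catalog. It is not the Hive or Glue API; ToyMetastore and all of its methods are hypothetical names used only to illustrate the idea of mapping a table name to a location, a schema, and a list of known partitions.

```python
from dataclasses import dataclass, field


@dataclass
class TableEntry:
    location: str
    columns: dict                # column name -> type name
    partition_keys: list
    partitions: dict = field(default_factory=dict)  # partition values -> directory


class ToyMetastore:
    """Hypothetical catalog: table name -> location, schema, partitions."""

    def __init__(self):
        self._tables = {}

    def create_table(self, name, location, columns, partition_keys):
        self._tables[name] = TableEntry(location, columns, list(partition_keys))

    def add_partition(self, name, values):
        table = self._tables[name]
        subdir = "/".join(f"{k}={v}" for k, v in zip(table.partition_keys, values))
        table.partitions[tuple(values)] = f"{table.location}/{subdir}"

    def partition_locations(self, name, **wanted):
        """Yield directories whose partition values match, without listing storage."""
        table = self._tables[name]
        for values, location in table.partitions.items():
            row = dict(zip(table.partition_keys, values))
            if all(row[k] == v for k, v in wanted.items()):
                yield location
```

The point is that a query engine can resolve a table name and prune partitions from the catalog alone, instead of walking the file system.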

Level 2.5

Z-ordering.
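The core trick behind Z-ordering is bit interleaving (a Morton code): rows that are close in several columns end up close in the sort order, so they land in the same files. A minimal sketch for two non-negative integer columns, with the function name and the 16-bit default chosen for this example:

```python
def z_order(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into a single Morton code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # x takes the even bit positions
        code |= ((y >> i) & 1) << (2 * i + 1)   # y takes the odd bit positions
    return code


# Sorting rows by the interleaved key clusters them on both columns at once.
points = [(3, 3), (0, 1), (1, 0), (0, 0), (1, 1)]
points_sorted = sorted(points, key=lambda p: z_order(*p))
```

Real engines have to map arbitrary column types onto integers before interleaving; that step is left out here.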

Level 3

Table formats: Iceberg, Delta Lake, Hudi...
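One idea these formats share is that the table is defined by metadata that tracks which data files are live, not by whatever happens to be in the directory. The sketch below is a toy model of that idea, loosely in the spirit of a commit log of add/remove actions. It is not the on-disk layout of any of these formats, and the log structure is an assumption made for illustration.

```python
def live_files(log, version=None):
    """Replay commits up to `version` (inclusive) and return the set of live files."""
    end = None if version is None else version + 1
    files = set()
    for commit in log[:end]:
        files -= set(commit.get("remove", []))
        files |= set(commit.get("add", []))
    return files


# Hypothetical log: each commit atomically adds and/or removes data files.
log = [
    {"add": ["part-0.parquet", "part-1.parquet"]},
    {"add": ["part-2.parquet"]},
    {"remove": ["part-0.parquet", "part-1.parquet"], "add": ["part-3.parquet"]},
]
```

Reading an older version of the log gives time travel for free, and a rewrite such as the third commit becomes visible to readers all at once.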

Level 3.5

Liquid clustering: automatic, incremental clustering based on query usage.

Auto compaction on tables.
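Compaction is essentially bin packing: group small files so that each group can be rewritten as one file near a target size. Below is a minimal first-fit-decreasing sketch. The function name and the input shape (file name mapped to size) are assumptions for this example, not how any particular engine plans compaction.

```python
def plan_compaction(file_sizes, target_size):
    """Group files smaller than target_size into bins of at most target_size."""
    small = sorted(
        ((size, name) for name, size in file_sizes.items() if size < target_size),
        reverse=True,
    )
    bins = []  # each bin is [total_size, [file names]]
    for size, name in small:
        for group in bins:
            if group[0] + size <= target_size:
                group[0] += size
                group[1].append(name)
                break
        else:
            bins.append([size, [name]])
    # A bin with a single file gains nothing from being rewritten.
    return [names for _, names in bins if len(names) > 1]
```

With a table format underneath, each planned group can be committed as one atomic swap of the old files for the new one.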

