Summary of Bigtable

2021/11/28

A bigtable is a sparse, distributed, persistent multi-dimensional sorted map.

Features

(Row, column, time) -> value.

Columns are divided into different families for access control and locality refinement (see below).

Time is used to mark different versions of value.

SSTable: immutable sorted string table. Use index to faster block locates
Chubby: provide namespace of directory and files, callback notification, session control, distributed locks. (Similar to Zookeeper)

Three major components:

Master server: assign tablets(key range), balance tablet servers load, handle schema changes
Tablet server: serve read/write, split tablet when needed
Client: most clients never communicate with Master server, so single-master is not a problem in system load

A write operation consists commit log -> memtable -> SSTable.

The immutable character of SSTable means it need compaction to optimize data. Three type of compactions:

Locality groups: group column families that are always accessed together in same SSTable for read performance. Locality groups can be put into different places(RAM/Disk)
Compression: two-pass custom compression scheme
Caching
Bloom filter: for fewer disk seek
Single commit-log file
Compaction to reduce recovery time
Exploit SSTable immutablity for concurrency

Random writes are better than random reads.

The aggregate performance is growing as cluster scaling, but single machine performance is degrading, especially read/write without RAM cache.

Distributed systems are vulnerable to many types of failures, from network to other dependency modules
Delay adding new featuers until it is clear how the new features will be used
Build proper system-level monitoring
Keep design simple

Every value also has a timestamp, but this timestamp is used to resolve conflict
Persistent level contains commit-log/memtable/SSTable
SSTable compaction for recovery