Evolution of Google FS Talk
Speaker: Larry Greenfield, Google (lfg@google?)
Evolution of Google FS
- Storage runs in datacenters
- a petabyte of free space?
- GFS
- 2002
- location-independent namespace
- 100s of TB, later scaled to 10s of PB
- userspace, no POSIX semantics (not atomic)
- reading and writing a file at the same time is not a great idea
- simple design, good for large batch work
- The GFS master holds the state of the filesystem
- 3-way chunk replication
- file <-> chunk mapping
- Clients talk to the GFS master first, then to chunkservers for the data (read path sketched below)
- Only one primary GFS master, plus backups
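A minimal sketch of the read path implied by those bullets, with made-up interfaces (the 64 MB chunk size is from the GFS paper, not these notes): the master answers metadata questions only, and the bytes come straight from a chunkserver replica.

```go
// Hypothetical client-side read path: ask the master which chunk covers
// the byte range, then fetch the bytes from one of the chunk's replicas.
type ChunkHandle uint64

type ChunkLocation struct {
	Handle   ChunkHandle
	Replicas []string // chunkservers holding this chunk (3-way replicated)
}

// Master holds only metadata: the file -> chunk mapping, never file data.
type Master interface {
	FindChunk(path string, chunkIndex int) (ChunkLocation, error)
}

// Chunkserver stores chunk data on its local disks.
type Chunkserver interface {
	ReadChunk(h ChunkHandle, offset, length int64) ([]byte, error)
}

func readAt(m Master, dial func(addr string) Chunkserver,
	path string, offset, length int64) ([]byte, error) {
	const chunkSize = 64 << 20 // 64 MB chunks, per the GFS paper
	loc, err := m.FindChunk(path, int(offset/chunkSize))
	if err != nil {
		return nil, err
	}
	// Bulk data traffic bypasses the master entirely.
	cs := dial(loc.Replicas[0])
	return cs.ReadChunk(loc.Handle, offset%chunkSize, length)
}
```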
- GFS Write:
- Open includes the location of the first/last chunk
- FindLockHolder is called if additional chunks are needed
- Buffer sends the data to a chunkserver
- Write commits the buffered data to disk; the chunkserver forwards it to the other chunk replicas
- This daisy chaining is great for networks (write path sketched below)
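The RPC names above (Open, FindLockHolder, Buffer, Write) are from the talk, but the signatures below are invented, and the types reuse the read-path sketch. The point is the split: metadata from the master, data pushed to chunkservers and forwarded replica-to-replica in a daisy chain so each network link carries the bytes only once.

```go
// Invented signatures for the write-path RPCs named in the talk.
type WriteMaster interface {
	// Open returns the locations of the file's first and last chunk.
	Open(path string) (first, last ChunkLocation, err error)
	// FindLockHolder is consulted when the write needs an additional
	// chunk, to learn which replica currently holds the lease.
	FindLockHolder(path string, chunkIndex int) (ChunkLocation, error)
}

type WriteChunkserver interface {
	// Buffer stages the payload on the chunkservers; replicas forward
	// it to each other in a daisy chain rather than fanning out from
	// the client.
	Buffer(h ChunkHandle, data []byte) (bufferID uint64, err error)
	// Write commits a previously buffered payload to disk; the primary
	// replica propagates the commit to the secondaries.
	Write(h ChunkHandle, bufferID uint64) error
}

func appendToFile(m WriteMaster, dial func(addr string) WriteChunkserver,
	path string, data []byte) error {
	_, last, err := m.Open(path)
	if err != nil {
		return err
	}
	// Assume (for this sketch) that replica 0 of the last chunk is the
	// lease holder; a real client would use FindLockHolder as needed.
	primary := dial(last.Replicas[0])
	id, err := primary.Buffer(last.Handle, data)
	if err != nil {
		return err
	}
	return primary.Write(last.Handle, id)
}
```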
- Accelerations
- shadow masters: read-only, lagging replicas
- multiple GFS cells per chunkserver (scales metadata via manual sharding)
- automatic master election / consistent replication
- archival Reed-Solomon encoding
- data must first be written replicated
- reads might see long pauses
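A quick back-of-the-envelope for why Reed-Solomon matters for archival data. The (6,3) code below is purely illustrative (the talk gave no parameters): it tolerates any three lost blocks while storing only 1.5x the logical data, versus 3x for replication.

```go
package main

import "fmt"

func main() {
	const logicalTB = 100.0

	// 3-way replication: every byte stored three times.
	replicated := 3.0 * logicalTB

	// Reed-Solomon with 6 data blocks + 3 parity blocks (illustrative
	// parameters, not from the talk): raw cost is (6+3)/6 = 1.5x.
	data, parity := 6.0, 3.0
	encoded := logicalTB * (data + parity) / data

	fmt.Printf("3-way replication: %.0f TB raw\n", replicated) // 300 TB
	fmt.Printf("RS(6,3) encoding:  %.0f TB raw\n", encoded)    // 150 TB
}
```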
- From internal GFS 101 slides: what GFS is bad for
- Predictable performance (no latency guarantees, no operation timeouts)
- Structured data (GFS is a filename -> blob data store)
- Random writes (optimized for parallel appends)
- High availability (fault-tolerant, not HA)
- Architectural problems
- GFS Master:
- One machine is not large enough for a large FS
- A single bottleneck for metadata ops
- GFS semantics:
- Unclear state of files when there is no consensus? (missed slide)
- Some GFS v2 Goals
- Bigger, faster, predictable performance, predictable tail latency
- GFS Master replaced by Colossus
- GFS chunkserver replaced by D
- Solve an easier problem: optimize for Bigtable
- File system for Bigtable (interface sketched after this list)
- append only
- single writer
- rename only to indicate a finished file
- no snapshots
- directories unnecessary
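The constraints in that list shrink the file API considerably. A sketch of what such an interface might look like (all names here are invented, not from the talk):

```go
// A file API narrowed to what Bigtable needs; every name is invented
// to illustrate the constraints listed above.
type AppendOnlyFS interface {
	Create(name string) (AppendOnlyFile, error) // single writer per file
	// Rename exists only to mark a finished file (e.g. dropping a
	// temporary suffix); there is no directory tree to manage.
	Rename(tmpName, finalName string) error
	Open(name string) (ReadOnlyFile, error)
	Delete(name string) error
	// No snapshot operation, no random-write operation.
}

type AppendOnlyFile interface {
	Append(p []byte) error // appends only; no seeks, no overwrites
	Close() error          // once closed and renamed, the file is immutable
}

type ReadOnlyFile interface {
	ReadAt(p []byte, off int64) (n int, err error)
	Size() (int64, error)
}
```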
- Where to put metadata?
- Storage options back then
- GFS
- Bigtable (a sorted KV store on GFS)
- Sharded MySQL with local disk & replication
- the ads DB
- Local KV store with Paxos replication
- the authentication DB
- Chubby
- Metadata in Bigtable (!?)
- GFS master -> CFS
- CFS curators are Bigtable coprocessors
- a Bigtable row corresponds to a single file (row layout sketched below)
- stripes are the replication groups (the unit of replication); a stripe can be open, closed, or finalized
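One way to picture "one Bigtable row per file"; the layout below is a guess for illustration, not something shown on the slides.

```go
// Guessed shape of a per-file metadata row; field and state names are
// invented for illustration.
type StripeState int

const (
	StripeOpen      StripeState = iota // still accepting appends
	StripeClosed                       // no more appends accepted
	StripeFinalized                    // immutable; fully durable
)

// FileRow models one Bigtable row, keyed by the file's path.
type FileRow struct {
	RowKey  string   // e.g. a Bigtable SSTable or log file path
	Stripes []Stripe // stripes are the unit of replication
}

// Stripe is one replication group of the file's data.
type Stripe struct {
	State    StripeState
	Length   int64    // bytes of file data in this stripe
	Replicas []string // D servers holding copies of this stripe
}
```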
And then I had to leave for an interview :(
Written on November 23, 2015