See how fixed-size chunking and content-defined chunking each handle file modifications.
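A minimal sketch of the fixed-size half of that comparison, assuming an 8-byte chunk size and SHA-256 chunk identifiers (both chosen only for illustration). Inserting one byte at the front shifts every later offset, so no chunk of the modified file matches a chunk of the original:

```python
import hashlib

def fixed_chunks(data: bytes, size: int = 8) -> list[bytes]:
    """Split data into consecutive `size`-byte chunks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def digests(chunks: list[bytes]) -> set[str]:
    """Identify each chunk by the SHA-256 digest of its bytes."""
    return {hashlib.sha256(c).hexdigest() for c in chunks}

original = b"The quick brown fox jumps over the lazy dog."
modified = b"A" + original  # a single byte inserted at the front

before = digests(fixed_chunks(original))
after = digests(fixed_chunks(modified))
shared = before & after

print(f"{len(shared)} of {len(after)} chunks unchanged after a 1-byte insert")
```

Content-defined chunking avoids this by deriving boundaries from the bytes themselves rather than from their offsets.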
See in context: Part 1, From Problem to Taxonomy →
Step through the Gear rolling hash byte-by-byte. Each step computes hash = (hash << 1) + GEAR[byte].
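The update rule can be sketched in a few lines. The seeded table and the truncation to 32 bits are assumptions made for this example; the demo's actual table values are not reproduced here.

```python
import random

MASK32 = 0xFFFFFFFF

# Hypothetical GEAR table: 256 random 32-bit values from a fixed seed.
rng = random.Random(42)
GEAR = [rng.getrandbits(32) for _ in range(256)]

def gear_step(h: int, byte: int) -> int:
    """One rolling step: shift left by one bit, add the byte's table entry."""
    return ((h << 1) + GEAR[byte]) & MASK32  # keep the hash at 32 bits (assumed width)

def gear_trace(data: bytes) -> list[int]:
    """Return the hash value after each byte of `data`."""
    h, trace = 0, []
    for b in data:
        h = gear_step(h, b)
        trace.append(h)
    return trace

for byte, h in zip(b"chunk", gear_trace(b"chunk")):
    print(f"{chr(byte)!r} -> {h:#010x}")
```

Because each step shifts the hash left by one bit, a byte's contribution leaves a 32-bit hash after 32 steps. No explicit "remove the oldest byte" operation is needed.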
See in context: Part 2, A Deep Dive into FastCDC →
Each colored block is one of 256 pre-computed random 32-bit values, keyed by byte. Hover a cell to see its mapping.
The hash rolls forward one byte at a time. When it matches a bit pattern, a chunk boundary is placed. Target chunk size: min 8, avg 16, max 32 bytes.
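A sketch of that boundary rule with the same 8/16/32-byte targets. The seeded table, the choice of mask bits, and resetting the hash at each chunk start are assumptions for illustration, not the demo's actual code.

```python
import random

MASK32 = 0xFFFFFFFF
rng = random.Random(42)
GEAR = [rng.getrandbits(32) for _ in range(256)]  # hypothetical table

MIN_SIZE, AVG_SIZE, MAX_SIZE = 8, 16, 32
# Four mask bits give a match probability of about 1/16 per byte on random data.
# Bits 4-7 of a Gear hash depend only on the last 8 bytes, so the decision
# is driven by content rather than by where the chunk started.
BOUNDARY_MASK = 0x000000F0

def next_cut(data: bytes, start: int) -> int:
    """Return the exclusive end offset of the chunk that begins at `start`."""
    remaining = len(data) - start
    if remaining <= MIN_SIZE:
        return len(data)
    limit = start + min(remaining, MAX_SIZE)
    h = 0
    for i in range(start, limit):
        h = ((h << 1) + GEAR[data[i]]) & MASK32
        # Boundaries are only tested once the minimum size is reached.
        if i + 1 - start >= MIN_SIZE and (h & BOUNDARY_MASK) == 0:
            return i + 1
    return limit  # no match: force a cut at the maximum size (or end of data)

def chunk(data: bytes) -> list[bytes]:
    chunks, start = [], 0
    while start < len(data):
        end = next_cut(data, start)
        chunks.append(data[start:end])
        start = end
    return chunks

text = b"Content-defined chunking places boundaries based on the data itself, not on fixed offsets."
for c in chunk(text):
    print(len(c), c)
```

The minimum size shifts the real average above 16 bytes, so "avg 16" is a target rather than an exact figure in this sketch.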
Drag the slider to adjust the target average chunk size and see how FastCDC re-chunks the same text.
See in context: Part 2, A Deep Dive into FastCDC →
See how target average size affects chunk boundaries and size distribution.
Compare how single-mask and dual-mask strategies distribute chunk sizes across the same data.
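The two strategies can be compared on random data with a short sketch. It assumes the dual-mask variant uses a stricter mask (more bits) below the average size and a looser one (fewer bits) above it. The table, mask values, and 8/16/32-byte sizes are illustrative choices.

```python
import random
import statistics

MASK32 = 0xFFFFFFFF
rng = random.Random(42)
GEAR = [rng.getrandbits(32) for _ in range(256)]  # hypothetical table
MIN_SIZE, AVG_SIZE, MAX_SIZE = 8, 16, 32

def chunk_sizes(data: bytes, mask_small: int, mask_large: int) -> list[int]:
    """Chunk `data`, using mask_small below AVG_SIZE and mask_large from AVG_SIZE on."""
    sizes, start = [], 0
    while start < len(data):
        limit = min(len(data), start + MAX_SIZE)
        h, end = 0, limit
        for i in range(start, limit):
            h = ((h << 1) + GEAR[data[i]]) & MASK32
            length = i + 1 - start
            if length < MIN_SIZE:
                continue
            mask = mask_small if length < AVG_SIZE else mask_large
            if (h & mask) == 0:
                end = i + 1
                break
        sizes.append(end - start)
        start = end
    return sizes

data_rng = random.Random(7)
data = bytes(data_rng.getrandbits(8) for _ in range(20_000))

single = chunk_sizes(data, 0xF0, 0xF0)  # 4 bits everywhere
dual = chunk_sizes(data, 0xF8, 0xE0)    # 5 bits before the average, 3 bits after

for name, sizes in (("single-mask", single), ("dual-mask", dual)):
    print(f"{name}: chunks={len(sizes)} mean={statistics.mean(sizes):.1f} "
          f"stdev={statistics.pstdev(sizes):.1f}")
```

Passing the same mask twice reduces the function to the single-mask case, so both strategies share one code path and differ only in the masks.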
See in context: Part 2, A Deep Dive into FastCDC →
Edit text and save versions to see which chunks are new and which are shared.
See in context: Part 3, Deduplication in Action →
Click "Save Version" after editing to see which chunks are new and which are shared. Hover over chunks to highlight them across views.
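A rough sketch of what saving a version could involve: chunk the text, key each chunk by its SHA-256 digest, and store only digests not seen before. `ChunkStore`, the seeded table, and the mask are hypothetical names and values for this example.

```python
import hashlib
import random

MASK32 = 0xFFFFFFFF
rng = random.Random(42)
GEAR = [rng.getrandbits(32) for _ in range(256)]  # hypothetical table

def chunk(data: bytes, min_size: int = 8, max_size: int = 32, mask: int = 0xF0) -> list[bytes]:
    """Gear-based content-defined chunking with illustrative parameters."""
    chunks, start = [], 0
    while start < len(data):
        limit = min(len(data), start + max_size)
        h, end = 0, limit
        for i in range(start, limit):
            h = ((h << 1) + GEAR[data[i]]) & MASK32
            if i + 1 - start >= min_size and (h & mask) == 0:
                end = i + 1
                break
        chunks.append(data[start:end])
        start = end
    return chunks

class ChunkStore:
    """Stores each unique chunk once; a version is an ordered list of digests."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}
        self.versions: list[list[str]] = []

    def save_version(self, data: bytes) -> tuple[int, int]:
        """Save `data` and return (new_chunks, shared_chunks)."""
        recipe, new, shared = [], 0, 0
        for c in chunk(data):
            digest = hashlib.sha256(c).hexdigest()
            if digest in self.chunks:
                shared += 1
            else:
                self.chunks[digest] = c
                new += 1
            recipe.append(digest)
        self.versions.append(recipe)
        return new, shared

    def restore(self, version: int) -> bytes:
        return b"".join(self.chunks[d] for d in self.versions[version])

v1 = b"Chunks that survive an edit unchanged are stored once and referenced by every version."
v2 = v1.replace(b"an edit", b"a small edit")

store = ChunkStore()
for label, text in (("v1", v1), ("v2", v2)):
    new, shared = store.save_version(text)
    print(f"{label}: {new} new, {shared} shared")
```

How many chunks the second version shares depends on how quickly the boundaries realign after the edit, so the printed counts vary with the text and the table.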
See how average chunk size affects each cost dimension: CPU, memory, network, and storage.
See in context: Part 3, Deduplication in Action →
See how per-operation pricing on established object storage providers affects costs when every chunk is a separate object.
See in context: Part 4, CDC in the Cloud →
See how container packing reduces API operation costs by bundling chunks into larger objects.
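The effect on request counts can be estimated with simple arithmetic. The sketch below assumes one upload request per stored object. The 1 TiB, 16 KiB, and 4 MiB figures are example inputs rather than values from the series.

```python
import math
from typing import Optional

def put_requests(total_bytes: int, avg_chunk_bytes: int,
                 container_bytes: Optional[int] = None) -> int:
    """Estimate upload requests: one per chunk, or one per container when packing."""
    chunks = math.ceil(total_bytes / avg_chunk_bytes)
    if container_bytes is None:
        return chunks
    chunks_per_container = max(1, container_bytes // avg_chunk_bytes)
    return math.ceil(chunks / chunks_per_container)

TOTAL = 2**40            # 1 TiB of unique data (example input)
AVG_CHUNK = 16 * 1024    # 16 KiB average chunk (example input)
CONTAINER = 4 * 2**20    # 4 MiB containers (example input)

unpacked = put_requests(TOTAL, AVG_CHUNK)
packed = put_requests(TOTAL, AVG_CHUNK, CONTAINER)
print(f"one object per chunk: {unpacked:,} requests")
print(f"packed into containers: {packed:,} requests ({unpacked // packed}x fewer)")
```

Multiplying either count by a provider's per-request price gives the operations cost. The reduction factor is the number of chunks that fit in one container.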
See in context: Part 4, CDC in the Cloud →
Explore costs on challenger object storage providers with radically different pricing models.
See in context: Part 5, CDC at Scale on a Budget →
Compare costs across all seven storage providers side by side.
See in context: Part 5, CDC at Scale on a Budget →
Visualize how skewness affects the popularity distribution of items under a Zipf model.
See in context: Part 5, CDC at Scale on a Budget →
Given a skewness level and a target hit rate, how much unique data do you need to cache?
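One way to approach that question is to assume a Zipf model in which the item at popularity rank k is requested with probability proportional to 1/k^s. The cache is assumed to hold the most popular items, and all items are assumed to be the same size. These are simplifying assumptions for the sketch, not necessarily the demo's model.

```python
from itertools import accumulate

def zipf_weights(n_items: int, skew: float) -> list[float]:
    """Request probability of each rank under Zipf with exponent `skew`."""
    raw = [1.0 / (rank ** skew) for rank in range(1, n_items + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def items_for_hit_rate(n_items: int, skew: float, target: float) -> int:
    """Smallest number of top-ranked items whose combined popularity reaches `target`."""
    for count, covered in enumerate(accumulate(zipf_weights(n_items, skew)), start=1):
        if covered >= target:
            return count
    return n_items  # rounding kept the running sum just below the target

N = 10_000
for skew in (0.6, 0.8, 1.0, 1.2):
    needed = items_for_hit_rate(N, skew, target=0.9)
    print(f"skew={skew}: cache {needed:,} of {N:,} items ({needed / N:.1%}) for a 90% hit rate")
```

Higher skew concentrates requests on fewer items, so the same hit rate needs a smaller cache.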
See in context: Part 5, CDC at Scale on a Budget →
See how established cache providers (ElastiCache, CloudFront) affect origin costs.
See in context: Part 5, CDC at Scale on a Budget →
Compare challenger cache providers that scale linearly with per-request pricing.
See in context: Part 5, CDC at Scale on a Budget →
Combine storage provider, cache layer, chunk size, and container packing into a single cost view.
See in context: Part 5, CDC at Scale on a Budget →