Paper:Super-scalar RAM-CPU cache compression

TODO: Create a Stub-like template which suggests people read the Corresponding Talk page as well.

TODO: Create an infobox for a publication (or just an article) and apply it to this paper.

A paper about the lightweight compression schemes used in Actian Vector (then MonetDB/X100). The schemes also allow the choice of compression scheme to adapt to the data, let compression and decompression make effective use of the CPU pipeline, and offer other useful features.

WRITEME: Describe context for authoring this.

Take-home messages

Don't store raw DB data on disk; store the compressed form.
Don't decompress entire pages into memory; only decompress small working sets into CPU cache.
Don't compress an entire column; compress chunks of it independently, in order to (1) avoid global dictionary overflow and (2) adapt the compression to local features of the data.
Exception patching allows (1) accounting for outliers in the value distribution and (2) decoding in tight loops with no branching.
Fast compression is also useful, not just fast decompression.
With appropriate compression schemes, TPC-H scale factors up to 25 times as high can be held in memory.
The exceptional values mechanism is usable as a skip-list into the compressed data.
Compression schemes should (and can) allow for random-access by index into the compressed data.
Sampling can be used to choose a compression scheme for a chunk of column data.
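
The points about per-chunk compression and sampling can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names, the chunk size, the sample size and the allowed exception rate are all assumptions made for illustration. For each chunk it picks a frame-of-reference base and a bit width from a small sample, accepting that a bounded fraction of values will not fit and will have to be stored as exceptions.

```python
import random


def bits_needed(n):
    """Bits needed to store the non-negative integer n (at least 1)."""
    return max(1, n.bit_length())


def choose_for_params(chunk, sample_size=64, max_exception_rate=0.1, seed=0):
    """Pick a (base, bit width) pair for one chunk from a sample of it.

    Hypothetical helper: every parameter here is an illustrative assumption.
    """
    rng = random.Random(seed)
    if len(chunk) <= sample_size:
        sample = list(chunk)  # small chunk: just look at everything
    else:
        sample = rng.sample(chunk, sample_size)
    base = min(sample)
    offsets = sorted(v - base for v in sample)
    # Let the largest sampled offsets become exceptions instead of
    # widening the code for every value in the chunk.
    allowed_exceptions = int(len(offsets) * max_exception_rate)
    keep = len(offsets) - allowed_exceptions
    width = bits_needed(offsets[keep - 1])
    return base, width


def plan_chunks(column, chunk_size=1024):
    """Choose parameters for each chunk independently of the others."""
    plans = []
    for start in range(0, len(column), chunk_size):
        chunk = column[start:start + chunk_size]
        base, width = choose_for_params(chunk)
        plans.append((start, base, width))
    return plans
```

For example, `choose_for_params([100, 101, 102, 103, 10**9], max_exception_rate=0.2)` yields a base of 100 and a width of 2 bits, leaving the single outlier to be handled as an exception. Values that fall below the sampled base, or that the sample missed altogether, would likewise end up as exceptions in this sketch.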

Concepts discussed

TODO: Create a glossary template instead of using plain lists here.

DBMS data compression schemes:

FOR: Frame of Reference; encode each value as its difference from a constant base value
DICT: Dictionary; encode each value as an index into a list of values
DELTA: Differences; encode current value minus previous value
PFOR: Patched FOR - like FOR, but with the decode result 'patched' with an exceptions pass
PDICT: Patched DICT (see PFOR)
PDELTA: Patched DELTA
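
The schemes listed above can be sketched in Python as follows. This is a simplified illustration, not the paper's implementation: codes are kept as plain Python integers rather than bit-packed, exception positions are kept in a separate list, and all function names are assumptions. The patched variant shows the two-pass idea: decode every slot in a branch-free loop, then overwrite the exception positions in a second pass.

```python
def for_encode(values, base):
    """FOR: store each value as its difference from a constant base."""
    return [v - base for v in values]


def for_decode(codes, base):
    return [c + base for c in codes]


def delta_encode(values):
    """DELTA: store each value minus the previous one (first minus 0)."""
    prev, out = 0, []
    for v in values:
        out.append(v - prev)
        prev = v
    return out


def delta_decode(deltas):
    total, out = 0, []
    for d in deltas:
        total += d
        out.append(total)
    return out


def dict_encode(values):
    """DICT: store indices into a list of distinct values."""
    dictionary = sorted(set(values))
    index = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [index[v] for v in values]


def dict_decode(dictionary, codes):
    return [dictionary[c] for c in codes]


def pfor_encode(values, base, width):
    """PFOR: FOR with `width`-bit codes; values that don't fit are exceptions."""
    limit = 1 << width
    codes, positions, exceptions = [], [], []
    for i, v in enumerate(values):
        offset = v - base
        if 0 <= offset < limit:
            codes.append(offset)
        else:
            codes.append(0)          # placeholder, overwritten when patching
            positions.append(i)
            exceptions.append(v)     # stored uncompressed
    return codes, positions, exceptions


def pfor_decode(codes, positions, exceptions, base):
    out = [c + base for c in codes]            # pass 1: no per-value branching
    for pos, v in zip(positions, exceptions):  # pass 2: patch the exceptions
        out[pos] = v
    return out
```

With `values = [100, 102, 101, 5000, 103]`, calling `pfor_encode(values, 100, 2)` keeps four values as 2-bit codes and records 5000 as the only exception; `pfor_decode` then reconstructs the original list. PDICT and PDELTA would apply the same patching pass on top of `dict_decode` and `delta_decode`.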

DBMSes discussed

The paper mentions the compression schemes known to be used in some DBMSes at the time (2008):

IBM DB2: Drops pointer prefixes in B-trees
Teradata: Dictionary compression for columns
Oracle: Dictionary compression for disk storage blocks
Sybase IQ: Multi-scheme compression, each 'page' compressed separately with its own scheme
