An eventually consistent database,
that scales with ease

by Sabine Maennel
at PyData Zürich, 25.1.2017


"The problem is that Cassandra’s data model is different enough from that of a traditional database to readily cause confusion"

from Cassandra by example, rackspace.com

  • "map of maps"
  • "a map of maps of maps"
  • "containers that hold collections of column objects"
  • "columns ... as 3-tuples"

Some characteristics


  1. Facebook invented Cassandra
  2. Influenced by Google and Amazon
  3. Today it is driven by Apache as an open source project

How I met Cassandra

coming from an RDBMS background ...

                 RDBMS         Cassandra
Query language   SQL           CQL
Container        Database      Keyspace
Table            Table         Table
Fields           Column        Column
Primary key      Primary Key   Primary Key

But Cassandra is different ...

Cassandra is a multilevel map rather than a structure

How to think of Cassandra

  1. Cassandra is mostly hosted on servers that form rings: "clusters"
  2. there is no manager node
  3. they talk via a gossip protocol

A table is distributed

  1. Partitions of the table are mapped to different servers in the ring
  2. the mapping is done by a hashing algorithm
  3. each row is stored as several replicas -> replication factor
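
The three points above can be sketched in a few lines of Python. This is a toy model and not Cassandra's implementation: the node names, the single token per node, the use of MD5 in place of Cassandra's default Murmur3 partitioner, and the helper `replicas_for` are all assumptions made for illustration.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 stands in for Murmur3 here)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy ring: every node owns exactly one token (an assumption of this sketch)."""

    def __init__(self, nodes, replication_factor):
        if replication_factor > len(nodes):
            raise ValueError("replication factor cannot exceed the number of nodes")
        self.replication_factor = replication_factor
        # Sort the nodes by their token so the list can be walked like a ring.
        self.tokens = sorted((token(node), node) for node in nodes)

    def replicas_for(self, partition_key: str):
        """Hash the key, find the next node clockwise, then take the following nodes too."""
        positions = [t for t, _ in self.tokens]
        start = bisect.bisect_left(positions, token(partition_key)) % len(self.tokens)
        return [
            self.tokens[(start + i) % len(self.tokens)][1]
            for i in range(self.replication_factor)
        ]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas_for("alice")
print(owners)  # three distinct nodes; which ones depends on the hash values
```

With a replication factor of 3, one node can fail and the row is still readable from the two others.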
Why replications of data?

Now we understand why a table is distributed

Imagine a query in this distributed system

  1. it does not work!
  2. but some rows are closer than others ...

partitions are clusters of rows

the primary key has two parts: the partition key and the clustering columns

rows contain maps rather than columns
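
The "map of maps" picture can be written down directly as nested Python dictionaries. This is only a sketch of the mental model, assuming integer clustering keys and a hypothetical `insert` / `read_partition` pair; it says nothing about how Cassandra lays data out on disk.

```python
from collections import defaultdict

# partition key -> clustering key -> {column name: value}
table = defaultdict(dict)


def insert(partition_key, clustering_key, columns):
    """Upsert: create the row if it is missing, otherwise overwrite the given columns."""
    row = table[partition_key].setdefault(clustering_key, {})
    row.update(columns)


def read_partition(partition_key):
    """Return all rows of one partition, ordered by clustering key."""
    partition = table.get(partition_key, {})
    return [(ck, partition[ck]) for ck in sorted(partition)]


insert("alice", 2, {"body": "second tweet"})
insert("alice", 1, {"body": "first tweet"})
insert("bob", 1, {"body": "hello"})

rows = read_partition("alice")
print(rows)
```

Rows that share a partition key sit together and come back sorted, which is why queries within one partition are cheap.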

columns in cassandra

Columns are maps

  1. they consist of key-value pairs
  2. they come with a timestamp
  3. they may even expire
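
A minimal sketch of such a column value, assuming a hypothetical `Cell` class and one time unit for both timestamp and time-to-live (a simplification): when two versions of the same column meet, the one with the newer timestamp wins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    name: str
    value: str
    timestamp: int              # write time
    ttl: Optional[int] = None   # lifetime in the same unit as timestamp; None = never expires

    def is_live(self, now: int) -> bool:
        """A cell without TTL is always live; otherwise it expires at timestamp + ttl."""
        return self.ttl is None or now < self.timestamp + self.ttl


def reconcile(a: Cell, b: Cell) -> Cell:
    """Last write wins: keep the cell with the higher timestamp (ties keep `a` in this sketch)."""
    return a if a.timestamp >= b.timestamp else b


old = Cell("password", "secret", timestamp=100)
new = Cell("password", "s3cret", timestamp=200, ttl=50)
winner = reconcile(old, new)
print(winner.value, winner.is_live(now=249), winner.is_live(now=250))
```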
remember ...

What does eventually consistent mean?

-> look at how Cassandra reads and writes

Cassandra is consistent if ...
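
A rule commonly quoted for Cassandra's tunable consistency is that a read is guaranteed to overlap with the latest acknowledged write when the number of replicas read (R) plus the number of replicas written (W) is larger than the replication factor (RF). The sketch below only checks that arithmetic; the function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """R + W > RF means every read set overlaps every write set in at least one replica."""
    return read_replicas + write_replicas > replication_factor


rf = 3
print(quorum(rf))                                          # 2
print(reads_see_latest_write(quorum(rf), quorum(rf), rf))  # True: 2 + 2 > 3
print(reads_see_latest_write(1, 1, rf))                    # False: 1 + 1 <= 3
```

Reading and writing a single replica is faster, but a read may then hit a replica the write has not reached yet: that window is what "eventually" refers to.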

Let's look at an example: Twissandra

  • Twissandra (Java)
  • >cassandra to start cassandra
    >cqlsh in a different terminal to start CQL
            cqlsh> CREATE KEYSPACE IF NOT EXISTS twissandra 
                   WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
            cqlsh> DESCRIBE twissandra;
            CREATE KEYSPACE twissandra WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}  AND durable_writes = true;

    Now we have to use that keyspace

            cqlsh> USE twissandra; 

    We are ready to create the first table:

            cqlsh> CREATE TABLE users (
                   username text PRIMARY KEY,
                   password text);

    Look at your table

            cqlsh> DESCRIBE users; 
            CREATE TABLE twissandra.users (
                username text PRIMARY KEY,
                password text
            ) WITH bloom_filter_fp_chance = 0.01
                AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
                AND comment = ''
                AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
                AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
                AND crc_check_chance = 1.0
                AND dclocal_read_repair_chance = 0.1
                AND default_time_to_live = 0
                AND gc_grace_seconds = 864000
                AND max_index_interval = 2048
                AND memtable_flush_period_in_ms = 0
                AND min_index_interval = 128
                AND read_repair_chance = 0.0
                AND speculative_retry = '99PERCENTILE';        

    There are a lot of defaults in place ...

    Data modelling in Cassandra: thinking in queries

    Following and Followers

                -- "username" follows "followed"
                CREATE TABLE following (
                    username text,
                    followed text,
                    PRIMARY KEY(username, followed)
                );

                -- "username" is followed by "following"
                CREATE TABLE followers (
                    username  text,
                    following text,
                    PRIMARY KEY(username, following)
                );

    Tweets and Userline

                CREATE TABLE tweets (
                    tweetid  uuid PRIMARY KEY,
                    username text,
                    body     text
                );

                CREATE TABLE userline (
                    tweetid  timeuuid,
                    username text,
                    body     text,
                    PRIMARY KEY(username, tweetid)
                );


                CREATE TABLE timeline (
                    username  text,
                    tweetid   timeuuid,
                    posted_by text,
                    body      text,
                    PRIMARY KEY(username, tweetid)
                );
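
"Thinking in queries" means each table above answers exactly one question, so a single tweet is written several times. The following sketch mimics that write path with in-memory dictionaries named after the tables. It is an assumption about how such a schema is typically used, not code taken from Twissandra, and `follow` and `post_tweet` are hypothetical helpers.

```python
import uuid
from collections import defaultdict

followers = defaultdict(set)   # username -> users who follow that user
tweets = {}                    # tweetid -> tweet
userline = defaultdict(list)   # username -> tweets written by that user
timeline = defaultdict(list)   # username -> tweets written by the users they follow


def follow(follower, followed):
    followers[followed].add(follower)


def post_tweet(username, body):
    """Write the tweet once per query that must later be answered."""
    tweetid = uuid.uuid1()  # time-based UUID, the Python counterpart of a CQL timeuuid
    tweets[tweetid] = {"username": username, "body": body}
    userline[username].append((tweetid, body))
    # Fan out on write: copy the tweet into every follower's timeline.
    for follower in followers[username]:
        timeline[follower].append((tweetid, username, body))
    return tweetid


follow("bob", "alice")
tid = post_tweet("alice", "hello")
print(timeline["bob"])
```

Reading a timeline is then a lookup in a single partition, paid for with extra writes and duplicated data.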

    Live Demo on a virtual machine

    Twissandra sample data

