
Why I favor Cassandra

June 23, 2012
I just stumbled on this nice article on Cassandra column access by Aaron Morton: http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/
 
That is reason enough to note, once and for all, why I like Cassandra.
  • Scales linearly (with RandomPartitioner, which means you need to build your own indexes)
  • No single point of failure: in particular, no master/slave setup and none of the ops trouble that comes with it
  • Allows sorted buckets, big sorted buckets, which is good for building indexes (give me the newest 10 messages, range scan…). Why create a dependency on something like Elasticsearch or Lucene if you can have it in your DB?
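
To make the sorted-bucket idea concrete, here is a minimal Python sketch. It is not Cassandra code and talks to no Cassandra client: `SortedBucket`, `insert`, `newest` and `slice` are made-up names, and a plain in-memory list stands in for one wide row whose columns stay sorted by name (here, a timestamp). It only illustrates why "newest 10" and range scans are cheap once the data is kept in order.

```python
import bisect


class SortedBucket:
    """Toy stand-in for one wide row: columns are kept sorted by name."""

    def __init__(self):
        self._names = []   # sorted column names, e.g. timestamps
        self._values = {}  # column name -> value

    def insert(self, name, value):
        # New names are placed at their sorted position; existing ones are overwritten.
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def newest(self, count):
        """Return up to `count` (name, value) pairs, highest name first."""
        if count <= 0:
            return []
        return [(n, self._values[n]) for n in reversed(self._names[-count:])]

    def slice(self, start, finish):
        """Return (name, value) pairs with start <= name <= finish, ascending."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, finish)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


# Hypothetical usage: an inbox keyed by timestamp, written out of order.
inbox = SortedBucket()
for ts in [5, 1, 3, 2, 4]:
    inbox.insert(ts, "msg-%d" % ts)

print(inbox.newest(2))    # the two most recent messages
print(inbox.slice(2, 4))  # range scan over timestamps 2..4
```

Both reads are a binary search plus a contiguous slice, which is the access pattern an index wants. In this toy version the in-memory insert is linear in the bucket size; a real store pays for ordering differently.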

To contrast with others:

  • Mongo: 
    • Single point of failure: Yes, the master can lose writes. The default setup doesn’t have a commit log, and enabling one will kill performance
    • Sorting: Yes
    • Scales: Only if the shard key is chosen wisely, and even then you have to build your own indexes sooner or later. Locking is a problem: long-running queries interleaved with writes can kill performance and even lead to a failover. 2.2 is improving this, but it’s still not there yet.
  • Redis: Has sorted buckets, but it’s not clear how well inserts or reads in the middle of a sorted list perform. Master/slave replication only.
  • Riak: No support for sorted data, which makes maintaining, for example, a time index of objects a potential performance/scale killer. No single point of failure. Scales well for data without sorting requirements.
  • CouchDB: Nice for append-only data and for storing data for batch processing. No single point of failure. Not a good idea for data that is updated frequently. Unsure about sorting; the update problem stopped my investigation.
  • (HBase: Very complex. Wants to do everything out of the box and still had some stability issues when I checked.)

Cassandra is currently having a hard time: CQL is not ready yet (v3 is needed for the recommended model of composite column names, and libraries are missing), so users, and especially new adopters, have a hard time choosing a solution.

But I think Cassandra has the most interesting feature set and the best fit for heavily distributed workloads: a sound value proposition. It forces you to really make the switch to distributed concepts, just like CouchDB. It makes you take difficult decisions and do away with non-scalable and non-fault-tolerant practices early, when it’s still cheap.
