Yesterday (1st July) I presented for the first time at the PHP London user group. It was a gentle introduction: a five-minute “lightning” talk slot. I spoke about Cassandra, giving a short introduction to using it with PHP.
To summarise my main points from the talk (perhaps something I should have done in the talk!):
- Cassandra is a “highly scalable second-generation distributed database”
- It can be considered a schema-less database insofar as each row can have different columns
- Cassandra is designed to be both fault tolerant and horizontally scalable – both read and write throughput go up linearly as more boxes are added to the cluster
- I think the best way of accessing Cassandra from PHP is directly via the Thrift API. This allows a beginner to learn about the core functionality of Cassandra including its limitations
- Cassandra has Hadoop support, which means that Hadoop MapReduce jobs (a scalable, distributed mechanism for processing data) can read from and write to Cassandra*
- Cassandra does not have any query language (as opposed to MySQL or MongoDB which both allow you to query data in different ways)
- When designing your data model, I think it’s easiest to try to forget about SQL and concentrate on how Cassandra works (don’t design a relational schema and then “port” it over)
* As of version 0.7!
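To make the schema-less point and the data-modelling advice a little more concrete, here is a rough sketch using plain Python dictionaries (Python purely for illustration; nothing here talks to a real Cassandra cluster or uses the Thrift API). The names `users`, `users_by_city`, `build_city_index` and `columns_of` are made up for this example. The idea it tries to show: each row carries its own set of columns, and because there is no query language, a lookup such as "users in a city" is modelled as a second column family that you maintain yourself, rather than as a query over the first one.

```python
# A column family modelled as: row key -> {column name -> value}.
# Plain dictionaries only; this is an illustration, not a Cassandra client.
users = {
    "alice": {"email": "alice@example.com", "city": "London"},
    "bob": {"email": "bob@example.com", "twitter": "@bob", "city": "London"},
    "carol": {"email": "carol@example.com"},  # no city column at all
}


def build_city_index(user_rows):
    """Build a second 'column family' keyed by city.

    Each row key is a city; each column name is a user's row key.
    The column values are left empty because the names carry the data.
    """
    index = {}
    for row_key, columns in user_rows.items():
        city = columns.get("city")
        if city is not None:
            index.setdefault(city, {})[row_key] = ""
    return index


def columns_of(column_family, row_key):
    """Return the sorted column names stored under one row key."""
    return sorted(column_family.get(row_key, {}))


users_by_city = build_city_index(users)

if __name__ == "__main__":
    print(columns_of(users, "bob"))
    print(columns_of(users, "carol"))
    print(columns_of(users_by_city, "London"))
```

In a real application the index would be written at the same time as the user row rather than rebuilt from scratch, but the shape of the data is the point here: you design it around the lookups you need.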
Overall, I think Cassandra is a very useful tool. Whether it fits your use case or not is another matter!
If you’re interested in learning more about using Cassandra in a PHP project, I recommend the following starting points:
- Using Cassandra with PHP
https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP
- WTF is a SuperColumn? An Intro to the Cassandra Data Model
http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model


