PHP and Cassandra

Yesterday (1st July) I presented for the first time at the PHP London user group. It was a gentle introduction: a five-minute “lightning” talk slot. I spoke about Cassandra, giving a short introduction to using it with PHP.

To summarise my main points from the talk (perhaps something I should have done in the talk!):

  • Cassandra is a “highly scalable second-generation distributed database”
  • It can be considered a schema-less database insofar as each row can have different columns
  • Cassandra is designed to be both fault-tolerant and horizontally scalable – both read and write throughput go up linearly as more boxes are added to the cluster
  • I think the best way of accessing Cassandra from PHP is directly via the Thrift API. This allows a beginner to learn about the core functionality of Cassandra including its limitations
  • Cassandra has Hadoop support, which means that Hadoop MapReduce jobs (a scalable, distributed mechanism for processing data) can read from and write to Cassandra*
  • Cassandra does not have any query language (as opposed to MySQL or MongoDB which both allow you to query data in different ways)
  • When designing your data model, I think it’s easiest to try to forget about SQL and concentrate on how Cassandra works (don’t design a relational schema and then “port” it over)

* As of version 0.7!
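
The points about schema-less rows and the lack of a query language can be made concrete with a small sketch. The block below is a toy in-memory model written in plain Python, not a Cassandra client and not the Thrift API: a column family is assumed to be a map from row key to a map of column names to values, and every name in it (`users`, `users_by_city`, `insert`, `add_user`) is hypothetical. The same shape maps directly onto nested PHP associative arrays.

```python
# Toy in-memory model of a column family: row key -> {column name: value}.
# This is NOT a Cassandra client; all names below are hypothetical.
users = {}           # column family holding one row per user
users_by_city = {}   # hand-maintained "index" column family


def insert(column_family, row_key, columns):
    """Add or overwrite columns on a row, creating the row if needed."""
    column_family.setdefault(row_key, {}).update(columns)


def add_user(username, columns):
    """Store the user row and, at write time, the lookup needed later."""
    insert(users, username, columns)
    # With no query language, the application writes its own index row.
    if "city" in columns:
        insert(users_by_city, columns["city"], {username: ""})


add_user("alice", {"email": "alice@example.com", "city": "London"})
add_user("bob", {"city": "London", "twitter": "@bob", "age": "31"})
add_user("carol", {"email": "carol@example.com"})

# Rows in the same column family carry different columns.
print(sorted(users["alice"]))
print(sorted(users["bob"]))

# "Who lives in London?" becomes a read by row key, not a WHERE clause.
print(sorted(users_by_city["London"]))
```

The design choice this illustrates is to start from the reads the application needs and write one row per answer, rather than normalising the data first and querying it afterwards.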

Overall, I think Cassandra is a very useful tool. Whether it fits your use case or not is another matter!

If you’re interested in learning more about using Cassandra in a PHP project, I recommend the following starting points:

  1. Using Cassandra with PHP
    https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP
  2. WTF is a SuperColumn? An Intro to the Cassandra Data Model
    http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model
