<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Dave Gardner - PHP Developer &#187; Cassandra</title>
	<atom:link href="http://www.davegardner.me.uk/blog/feed/?tag=cassandra" rel="self" type="application/rss+xml" />
	<link>http://www.davegardner.me.uk/blog</link>
	<description>Just behind the bleeding edge of PHP.</description>
	<lastBuildDate>Tue, 06 Nov 2012 12:06:45 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Cassandra + Hadoop = Brisk</title>
		<link>http://www.davegardner.me.uk/blog/2011/05/17/cassandra-hadoop-brisk/</link>
		<comments>http://www.davegardner.me.uk/blog/2011/05/17/cassandra-hadoop-brisk/#comments</comments>
		<pubDate>Tue, 17 May 2011 15:42:38 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[brisk]]></category>
		<category><![CDATA[datastax]]></category>
		<category><![CDATA[hadoop]]></category>

		<guid isPermaLink="false">http://www.davegardner.me.uk/blog/?p=182</guid>
		<description><![CDATA[Slides and details from my talk at Cassandra London on DataStax's Brisk - a distribution of Hadoop, Hive and Cassandra that is suitable for workloads of real-time access plus batch analytics.]]></description>
			<content:encoded><![CDATA[<p>The Cassandra London meetup group recently celebrated its six-month anniversary. After a string of fantastic speakers, it fell to me to follow up my talk at the <a href="http://www.davegardner.me.uk/blog/2010/12/01/running-cassandra-on-ec2/">first-ever meetup</a> with another one.</p>
<p>I started with a brief history of my work with Cassandra. It ran from when I joined VisualDNA, through the hard times struggling with GC issues, up to the present &#8211; successfully running a 16-node Cassandra cluster on EC2. Looking back, working with Cassandra has been a very positive experience. The analytics side of things, however (carrying out complex analysis of data stored in Cassandra), seemed harder than it needed to be.</p>
<p>This is where <a href="http://www.datastax.com/products/brisk" target="_blank">DataStax&#8217;s Brisk</a> comes into play.</p>
<blockquote><p>DataStax’ Brisk is an enhanced open-source Apache Hadoop and Hive distribution that utilizes Apache Cassandra for many of its core services.</p></blockquote>
<p>Put simply, Brisk combines the real-time capabilities of Cassandra with a simple interface to MapReduce via Hive, all in one easy-to-use bundle.</p>
<h3>Case study &#8211; segmenting users</h3>
<p>As a case study I built <a href="https://github.com/davegardnerisme/we-have-your-kidneys" target="_blank">a very simple system for segmenting users into buckets using PHP</a>. The key idea is a pixel that can be included on a website. It tracks users (via a cookie) and puts them into various buckets. The system demonstrates the two key features of Brisk:</p>
<p><strong>Real-time API access</strong></p>
<ul>
<li><a target="_blank" href="http://wehaveyourkidneys.com/show.php">An API to show a user which segments they are in</a></li>
<li><a target="_blank" href="http://wehaveyourkidneys.com/add.php?segment=bloggage&amp;expires=2592000">An API to add a user to a segment</a></li>
</ul>
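<p>The following is a minimal sketch of the logic behind the two real-time calls. It is written in Python rather than the PHP used in the project. It is not the code from the repository. The <code>SegmentStore</code> class and its methods are hypothetical, and a plain dictionary stands in for Cassandra. The sketch assumes the <code>expires</code> parameter is a lifetime in seconds (2592000 seconds is 30 days). Each user's row maps segment names to expiry times, so segments simply drop out of the results once they expire.</p>

```python
import time
from typing import Dict, List, Optional


class SegmentStore:
    """Hypothetical in-memory stand-in for per-user segment rows.

    Each user ID (e.g. the value of a tracking cookie) maps to a dict of
    segment name -> absolute expiry time, much like a wide row whose
    column names are the segments.
    """

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, float]] = {}

    def add(self, user_id: str, segment: str, expires: int,
            now: Optional[float] = None) -> None:
        # Assumption: `expires` is a lifetime in seconds (2592000 = 30 days).
        now = time.time() if now is None else now
        self._rows.setdefault(user_id, {})[segment] = now + expires

    def show(self, user_id: str, now: Optional[float] = None) -> List[str]:
        # Only segments whose expiry lies in the future are returned.
        now = time.time() if now is None else now
        row = self._rows.get(user_id, {})
        return sorted(seg for seg, expiry in row.items() if expiry > now)


store = SegmentStore()
store.add("cookie-123", "bloggage", expires=2592000, now=0)
print(store.show("cookie-123", now=1000))     # ['bloggage']
print(store.show("cookie-123", now=2592001))  # []
```

<p>Passing <code>now</code> explicitly keeps the example deterministic. In a real request handler it would default to the current time.</p>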
<p><strong>Batch analytics</strong></p>
<ul>
<li>A Hive query to find out how many users are in each segment</li>
<li>A Hive query to calculate the average and standard deviation of the number of segments each user belongs to</li>
</ul>
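<p>Both batch queries boil down to simple aggregations. The sketch below expresses them in Python instead of HiveQL, purely to show the arithmetic. It is not the Hive query from the talk. It also assumes (hypothetically) that the data has already been read out as a mapping from user ID to the list of segments that user is in.</p>

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def users_per_segment(rows: Dict[str, List[str]]) -> Counter:
    # Like GROUP BY segment with COUNT(*) over (user, segment) pairs.
    return Counter(seg for segments in rows.values() for seg in segments)


def segments_per_user_stats(rows: Dict[str, List[str]]) -> Tuple[float, float]:
    # Mean and population standard deviation of segment counts per user.
    counts = [len(segments) for segments in rows.values()]
    if not counts:
        return 0.0, 0.0
    return float(mean(counts)), float(pstdev(counts))


rows = {
    "user-a": ["bloggage", "sports"],
    "user-b": ["bloggage"],
    "user-c": ["bloggage", "sports", "travel"],
}
print(users_per_segment(rows))  # Counter({'bloggage': 3, 'sports': 2, 'travel': 1})
print(segments_per_user_stats(rows))  # mean 2.0, standard deviation about 0.816
```

<p>For a sample rather than a population standard deviation, <code>statistics.stdev</code> can be swapped in.</p>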
<h3>The talk</h3>
<p><a href="http://skillsmatter.com/podcast/nosql/cassandra-may-meetup/js-1775" target="_blank">You can watch a podcast of the talk on the SkillsMatter website.</a></p>
<div style="width:425px" id="__ss_7992824"> <strong style="display:block;margin:12px 0 4px"><a href="http://www.slideshare.net/davegardnerisme/cassandra-hadoop-brisk" title="Cassandra + Hadoop = Brisk">Cassandra + Hadoop = Brisk</a></strong> <iframe src="http://www.slideshare.net/slideshow/embed_code/7992824" width="425" height="355" frameborder="0" marginwidth="0" marginheight="0" scrolling="no"></iframe>
<div style="padding:5px 0 12px"> View more <a href="http://www.slideshare.net/">presentations</a> from <a href="http://www.slideshare.net/davegardnerisme">Dave Gardner</a> </div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.davegardner.me.uk/blog/2011/05/17/cassandra-hadoop-brisk/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Running Cassandra on EC2</title>
		<link>http://www.davegardner.me.uk/blog/2010/12/01/running-cassandra-on-ec2/</link>
		<comments>http://www.davegardner.me.uk/blog/2010/12/01/running-cassandra-on-ec2/#comments</comments>
		<pubDate>Wed, 01 Dec 2010 12:05:00 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[amazon]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[netflix]]></category>

		<guid isPermaLink="false">http://www.davegardner.me.uk/blog/?p=153</guid>
		<description><![CDATA[This talk covers the advantages and disadvantages of running Cassandra on EC2 and includes some I/O benchmarks - including some excellent work from Corey Hulen. There is also a basic overview of what actually happens when Cassandra reads and writes (although this is simplified to a single node).]]></description>
			<content:encoded><![CDATA[<p>As the founder of <a href="http://www.meetup.com/Cassandra-London/" target="_blank">Cassandra London</a>, it fell to me to give the first talk; hopefully this won&#8217;t be necessary <em>every</em> month! To kick things off I talked about running Cassandra on Amazon EC2. At <a href="http://www.visualdna.com/" target="_blank">VisualDNA</a> we run a production cluster on EC2, but this hasn&#8217;t been without its difficulties!</p>
<p>This talk covers the advantages and disadvantages of running Cassandra on EC2 and includes some I/O benchmarks &#8211; including some <a href="http://www.coreyhulen.org/?p=326" target="_blank">excellent work from Corey Hulen</a>. There is also a basic overview of what actually happens when Cassandra reads and writes (although this is simplified to a single node).</p>
<p>The main reason EC2 can be problematic comes down to I/O performance and, perhaps more importantly, the predictability of that performance. That aside, there are many reasons why you may want to use EC2. <a href="http://cloudscaling.com/blog/cloud-computing/cloud-innovators-netflix-strategy-reflects-google-philosophy?utm_source=feedburner&#038;utm_medium=feed&#038;utm_campaign=Feed%3A+neoTactics+%28Cloudscaling%29">This interview with Adrian Cockcroft</a> looks at why Netflix chose the EC2 route and is well worth a read.</p>
<div style="width:425px" id="__ss_5794808"><strong style="display:block;margin:12px 0 4px"><a href="http://www.slideshare.net/davegardnerisme/running-cassandra-on-amazon-ec2" title="Running Cassandra on Amazon EC2">Running Cassandra on Amazon EC2</a></strong><object id="__sse5794808" width="425" height="355"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=cassldn-101116044342-phpapp01&#038;stripped_title=running-cassandra-on-amazon-ec2&#038;userName=davegardnerisme" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed name="__sse5794808" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=cassldn-101116044342-phpapp01&#038;stripped_title=running-cassandra-on-amazon-ec2&#038;userName=davegardnerisme" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed></object>
<div style="padding:5px 0 12px">View more <a href="http://www.slideshare.net/">presentations</a> from <a href="http://www.slideshare.net/davegardnerisme">Dave Gardner</a>.</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.davegardner.me.uk/blog/2010/12/01/running-cassandra-on-ec2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Cassandra: replication and consistency</title>
		<link>http://www.davegardner.me.uk/blog/2010/10/07/cassandra-replication-and-consistency/</link>
		<comments>http://www.davegardner.me.uk/blog/2010/10/07/cassandra-replication-and-consistency/#comments</comments>
		<pubDate>Thu, 07 Oct 2010 08:45:54 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[consistency]]></category>
		<category><![CDATA[replication]]></category>

		<guid isPermaLink="false">http://www.davegardner.me.uk/blog/?p=139</guid>
		<description><![CDATA[Cassandra can be an unforgiving beast if you don't know what you're doing. I have first-hand experience of this! My advice: learn everything you can. This is a good introduction to replication and consistency in Cassandra.]]></description>
			<content:encoded><![CDATA[<p>Cassandra can be an unforgiving beast if you don&#8217;t know what you&#8217;re doing; I have first-hand experience of this! My advice: learn everything you can. Benjamin Black&#8217;s slides are a good introduction to <strong>replication</strong> and <strong>consistency</strong> in Cassandra.</p>
<div id="__ss_3903952" style="width: 425px;"><strong style="display:block;margin:12px 0 4px"><a title="Introduction to Cassandra: Replication and Consistency" href="http://www.slideshare.net/benjaminblack/introduction-to-cassandra-replication-and-consistency">Introduction to Cassandra: Replication and Consistency</a></strong><object id="__sse3903952" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=cassandra2010-04-27-100429121250-phpapp02&amp;stripped_title=introduction-to-cassandra-replication-and-consistency&amp;userName=benjaminblack" /><param name="name" value="__sse3903952" /><param name="allowfullscreen" value="true" /><embed id="__sse3903952" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=cassandra2010-04-27-100429121250-phpapp02&amp;stripped_title=introduction-to-cassandra-replication-and-consistency&amp;userName=benjaminblack" name="__sse3903952" allowscriptaccess="always" allowfullscreen="true"></embed></object>
<div style="padding:5px 0 12px">View more <a href="http://www.slideshare.net/">presentations</a> from <a href="http://www.slideshare.net/benjaminblack">Benjamin Black</a>.</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.davegardner.me.uk/blog/2010/10/07/cassandra-replication-and-consistency/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PHP and Cassandra</title>
		<link>http://www.davegardner.me.uk/blog/2010/07/02/php-and-cassandra/</link>
		<comments>http://www.davegardner.me.uk/blog/2010/07/02/php-and-cassandra/#comments</comments>
		<pubDate>Fri, 02 Jul 2010 08:25:35 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[nosql]]></category>
		<category><![CDATA[phplondon]]></category>

		<guid isPermaLink="false">http://www.davegardner.me.uk/blog/?p=105</guid>
		<description><![CDATA[Yesterday (1st July) I presented for the first time at the PHP London user group. It was a gentle introduction: a five-minute &#8220;lightning&#8221; talk slot. I spoke about Cassandra, giving a short introduction to using it with PHP.

To summarise my main points from the talk (perhaps something I should have done in the talk!):

Cassandra [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday (1st July) I presented for the first time at the <a href="http://www.phplondon.org/" target="_blank">PHP London user group</a>. It was a gentle introduction: a five-minute &#8220;lightning&#8221; talk slot. I spoke about <a href="http://cassandra.apache.org/" target="_blank">Cassandra</a>, giving a short introduction to using it with PHP.</p>
<div id="__ss_4664596" style="margin: 20px 0pt; width: 425px;"><object id="__sse4664596" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=cassandphp-phplondon-100702015458-phpapp02&amp;stripped_title=php-and-cassandra" /><param name="name" value="__sse4664596" /><param name="allowfullscreen" value="true" /><embed id="__sse4664596" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=cassandphp-phplondon-100702015458-phpapp02&amp;stripped_title=php-and-cassandra" name="__sse4664596" allowscriptaccess="always" allowfullscreen="true"></embed></object></div>
<p>To summarise my main points from the talk (perhaps something I should have done <em>in</em> the talk!):</p>
<ul>
<li>Cassandra is a &#8220;highly scalable second-generation distributed database&#8221;</li>
<li>It can be considered a schema-less database, insofar as each row can have different columns</li>
<li>Cassandra is designed to be both fault-tolerant and horizontally scalable &#8211; both read and write throughput go up linearly as more nodes are added to the cluster</li>
<li>I think the best way of accessing Cassandra from PHP is directly via the <a href="http://wiki.apache.org/cassandra/API" target="_blank">Thrift API</a>. This lets a beginner learn Cassandra&#8217;s core functionality, including its limitations</li>
<li>Cassandra has Hadoop support, which means that Hadoop MapReduce jobs (a scalable, distributed mechanism for processing data) can read from and write to Cassandra*</li>
<li>Cassandra does not have a query language (unlike MySQL or <a href="http://www.mongodb.org/" target="_blank">MongoDB</a>, which both allow you to query data in different ways)</li>
<li>When designing your data model, I think it&#8217;s easiest to forget about SQL and concentrate on how Cassandra works (don&#8217;t design a relational schema and then &#8220;port&#8221; it over)</li>
</ul>
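<p>One way to picture the data-model points is the Python sketch below. It models a column family as a dictionary from row key to columns, where rows need not share columns. It also shows the habit of storing a lookup as its own column family instead of relying on a query language. The names (<code>users</code>, <code>users_by_city</code>, <code>add_user</code>) are hypothetical. This is an in-memory illustration only, not Thrift client code.</p>

```python
from typing import Dict, List, Optional

# Hypothetical in-memory model of a column family:
# row key -> {column name -> value}. Rows do not have to share columns.
ColumnFamily = Dict[str, Dict[str, str]]

users: ColumnFamily = {
    "dave": {"name": "Dave", "city": "London"},
    "anna": {"name": "Anna", "twitter": "anna_tweets"},  # different columns
}

# With no query language there is no "WHERE city = 'London'", so the lookup
# we need is stored up front as a second column family keyed by city.
users_by_city: ColumnFamily = {"London": {"dave": ""}}


def add_user(key: str, columns: Dict[str, str]) -> None:
    # Each write updates every column family that serves a read pattern.
    users[key] = dict(columns)
    if "city" in columns:
        users_by_city.setdefault(columns["city"], {})[key] = ""


def get_column(cf: ColumnFamily, row_key: str, column: str) -> Optional[str]:
    # Reads address a row key and a column name directly.
    return cf.get(row_key, {}).get(column)


def users_in_city(city: str) -> List[str]:
    # sorted() stands in for Cassandra returning columns ordered by name.
    return sorted(users_by_city.get(city, {}))


add_user("sam", {"name": "Sam", "city": "London"})
print(get_column(users, "anna", "twitter"))  # anna_tweets
print(get_column(users, "anna", "city"))     # None
print(users_in_city("London"))               # ['dave', 'sam']
```

<p>The trade-off is extra writes (and some duplicated data) in exchange for reads that hit a single row. This is the kind of thinking that tends to work better than porting a relational schema.</p>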
<p>* As of version 0.7!</p>
<p>Overall, I think Cassandra is a very useful tool. Whether it fits your use case or not is another matter!</p>
<p>If you&#8217;re interested in learning more about using Cassandra in a PHP project, I recommend the following starting points:</p>
<ol>
<li>Using Cassandra with PHP<br />
<a href="https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP" target="_blank">https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP</a></li>
<li>WTF is a SuperColumn? An Intro to the Cassandra Data Model<br />
<a href="http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model" target="_blank">http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model</a></li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.davegardner.me.uk/blog/2010/07/02/php-and-cassandra/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
