<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Dave Gardner - PHP Developer</title>
	<atom:link href="http://www.davegardner.me.uk/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.davegardner.me.uk/blog</link>
	<description>Just behind the bleeding edge of PHP.</description>
	<lastBuildDate>Tue, 06 Nov 2012 12:06:45 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Stream de-duplication</title>
		<link>http://www.davegardner.me.uk/blog/2012/11/06/stream-de-duplication/</link>
		<comments>http://www.davegardner.me.uk/blog/2012/11/06/stream-de-duplication/#comments</comments>
		<pubDate>Tue, 06 Nov 2012 11:56:50 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[bloomfilter]]></category>
		<category><![CDATA[nsq]]></category>
		<category><![CDATA[oppositeofabloomfilter]]></category>
		<category><![CDATA[stream]]></category>

		<guid isPermaLink="false">http://www.davegardner.me.uk/blog/?p=240</guid>
		<description><![CDATA[A post on stream de-duplication using a memory-bounded lossy hash map (the opposite of a Bloom filter). Includes some notes on NSQ and why this technique is needed, plus a PHP implementation.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve recently started playing around with <a href="https://github.com/bitly/nsq" target="_blank">NSQ</a>, in my hunt for a good, resilient, highly available, guaranteed “at least once” delivery queue. That’s a lot of adjectives, but basically it boils down to a queue that puts a copy of messages on N nodes and is able to operate (without losing messages) with any X of them failing, obviously where X &lt; N.</p>
<p>NSQ attacks this problem in an interesting way. Instead of trying to form a cluster (in the sense that, say, RabbitMQ does), it treats each <strong>nsqd</strong> instance as a separate entity. Only the clients, and the directory service <strong>nsqlookupd</strong>, know that there is more than one of them. This actually makes it very reliable, in the sense that there are no troublesome master/slave relationships to preserve or leaders to elect.</p>
<p>This simplicity forces some of the work back on the client.</p>
<ul>
<li>NSQ is guaranteed “at least once”, rather than “exactly once”; hence subscribers should operate in an idempotent way</li>
<li>when used with replication, it is up to the client to de-duplicate the messages on subscription</li>
</ul>
<h3 id="deduplication">Deduplication</h3>
<p>To de-duplicate, a subscriber needs to determine if it has seen a message before. Doing so in an accurate way would involve storing all the message IDs or some digest of the message itself in a large hash table. With this we could simply test:</p>
<pre class="code">if (message is in hash map) {
    ignore
} else {
    process
}</pre>
<p>Then we just need to make sure we add messages seen to the hash map. With a lossless hash map (ie: store everything), this is going to use unbounded memory.</p>
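<p>As a rough illustration of the lossless approach, here is a minimal PHP sketch. The function name <strong>isDuplicate</strong> and the idea of keying on a message ID are assumptions made for the example, not part of NSQ or nsqphp.</p>

```php
<?php
// Lossless de-duplication: remember every message ID ever seen.
// $seen gains one entry per distinct message, so memory is unbounded.
function isDuplicate(array &$seen, string $messageId): bool
{
    if (isset($seen[$messageId])) {
        return true; // seen before: the caller should ignore it
    }
    $seen[$messageId] = true; // first sighting: record it
    return false;
}
```

<p>The array passed in by reference never shrinks, which is exactly the problem the rest of this post tries to avoid.</p>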
<h3 id="oppositeofbloomfilter">The opposite of a Bloom filter</h3>
<p>Bloom filters were my first thought when trying to come up with a way of bounding memory. A Bloom filter is a probabilistic data structure that is able to test if some element is a member of a set. A Bloom filter will never tell you an item is not in the set if it is (no false negatives), but may tell you it is in the set when really it isn’t (chance of false positives).</p>
<p>What I actually want is <em>the opposite</em> of a Bloom filter: something that may forget it has seen a message (false negatives), but will never claim to have seen a message that it hasn&#8217;t.</p>
<p><a href="http://lmgtfy.com/?q=opposite+of+a+bloom+filter" target="_blank">http://lmgtfy.com/?q=opposite+of+a+bloom+filter</a></p>
<p>So picking the first link on Google, I checked out the <a href="http://somethingsimilar.com/2012/05/21/the-opposite-of-a-bloom-filter/" target="_blank">blog post on somethingsimilar.com</a>. <a href="https://twitter.com/jmhodges" target="_blank">@jmhodges</a>’s solution is simple; use a fixed-size hash map and then simply overwrite entries on collision. Let’s go through that slowly.</p>
<p>Here’s our hash map, with 10 buckets:</p>
<p><a href="http://www.davegardner.me.uk/blog/wp-content/uploads/2012/11/buckets01.png"><img class="aligncenter size-full wp-image-241" title="Our empty hash buckets" src="http://www.davegardner.me.uk/blog/wp-content/uploads/2012/11/buckets01.png" alt="Our empty hash buckets" width="599" height="97" /></a></p>
<p>Now we process our first message and add it:</p>
<p><a href="http://www.davegardner.me.uk/blog/wp-content/uploads/2012/11/buckets02.png"><img class="aligncenter size-full wp-image-242" title="Our content hashes to bucket 3" src="http://www.davegardner.me.uk/blog/wp-content/uploads/2012/11/buckets02.png" alt="Our content hashes to bucket 3" width="586" height="165" /></a></p>
<p>To test if some new message has been seen we need to check whether we have got <strong>exactly this message content</strong> within the appropriate bucket. If the content does match, then we can be sure we’ve seen it. If the content does not match, then we cannot know. The reason is that we may have <em>just</em> overwritten this message with a new message that collided into the same bucket.</p>
<p>So now we write in our next message, and it hashes to the same bucket. At this point we&#8217;ve lost our knowledge of having ever seen the first message we processed.</p>
<p><a href="http://www.davegardner.me.uk/blog/wp-content/uploads/2012/11/buckets03.png"><img class="aligncenter size-full wp-image-243" title="Our next item also hashes to bucket 3; now we have lost knowledge of having seen the previous item" src="http://www.davegardner.me.uk/blog/wp-content/uploads/2012/11/buckets03.png" alt="Our next item also hashes to bucket 3; now we have lost knowledge of having seen the previous item" width="586" height="174" /></a></p>
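<p>The walkthrough above can be sketched in a few lines of PHP. This is a minimal illustration rather than the nsqphp code: the class name <strong>LossyHashMap</strong>, the method <strong>containsAndAdd</strong> and the use of <strong>crc32</strong> to pick a bucket are all assumptions made for the example.</p>

```php
<?php
// Illustrative sketch only; names are hypothetical, not from nsqphp.
final class LossyHashMap
{
    /** @var array bucket index => last content written to that bucket */
    private $buckets = [];
    /** @var int fixed number of buckets, which bounds memory use */
    private $size;

    public function __construct(int $size)
    {
        $this->size = $size;
    }

    /**
     * Returns true only if exactly this content is still in its bucket.
     * Either way the content is written, overwriting any previous entry.
     */
    public function containsAndAdd(string $content): bool
    {
        // crc32() is non-negative on 64-bit PHP builds (assumed here)
        $index = crc32($content) % $this->size;
        $seen = isset($this->buckets[$index])
            && $this->buckets[$index] === $content;
        $this->buckets[$index] = $content;
        return $seen;
    }
}

// With a single bucket every message collides, which makes the loss visible.
$map = new LossyHashMap(1);
var_dump($map->containsAndAdd('first'));  // bool(false): never seen
var_dump($map->containsAndAdd('first'));  // bool(true): still in the bucket
var_dump($map->containsAndAdd('second')); // bool(false): overwrites 'first'
var_dump($map->containsAndAdd('first'));  // bool(false): knowledge was lost
```

<p>A <strong>true</strong> result is always correct because the full content is compared; a <strong>false</strong> result only means &#8220;not seen, or seen and since overwritten&#8221;.</p>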
<h3 id="decidinghowbigtomakeit">Deciding how big to make it</h3>
<p>So with this data structure, we will lose knowledge of messages we have seen; however we can determine how quickly this happens by choosing the size of our hash map (how many buckets we have).</p>
<p>Intuitively, there is a trade-off between the amount of space used and our ability to detect duplications. At one extreme, with 1 bucket, we can only ever de-duplicate a message if its duplicate arrives immediately after it, before anything else overwrites the bucket. At the other extreme, with a huge number of buckets, we can <em>nearly always</em> de-duplicate (we are bounded by our hash function’s ability to determine unique values for different content).</p>
<p>To get a clearer picture, we can consider our implementation in terms of probability. Starting with a single message stored, the probability of overwriting this message with the next message (assuming a perfectly random hash function), is 1/N, where N is the number of buckets.</p>
<p><a href="http://www.davegardner.me.uk/blog/wp-content/uploads/2012/11/sd-equation03.png"><img class="aligncenter size-full wp-image-249" title="First iteration; chance of removing knowledge of some previously processed message" src="http://www.davegardner.me.uk/blog/wp-content/uploads/2012/11/sd-equation03.png" alt="First iteration; chance of removing knowledge of some previously processed message" width="100" height="77" /></a></p>
<p>On our next go, the chance of us overwriting <em>on this go</em> is:</p>
<p><a href="http://www.davegardner.me.uk/blog/wp-content/uploads/2012/11/sd-equation04.png"><img class="aligncenter size-full wp-image-250" title="Probability of overwriting on _exactly the second go_" src="http://www.davegardner.me.uk/blog/wp-content/uploads/2012/11/sd-equation04.png" alt="Probability of overwriting on _exactly the second go_" width="193" height="80" /></a></p>
<p>This combines the probability of us <em>not</em> having overwritten on the first go with the probability of overwriting this time. To get the probability of us having overwritten <em>by this go</em>, we simply add up:</p>
<p><a href="http://www.davegardner.me.uk/blog/wp-content/uploads/2012/11/sd-equation05.png"><img class="aligncenter size-full wp-image-251" title="Probability of having overwritten _by the second go_" src="http://www.davegardner.me.uk/blog/wp-content/uploads/2012/11/sd-equation05.png" alt="Probability of having overwritten _by the second go_" width="238" height="88" /></a></p>
<p>Our next go looks like this:</p>
<p><a href="http://www.davegardner.me.uk/blog/wp-content/uploads/2012/11/sd-equation06.png"><img class="aligncenter size-full wp-image-252" title="Probability of overwriting by the third go!" src="http://www.davegardner.me.uk/blog/wp-content/uploads/2012/11/sd-equation06.png" alt="Probability of overwriting by the third go!" width="569" height="84" /></a></p>
<p>And we can express this as a sum, for any given x (where x is the number of additional messages we&#8217;ve written into our hash map):</p>
<p><a href="http://www.davegardner.me.uk/blog/wp-content/uploads/2012/11/sd-equation01.png"><img class="aligncenter size-full wp-image-254" title="Probability of having lost some initial message after X goes, with N buckets." src="http://www.davegardner.me.uk/blog/wp-content/uploads/2012/11/sd-equation01.png" alt="Probability of having lost some initial message after X goes, with N buckets." width="229" height="87" /></a></p>
<p>Plotting this, for N=100, we get:</p>
<p><a href="http://www.davegardner.me.uk/blog/wp-content/uploads/2012/11/forn100.png"><img class="aligncenter size-full wp-image-262" title="Probability of having overwritten a previously stored message after X further messages processed (x axis) for N=100" src="http://www.davegardner.me.uk/blog/wp-content/uploads/2012/11/forn100.png" alt="Probability of having overwritten a previously stored message after X further messages processed (x axis) for N=100" width="613" height="461" /></a></p>
<p>So what we are saying here is that with 100 buckets, after adding 459 additional messages, we are 99% certain to have overwritten our initial message and hence 99% certain that we won’t be able to de-duplicate this message if it turned up again.</p>
<p>We can work out the equation of this graph:</p>
<p><a href="http://www.davegardner.me.uk/blog/wp-content/uploads/2012/11/sd-equation02.png"><img class="aligncenter size-full wp-image-255" title="Solved" src="http://www.davegardner.me.uk/blog/wp-content/uploads/2012/11/sd-equation02.png" alt="Solved" width="247" height="88" /></a></p>
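<p>As a sanity check, the term-by-term sum and the closed form y = 1 - (1-1/N)^x can be compared numerically. The PHP below is only a sketch; both function names are made up for the example.</p>

```php
<?php
// Sketch only: function names are hypothetical, not part of nsqphp.

/** Closed form: chance a stored message is overwritten within $x further writes. */
function lossProbability(int $buckets, int $x): float
{
    return 1.0 - pow(1.0 - 1.0 / $buckets, $x);
}

/** The same probability built up one "go" at a time. */
function lossProbabilityBySum(int $buckets, int $x): float
{
    $total = 0.0;
    for ($i = 1; $i <= $x; $i++) {
        // survived the first $i - 1 goes, then overwritten on go $i
        $total += pow(1.0 - 1.0 / $buckets, $i - 1) / $buckets;
    }
    return $total;
}

printf("%.4f\n", lossProbability(100, 459));
printf("%.4f\n", lossProbabilityBySum(100, 459));
```

<p>For N=100 both values come out just over 0.99 at x=459 and just under it at x=458, which is where the 459 figure comes from.</p>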
<p>We can visualise this as it varies with both N and X:</p>
<p><a href="http://www.davegardner.me.uk/blog/wp-content/uploads/2012/11/surfaceplot.png"><img class="aligncenter size-full wp-image-256" title="Surface plot of equation as it varies in X and N" src="http://www.davegardner.me.uk/blog/wp-content/uploads/2012/11/surfaceplot.png" alt="Surface plot of equation as it varies in X and N" width="635" height="466" /></a></p>
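<p>Suppose we have a stream running at 1,000 events per second and ask how many buckets give a 90% chance that a message has been overwritten within an hour of being stored (y = 0.9, x = 1000*60*60):</p>
<pre class="code">0.9 = 1 - (1-1/N) ^ 3600000
0.1 = (1-1/N) ^ 3600000
0.999999360393234 = 1-1/N
1 / N = 0.000000639606766</pre>
<p>So N = <strong>1,563,461</strong>. Note that y is the probability of having <em>lost</em> the message, so this many buckets only gives a 10% chance of still being able to de-duplicate after an hour. For a 90% chance of de-duplicating we would set y = 0.1 instead, which works out at roughly 34 million buckets.</p>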
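<p>The same rearrangement can be wrapped in a small helper for trying out other rates and delays. This is a sketch under the equation above; <strong>bucketsNeeded</strong> is a hypothetical name, and y is the probability of having <em>lost</em> the message, so a smaller y means better de-duplication.</p>

```php
<?php
// Sketch only: bucketsNeeded() is a hypothetical helper, not part of nsqphp.

/**
 * Number of buckets N for which the probability of having lost a message
 * after $x further writes is $y, ie y = 1 - (1 - 1/N)^x solved for N.
 */
function bucketsNeeded(float $y, int $x): float
{
    // 1 - 1/N = (1 - y)^(1/x); expm1/log1p keep precision close to 1
    $oneOverN = -expm1(log1p(-$y) / $x);
    return 1.0 / $oneOverN;
}

$x = 1000 * 60 * 60; // one hour at 1,000 events per second

// 90% chance the message has been lost after an hour
echo round(bucketsNeeded(0.9, $x)), "\n";
// 10% chance lost, ie a 90% chance we can still de-duplicate
echo round(bucketsNeeded(0.1, $x)), "\n";
```

<p>The first figure reproduces the 1,563,461 worked out by hand; the second comes to a little over 34 million buckets.</p>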
<h3 id="nsqphp">NSQPHP implementation</h3>
<p>The @jmhodges implementation of the opposite of a Bloom filter has an atomic “check and set” to test membership. <strong>nsqphp</strong> ships with two implementations of the same basic interface. The <a href="https://github.com/davegardnerisme/nsqphp/blob/master/src/nsqphp/Dedupe/OppositeOfBloomFilter.php" target="_blank">first implementation</a> runs in a single process (and hence doesn&#8217;t have to worry about atomicity anyway, due to PHP&#8217;s lack of threads).</p>
<p><script src="http://gist-it.appspot.com/github/davegardnerisme/nsqphp/raw/master/src/nsqphp/Dedupe/OppositeOfBloomFilter.php#L56-85"></script></p>
<p>In this implementation I’m actually storing an MD5 of the entire content, to save space. This introduces a theoretical possibility of a false positive (saying it’s seen a message when it hasn’t), should two different messages share the same MD5.</p>
<p>The <a href="https://github.com/davegardnerisme/nsqphp/blob/master/src/nsqphp/Dedupe/OppositeOfBloomFilterMemcached.php" target="_blank">second implementation</a> uses Memcached to store the actual hash map; this completely ignores races on the basis that they will only mean we may not quite de-duplicate as many messages as we could have.</p>
<p>The only other complication is with failed messages; here we need to <em>erase</em> our knowledge of having “seen” a message. To achieve this we simply update our hash map so that the message we’re interested in is no longer the content within the hash bucket:</p>
<pre class="code">if (entry at hash index foo is our content) {
    overwrite with some placeholder (eg: "deleted")
}</pre>
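<p>Putting the MD5 digest and the erase step together, a minimal sketch might look like the following. It is not the nsqphp code: the class name <strong>DigestDedupe</strong>, its method names and the choice of the first four digest bytes as the bucket index are assumptions for illustration.</p>

```php
<?php
// Illustrative only; class and method names are hypothetical.
final class DigestDedupe
{
    // 7-byte placeholder; can never equal a 16-byte raw MD5 digest
    private const DELETED = 'deleted';

    private $buckets = [];
    private $size;

    public function __construct(int $size)
    {
        $this->size = $size;
    }

    /** True only if this content's digest is still in its bucket; always stores it. */
    public function containsAndAdd(string $content): bool
    {
        $digest = md5($content, true); // raw 16-byte digest
        $index = $this->indexFor($digest);
        $seen = isset($this->buckets[$index])
            && $this->buckets[$index] === $digest;
        $this->buckets[$index] = $digest;
        return $seen;
    }

    /** Forget a failed message, but only if its bucket still holds it. */
    public function erase(string $content): void
    {
        $digest = md5($content, true);
        $index = $this->indexFor($digest);
        if (isset($this->buckets[$index]) && $this->buckets[$index] === $digest) {
            $this->buckets[$index] = self::DELETED;
        }
    }

    private function indexFor(string $digest): int
    {
        // first 4 bytes as an unsigned 32-bit integer (64-bit PHP assumed)
        return unpack('N', $digest)[1] % $this->size;
    }
}
```

<p>After <strong>erase</strong>, the next <strong>containsAndAdd</strong> for the same content reports it as unseen, so a requeued message gets processed again.</p>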
]]></content:encoded>
			<wfw:commentRss>http://www.davegardner.me.uk/blog/2012/11/06/stream-de-duplication/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PHP Deployment with Capistrano</title>
		<link>http://www.davegardner.me.uk/blog/2012/02/13/php-deployment-with-capistrano/</link>
		<comments>http://www.davegardner.me.uk/blog/2012/02/13/php-deployment-with-capistrano/#comments</comments>
		<pubDate>Mon, 13 Feb 2012 15:44:33 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Dev Environment]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[capistrano]]></category>
		<category><![CDATA[deploy]]></category>
		<category><![CDATA[deployment]]></category>

		<guid isPermaLink="false">http://www.davegardner.me.uk/blog/?p=222</guid>
		<description><![CDATA[Capistrano is a developer tool for deploying web applications. This post explains how to use Capistrano to deploy PHP projects.]]></description>
			<content:encoded><![CDATA[<p>This blog post explains how to use <a target="_blank" href="https://github.com/capistrano/capistrano/wiki/">Capistrano</a> to deploy PHP, including some tips for integration with Jenkins.</p>
<ul>
<li><a href="#capistrano-background">Background</a></li>
<li><a href="#capistrano-what-is-it">What is Capistrano?</a></li>
<li><a href="#capistrano-getting-started">Getting started</a></li>
<li><a href="#capistrano-for-php">Taking it over for PHP</a></li>
<li><a href="#capistrano-multi-stage">Multi-stage deployments</a></li>
<li><a href="#capistrano-tag-selection">Tag selection</a></li>
<li><a href="#capistrano-putting-it-all-together">Putting it all together</a></li>
</ul>
<h2 id="capistrano-background">Background</h2>
<p>Back when I worked at Imagini, we used the home-baked <del style="text-decoration: line-through;">Cruftly</del> Cloudly deployment system (built by <a target="_blank" href="http://twitter.com/pikesley">@pikesley</a>) to roll releases. It had some nice features:</p>
<ul>
<li>new code had to be pushed to staging first (as a release candidate)</li>
<li>final deploys were always against the exact release candidate</li>
<li>it would release to a bunch of machines in one hit, with symlinks switched at the end</li>
<li>from a developer perspective, it was &#8220;push button&#8221; (we just ran a script that requested a deploy of our code)</li>
<li>it presented an ASCII art dinosaur upon successful release</li>
</ul>
<p>Once I&#8217;d moved on I was keen to recreate the Cloudly experience, but I didn&#8217;t want to have to hand-craft my own solution. Luckily Capistrano now exists, and provides a very handy tool to deploy code from SCM to one or more servers.</p>
<p><a href="http://www.davegardner.me.uk/blog/wp-content/uploads/2012/02/cruft.jpg"><img class="aligncenter size-full wp-image-224" title="Cruftosaurus" src="http://www.davegardner.me.uk/blog/wp-content/uploads/2012/02/cruft.jpg" alt="Cruftosaurus" width="590" height="257" /></a></p>
<h2 id="capistrano-what-is-it">What is Capistrano?</h2>
<blockquote><p>&ldquo;Capistrano is a developer tool for deploying web applications&rdquo;</p></blockquote>
<p><a target="_blank" href="https://github.com/capistrano/capistrano/wiki/">Capistrano</a> is written in Ruby and offers up a basic DSL from which you can craft quite flexible deployment scripts. Typically, the deploy process would be to deploy a particular version (branch, commit etc.) from SCM to one or more boxes (into a new folder on the server), then switch a symlink so that they all immediately run off the new code. Support for instant rollback is provided. That said, it&#8217;s very flexible. In my current setup I have it deploying to multiple environments (dev, staging, production), building code (think Phing), running tests on the servers before finalising the deploy and then restarting worker processes on completion.</p>
<p>All of this functionality is driven from a simple command line interface:</p>
<pre class="code">cap deploy
cap deploy:rollback</pre>
<p>We can list all the available commands with:</p>
<pre class="code">cap -T
cap deploy               # Deploys your project.
cap deploy:check         # Test deployment dependencies.
cap deploy:cleanup       # Clean up old releases.
cap deploy:cold          # Deploys and starts a `cold' application.
cap deploy:migrations    # Deploy and run pending migrations.
cap deploy:pending       # Displays the commits since your last deploy.
cap deploy:pending:diff  # Displays the `diff' since your last deploy.
cap deploy:rollback      # Rolls back to a previous version and restarts.
cap deploy:rollback:code # Rolls back to the previously deployed version.
cap deploy:start         # Blank task exists as a hook into which to install ...
cap deploy:stop          # Blank task exists as a hook into which to install ...
cap deploy:symlink       # Updates the symlink to the most recently deployed ...
cap deploy:update        # Copies your project and updates the symlink.
cap deploy:update_code   # Copies your project to the remote servers.
cap deploy:upload        # Copy files to the currently deployed version.
cap deploy:web:disable   # Present a maintenance page to visitors.
cap deploy:web:enable    # Makes the application web-accessible again.
cap invoke               # Invoke a single command on the remote servers.
cap shell                # Begin an interactive Capistrano session.

Some tasks were not listed, either because they have no description,
or because they are only used internally by other tasks. To see all
tasks, type `cap -vT'.</pre>
<h2 id="capistrano-getting-started">Getting started</h2>
<p>Step 1 &#8211; install capistrano.</p>
<pre class="code">apt-get install capistrano</pre>
<p>Step 2 &#8211; &#8220;capify&#8221; one of your projects</p>
<pre class="code">cd /my/project/location
capify</pre>
<p>This step creates the files <strong>Capfile</strong> and <strong>config/deploy.rb</strong>.</p>
<h2 id="capistrano-for-php">Taking it over for PHP</h2>
<p>Capistrano is pretty simple; these are the basics:</p>
<ul>
<li>Capistrano runs on your local machine (it&#8217;s not a server-side thing)</li>
<li>A recipe is a bunch of named tasks that combine to define your deploy process</li>
<li>The “default” process is tailored for releasing Rails applications &#8211; therefore you&#8217;ll have to customise the recipes for PHP</li>
<li>Capistrano is built around the concept of roles, but for a simple PHP setup you can just have a “web” role and think of this as meaning “where I&#8217;m going to deploy to”</li>
</ul>
<p>To learn more about the Capistrano deployment process, I found the following useful:</p>
<ul>
<li><a target="_blank" href="https://github.com/mpasternacki/capistrano-documentation-support-files/raw/master/default-execution-path/Capistrano%20Execution%20Path.jpg">This diagram gives a good overview of the deployment process</a> (which tasks are run when you do a deploy)</li>
<li><a target="_blank" href="https://github.com/capistrano/capistrano/wiki/2.x-From-The-Beginning">The wiki</a> is a good place to go</li>
<li><a target="_blank" href="https://github.com/capistrano/capistrano/blob/master/lib/capistrano/recipes/deploy.rb">Reading the code</a> is actually very helpful</li>
</ul>
<p>So back to PHP. I followed <a target="_blank" href="https://github.com/namics/capistrano-php/blob/master/lib/capistrano/php.rb">these directions</a> (from the aptly named &#8220;Capistrano PHP&#8221; project). All we are doing is overriding the <a target="_blank" href="https://github.com/capistrano/capistrano/blob/master/lib/capistrano/recipes/deploy.rb#L226-230">finalise update</a> and <a target="_blank" href="https://github.com/capistrano/capistrano/blob/master/lib/capistrano/recipes/deploy.rb#L363-371">migrate</a> tasks. The <a target="_blank" href="https://github.com/capistrano/capistrano/blob/master/lib/capistrano/recipes/deploy.rb#L309-311">restart task</a> is actually blank by default, so we only need to define it if we want it to do something. We&#8217;ll make it reload nginx (I symlink the nginx config from the deployed code).</p>
<pre class="code">## php cruft ##

# https://github.com/mpasternacki/capistrano-documentation-support-files/raw/master/default-execution-path/Capistrano%20Execution%20Path.jpg
# https://github.com/namics/capistrano-php

namespace :deploy do

  task :finalize_update, :except =&gt; { :no_release =&gt; true } do
    transaction do
      run "chmod -R g+w #{releases_path}/#{release_name}"
    end
  end

  task :migrate do
    # do nothing
  end

  task :restart, :except =&gt; { :no_release =&gt; true } do
    run "sudo service nginx reload"
  end
end
</pre>
<h2 id="capistrano-multi-stage">Multi-stage deployments</h2>
<p>I needed to be able to deploy to different environments &#8211; dev, staging and production. The obvious starting point was Googling, which led to the <a target="_blank" href="https://github.com/capistrano/capistrano/wiki/2.x-Multistage-Extension">Capistrano multistage extension</a>. I worked through this for some time, however the requirement for an extra dependency <em>seemed</em> more complicated than necessary. The footnote on the multistage page offered an alternative &#8211; <a target="_blank" href="https://github.com/capistrano/capistrano/wiki/2.x-Multiple-Stages-Without-Multistage-Extension">multiple stages without the multistage extension</a>.</p>
<p>With this pattern, all we have to do is define extra <em>tasks</em> for each of our environments. Within these tasks we define the key information about the environment, namely the <a target="_blank" href="https://github.com/capistrano/capistrano/wiki/Roles">roles</a> that we want to deploy to (which servers we have).</p>
<pre class="code">## multi-stage deploy process ##

task :dev do
  role :web, "dev.myproject.example.com", :primary =&gt; true
end

task :staging do
  role :web, "staging.myproject.example.com", :primary =&gt; true
end

task :production do
  role :web, "production.myproject.example.com", :primary =&gt; true
end</pre>
<p>Now when we deploy we have to include an environment name in the command. I don&#8217;t bother defining a default, so leaving it out will throw an error (you could define a default if you wanted to).</p>
<pre class="code">cap staging deploy</pre>
<h2 id="capistrano-tag-selection">Tag selection</h2>
<p>The next feature I wanted from my killer deploy system was the ability to release <em>specific versions</em>. My plan was:</p>
<ul>
<li><a href="http://www.davegardner.me.uk/blog/2009/11/09/continuous-integration-for-php-using-hudson-and-phing/">Jenkins</a> would automatically release <strong>master</strong> on every push</li>
<li>a separate Jenkins project would automatically tag and release production-ready “builds” to a staging environment (anything pushed to <strong>release</strong> branch)</li>
<li>releasing to production would always involve manually tagging with a friendly version number</li>
</ul>
<p><a target="_blank" href="http://nathanhoad.net/deploy-from-a-git-tag-with-capistrano">Nathan Hoad had some good advice</a> on releasing a specific tag via Capistrano &#8211; including a snippet that makes Capistrano <em>ask you</em> what tag to release, defaulting to the most recent. One change I made was the addition of the <strong>unless exists?(:branch)</strong> condition, which means we can set up dev and staging releases to go unsupervised.</p>
<pre class="code">## tag selection ##

# we will ask which tag to deploy; default = latest
# http://nathanhoad.net/deploy-from-a-git-tag-with-capistrano
set :branch do
  default_tag = `git describe --abbrev=0 --tags`.split("\n").last

  tag = Capistrano::CLI.ui.ask "Tag to deploy (make sure to push the tag first): [#{default_tag}] "
  tag = default_tag if tag.empty?
  tag
end unless exists?(:branch)</pre>
<p>For staging, I use this handy bit of bash foo, courtesy of <a target="_blank" href="http://twitter.com/jameslnicholson">@jameslnicholson</a> (split to aid readability):</p>
<pre class="code">set :branch, `git tag | \
    xargs -I@ git log --format=format:"%ci %h @%n" -1 @ | \
    sort | \
    awk '{print  $5}' | \
    egrep '^b[0-9]+$' | \
    tail -n 1`</pre>
<h2 id="capistrano-putting-it-all-together">Putting it all together</h2>
<p>Here&#8217;s a fictitious deployment script for the “We Have Your Kidneys” ad network. There are some extra nuggets in here that are worth highlighting.</p>
<p>Run a task once deployment has finished:</p>
<pre class="code">after "deploy", :except =&gt; { :no_release =&gt; true } do</pre>
<p>Run build script, including tests. This will abort the deployment if they do not pass:</p>
<pre class="code"># run our build script
run "echo '#{app_environment}' &gt; #{releases_path}/#{release_name}/config/environment.txt"
run "cd #{releases_path}/#{release_name} &amp;&amp; phing build"</pre>
<h3>The deployment script in full</h3>
<pre class="code"># Foo Bar deployment script (Capistrano)

## basic setup stuff ##

# http://help.github.com/deploy-with-capistrano/
set :application, "Foo Bar PHP Service"
set :repository, "git@github.com:davegardnerisme/we-have-your-kidneys.git"
set :scm, "git"
default_run_options[:pty] = true
set :deploy_to, "/var/www/we-have-your-kidneys"

# use our keys, make sure we grab submodules, try to keep a remote cache
ssh_options[:forward_agent] = true
set :git_enable_submodules, 1
set :deploy_via, :remote_cache
set :use_sudo, false

## multi-stage deploy process ###

# simple version @todo make db settings environment specific
# https://github.com/capistrano/capistrano/wiki/2.x-Multiple-Stages-Without-Multistage-Extension

task :dev do
  role :web, "dev.davegardner.me.uk", :primary =&gt; true
  set :app_environment, "dev"
  # this is so we automatically deploy current master, without tagging
  set :branch, "master"
end

task :staging do
  role :web, "staging.davegardner.me.uk", :primary =&gt; true
  set :app_environment, "staging"
  # this is so we automatically deploy the latest numbered tag
  # (with staging releases we use incrementing build number tags)
  set :branch, `git tag | xargs -I@ git log --format=format:"%ci %h @%n" -1 @ | sort | awk '{print  $5}' | egrep '^b[0-9]+$' | tail -n 1`
end

task :production do
  role :web, "prod01.davegardner.me.uk", :primary =&gt; true
  role :web, "prod02.davegardner.me.uk"
  set :app_environment, "production"
end

## tag selection ##

# we will ask which tag to deploy; default = latest
# http://nathanhoad.net/deploy-from-a-git-tag-with-capistrano
set :branch do
  default_tag = `git describe --abbrev=0 --tags`.split("\n").last

  tag = Capistrano::CLI.ui.ask "Tag to deploy (make sure to push the tag first): [#{default_tag}] "
  tag = default_tag if tag.empty?
  tag
end unless exists?(:branch)

## php cruft ##

# https://github.com/mpasternacki/capistrano-documentation-support-files/raw/master/default-execution-path/Capistrano%20Execution%20Path.jpg
# https://github.com/namics/capistrano-php

namespace :deploy do

  task :finalize_update, :except =&gt; { :no_release =&gt; true } do
    transaction do
      run "chmod -R g+w #{releases_path}/#{release_name}"
      # run our build script
      run "echo '#{app_environment}' &gt; #{releases_path}/#{release_name}/config/environment.txt"
      run "cd #{releases_path}/#{release_name} &amp;&amp; phing build"
    end
  end

  task :migrate do
    # do nothing
  end

  task :restart, :except =&gt; { :no_release =&gt; true } do
    # reload nginx config
    run "sudo service nginx reload"
  end

  after "deploy", :except =&gt; { :no_release =&gt; true } do
    run "cd #{releases_path}/#{release_name} &amp;&amp; phing spawn-workers &gt; /dev/null 2&gt;&amp;1 &amp;", :pty =&gt; false
  end
end</pre>
]]></content:encoded>
			<wfw:commentRss>http://www.davegardner.me.uk/blog/2012/02/13/php-deployment-with-capistrano/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>A mapper pattern for PHP</title>
		<link>http://www.davegardner.me.uk/blog/2011/07/27/a-mapper-pattern-for-php/</link>
		<comments>http://www.davegardner.me.uk/blog/2011/07/27/a-mapper-pattern-for-php/#comments</comments>
		<pubDate>Wed, 27 Jul 2011 15:14:29 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[dao]]></category>
		<category><![CDATA[data mapper]]></category>
		<category><![CDATA[mapper]]></category>
		<category><![CDATA[pattern]]></category>
		<category><![CDATA[serialisation]]></category>
		<category><![CDATA[web service]]></category>

		<guid isPermaLink="false">http://www.davegardner.me.uk/blog/?p=209</guid>
		<description><![CDATA[Introducing a pattern for flexible mapping between domain objects and representations (and vice-versa) using PHP.]]></description>
			<content:encoded><![CDATA[<p>The “mapper” pattern allows us to either:</p>
<ul>
<li><a href="#object_to_representation_mapper">create a representation of an object graph</a></li>
<li><a href="#representation_to_object_mapper">create an object graph from a representation</a></li>
</ul>
<p>This sounds very much like straightforward <a href="http://en.wikipedia.org/wiki/Serialization" target="_blank">serialisation</a>, but there are some key differences.</p>
<ol>
<li><strong>It is not necessarily two-way</strong><br />
It is possible to map from an object graph to a representation that does not contain all the information to reconstruct an object graph. An example might be a case where we map a user&#8217;s &#8220;screen name&#8221; out of the user object; a useful piece of information, but not enough to construct a fully-formed user object on its own.</li>
<li><strong>The representation is flexible</strong><br />
It is possible to write mappers that map to and from various different representation formats (eg: JSON, XML, <a href="http://wiki.apache.org/cassandra/API#batch_mutate" target="_blank">Cassandra Mutation Map</a>). PHP&#8217;s in-built <a href="http://www.php.net/manual/en/function.serialize.php" target="_blank">serialisation</a> has a fixed end-result, determined by the PHP engine itself.</li>
<li><strong>There are a number of added extras</strong><br />
These include in-built caching, the ability to override values and the ability to provide defaults. More on this later.</li>
</ol>
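<p>To make the first point concrete, here is a minimal sketch of a one-way mapping. The <strong>User</strong> and <strong>UserScreenNameMapper</strong> classes are hypothetical and exist only for this example; they are not the mapper interface this post goes on to describe.</p>

```php
<?php
// Hypothetical domain object and mapper, for illustration only.
final class User
{
    public $id;
    public $screenName;
    public $email;

    public function __construct(int $id, string $screenName, string $email)
    {
        $this->id = $id;
        $this->screenName = $screenName;
        $this->email = $email;
    }
}

final class UserScreenNameMapper
{
    /** Maps a User to a JSON representation exposing only the screen name. */
    public function toJson(User $user): string
    {
        return json_encode(['screen_name' => $user->screenName]);
    }
}

$mapper = new UserScreenNameMapper();
echo $mapper->toJson(new User(1, 'dave', 'dave@example.com')), "\n";
```

<p>The resulting JSON carries no ID or email address, so there is no way to map it back into a fully-formed <strong>User</strong>.</p>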
<h3>How they fit into the overall architecture</h3>
<p>I&#8217;m a huge fan of <a href="http://en.wikipedia.org/wiki/Domain-driven_design" target="_blank">Domain Driven Design</a>. When implementing a new set of functionality, I usually start with <a href="http://en.wikipedia.org/wiki/Class-responsibility-collaboration_card" target="_blank">CRC cards</a>, formulate a design for the domain objects and then start on prototyping in conjunction with unit testing. Aside &#8211; I generally don&#8217;t practice Test Driven Development, although I will hopefully try it out at some point.</p>
<p>The mappers come into play when creating representations of domain objects or creating an object graph from representations.</p>
<div id="attachment_213" class="wp-caption aligncenter" style="width: 610px"><a href="http://www.davegardner.me.uk/blog/wp-content/uploads/2011/07/mappers.jpg"><img class="size-full wp-image-213 " src="http://www.davegardner.me.uk/blog/wp-content/uploads/2011/07/mappers.jpg" alt="How mappers can be used in overall architecture" width="600" height="98" /></a><p class="wp-caption-text">How mappers can be used in overall architecture</p></div>
<h4>Web Service</h4>
<p>Responses to <a href="http://en.wikipedia.org/wiki/Representational_State_Transfer" target="_blank">RESTful Web Service</a> <a href="http://en.wikipedia.org/wiki/GET_%28HTTP%29#Request_methods" target="_blank">GET</a> requests can be created by mapping domain object graphs to representations (JSON, XML, HTML). The schema (more on this later) allows intricate control over exactly which bits of the domain object graph are mapped, allowing for a variety of different representations of the same domain objects. I&#8217;m ignoring <a href="http://en.wikipedia.org/wiki/HATEOAS" target="_blank">HATEOAS</a> for the purposes of this post; it can easily be added.</p>
<p>RESTful Web Service <a href="http://en.wikipedia.org/wiki/POST_%28HTTP%29" target="_blank">POST</a> and <a href="http://en.wikipedia.org/wiki/PUT_%28HTTP%29#Request_methods" target="_blank">PUT</a> requests can be handled by mapping the POSTed or PUT representations into a domain object graph and then saving these via the persistence layer.</p>
<h4>Persistence</h4>
<p>Retrieving domain objects from a persistence service (loading) can be facilitated by the representation to object mapper. One example of a specific implementation is a <a href="http://wiki.apache.org/cassandra/API#batch_mutate" target="_blank">Cassandra Mutation Map</a> (a representation) to object mapper.</p>
<p>Persisting domain objects (saving) can be facilitated by the object to representation mapper, for example mapping to a Cassandra Mutation Map or SQL.</p>
<h2>Building blocks of the domain layer</h2>
<p>There are many ways that this mapper pattern could be implemented. The implementation I have created relies on a number of consistent design principles within the domain layer. The important building blocks are outlined below.</p>
<h3>Keyed objects, value objects, collection objects</h3>
<p>We can construct our domain objects around three core object types. Any concrete domain object therefore shares the functionality of one of these types and implements a common interface.</p>
<ul>
<li><strong>Keyed objects</strong><br />
Domain objects that have a uniquely identifiable key &#8211; globally unique within the application. These objects are stored in an Identity Map to make sure only one instance for each unique key value is ever created.</li>
<li><strong>Value objects</strong><br />
Domain objects that do not have a uniquely identifiable key. These objects cannot be stored in an Identity Map, nor would it make sense to do so.</li>
<li><strong>Collections</strong><br />
Domain objects that represent a list of other objects. These, at their most basic level, implement the Iterator interface. Each different type of domain object will have its own accompanying collection object.</li>
</ul>
<h3>Virtual-proxy pattern for lazy-loading</h3>
<p>All keyed domain objects can be replaced with a <a href="http://en.wikipedia.org/wiki/Lazy_loading#Virtual_proxy" target="_blank">virtual-proxy</a>. This behaves like the original object (implements the same interface) but only loads itself from the database at the last minute when needed.</p>
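<p>A minimal sketch of the idea follows. All of the class names here (<code>example_category_interface</code>, <code>example_category</code>, <code>example_category_loader</code>, <code>example_category_proxy</code>) are hypothetical and exist only for illustration; the loader stands in for whatever actually talks to the database.</p>
<pre class="code">interface example_category_interface
{
    public function getKey();
    public function getTitle();
}

// The real object: fully formed, all values supplied to the constructor
class example_category implements example_category_interface
{
    private $key;
    private $title;

    public function __construct($key, $title)
    {
        $this-&gt;key = $key;
        $this-&gt;title = $title;
    }

    public function getKey() { return $this-&gt;key; }
    public function getTitle() { return $this-&gt;title; }
}

// Stands in for the database; counts how many loads actually happen
class example_category_loader
{
    public $loads = 0;

    public function load($key)
    {
        $this-&gt;loads++;
        return new example_category($key, 'Gates');
    }
}

// The virtual-proxy: same interface, loads the real object on first need
class example_category_proxy implements example_category_interface
{
    private $key;
    private $loader;
    private $real = null;

    public function __construct($key, example_category_loader $loader)
    {
        $this-&gt;key = $key;
        $this-&gt;loader = $loader;
    }

    // The key is already known, so no load is required
    public function getKey() { return $this-&gt;key; }

    // Anything else triggers the load, once
    public function getTitle()
    {
        if ($this-&gt;real === null) {
            $this-&gt;real = $this-&gt;loader-&gt;load($this-&gt;key);
        }
        return $this-&gt;real-&gt;getTitle();
    }
}

$loader = new example_category_loader();
$category = new example_category_proxy('gates', $loader);
$key = $category-&gt;getKey();     // 'gates'; $loader-&gt;loads is still 0
$title = $category-&gt;getTitle(); // 'Gates'; $loader-&gt;loads is now 1</pre>
<p>Because the proxy and the real object share an interface, calling code cannot tell which one it has been given.</p>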
<h3>Getters</h3>
<p>All objects have a consistently named “getter” defined. I used to think this was a <a href="http://www.javaworld.com/javaworld/jw-09-2003/jw-0905-toolbox.html" target="_blank">bad idea</a> but I have since mellowed in my opinion (setters are still evil).</p>
<h2 id="object_to_representation_mapper">Object to representation mapper</h2>
<p>The aim here is to turn an object graph into a representation. An example:</p>
<pre class="code">
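class user {
    private $username;
    private $name;
    private $registeredTimestamp;

    public function getUsername() { return $this-&gt;username; }
    public function getName() { return $this-&gt;name; }
    public function getRegisteredTimestamp() { return $this-&gt;registeredTimestamp; }
}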
</pre>
<p>We then map this to a JSON representation using the schema:</p>
<pre class="code">$schema = array(
    'username',
    'name',
    'registeredTimestamp'
    );</pre>
<p>Resulting in:</p>
<pre class="code">{"username":"davegardnerisme","name":"Dave Gardner","registeredTimestamp":1284850800}</pre>
<h3>Schemas</h3>
<p>Object graphs nearly always have great complexity, often involving circular relationships. When mapping to representations we often don&#8217;t want all this complexity, nor would it be feasible to include it all! With a rich domain layer built up from many contained lazy-loading objects, if you continue to dig down into the object graph you could end up loading every single object that exists within your application!</p>
<p>This is why we use schemas when mapping from objects to representations &#8211; we need to choose <strong>what</strong> to actually map. What we are really doing is identifying how far to dip into the object graph when formulating the representation.</p>
<h3>How it works</h3>
<p>The implementation of the object-to-representation concept has a number of key elements:</p>
<ol>
<li><strong>Object graph walker</strong><br />
Aware of how to walk through the object graph, according to the given schema, drilling into any contained objects and collections.</li>
<li><strong>Object property extractor</strong><br />
Able to extract a property from an object according to a schema.</li>
<li><strong>Property to value convertor</strong><br />
Able to turn a retrieved property into a scalar value (string/integer/float).</li>
</ol>
<h3>Pseudo code</h3>
<p>We run this code passing in the object to map from plus the schema. The code is structured to be run recursively, building the output array (passed by reference) as it goes.</p>
<pre class="code">foreach (entry in schema)
{
    if (schema entry indicates we should map to a list)
    {
        assert (we have a list)
        foreach (item in list)
        {
            call this recursively with this item and the sub-selection of schema
        }
    }
    else
    {
        extract property of object according to the schema
        convert this property to a scalar value
        add this property to the mapped-to data
    }
}</pre>
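<p>The pseudo code above could be sketched in PHP roughly as follows. This is not the actual mapper: the <code>LOOP_ITEMS</code> constant merely stands in for <code>service_mapper::LOOP_ITEMS</code>, the function and class names are hypothetical, and the sketch returns the mapped array rather than building it by reference. It also assumes that any object found at a leaf of the schema can be cast to a string.</p>
<pre class="code">// Hypothetical marker meaning "apply the sub-schema to every item in a list"
define('LOOP_ITEMS', '*');

// Property to value convertor: assumes leaf objects implement __toString()
function toScalar($property)
{
    return is_object($property) ? (string) $property : $property;
}

// Object graph walker: returns the mapped-to data for one object or list
function mapObject($object, array $schema)
{
    $out = array();

    if (isset($schema[LOOP_ITEMS])) {
        // Schema says "list": recurse into each item with the sub-schema
        foreach ($object as $item) {
            $out[] = mapObject($item, $schema[LOOP_ITEMS]);
        }
        return $out;
    }

    foreach ($schema as $key =&gt; $value) {
        if (is_array($value)) {
            // 'category' =&gt; array(...): drill into the contained object
            $getter = 'get' . ucfirst($key);
            $out[$key] = mapObject($object-&gt;$getter(), $value);
        } else {
            // 'title': extract the property via its getter, then convert it
            $getter = 'get' . ucfirst($value);
            $out[$value] = toScalar($object-&gt;$getter());
        }
    }
    return $out;
}

// Two tiny stand-in domain objects, just enough to run the sketch
class example_mappedCategory
{
    public function getKey() { return 'gates'; }
    public function getTitle() { return 'Gates'; }
}

class example_mappedPlace
{
    public function getTitle() { return 'Pedestrian Gate E'; }
    public function getCategory() { return new example_mappedCategory(); }
}

$schema = array(
    LOOP_ITEMS =&gt; array(
        'title',
        'category' =&gt; array('key', 'title')
        )
    );
$result = mapObject(array(new example_mappedPlace()), $schema);
$json = json_encode($result);
// [{"title":"Pedestrian Gate E","category":{"key":"gates","title":"Gates"}}]</pre>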
<h3>Caching</h3>
<p>One interesting thing about the object to representation mapper is that you can add in a caching layer which avoids, in many cases, the need to actually carry out the mapping. This is particularly effective when complex object graphs are built from immutable objects. The reason is that we don&#8217;t actually need to load the objects themselves from the database; we can use the virtual-proxy key to give us a cache key and then simply load the representation directly. With PHP it&#8217;s usually a good idea to use APC to cache representations (over Memcached) to avoid Memcached&#8217;s default 1MB item size limit. This makes it more effective when mapping/caching large object graphs to large representations.</p>
<p>A caching system can be added into the mapping code:</p>
<pre class="code">if (object to map is a keyed object)
{
    cache key = hash on (schema + object key)
    if (!exists in cache)
    {
        map as normal, then store the representation in the cache
    }
    else
    {
        return the cached representation
    }
}</pre>
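<p>As a rough sketch of that cache check, assuming an in-memory array in place of APC: the class name <code>example_representationCache</code>, its <code>fetch</code> method and the <code>misses</code> counter are all hypothetical. The mapping itself is passed in as a callback so that it only runs on a cache miss.</p>
<pre class="code">class example_representationCache
{
    private $items = array();  // stands in for APC in this sketch
    public $misses = 0;

    public function fetch($objectKey, array $schema, $mapCallback)
    {
        // cache key = hash on (schema + object key)
        $cacheKey = md5(serialize($schema) . '|' . $objectKey);

        if (!array_key_exists($cacheKey, $this-&gt;items)) {
            // Miss: map as normal, then remember the representation
            $this-&gt;misses++;
            $this-&gt;items[$cacheKey] = call_user_func($mapCallback);
        }
        return $this-&gt;items[$cacheKey];
    }
}

$cache = new example_representationCache();
$schema = array('key', 'title');
$map = function () {
    return array('key' =&gt; 'gates', 'title' =&gt; 'Gates');
};

$first = $cache-&gt;fetch('gates', $schema, $map);
$second = $cache-&gt;fetch('gates', $schema, $map);
// $cache-&gt;misses is 1: the second call never ran the mapping callback</pre>
<p>Note that only the object key is needed to build the cache key, which is why a virtual-proxy never has to be loaded on a cache hit.</p>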
<h3>Object verification</h3>
<p>The domain layer is built upon the principle of lazy-loading, making use of the virtual-proxy pattern. These virtual-proxies will initialise a concrete object <strong>when they need to</strong>. Remember that a virtual-proxy has a property that indicates some kind of unique identifier for the object, and it knows how to load itself. Therefore when mapping, if we need to turn a virtual-proxy object into a single value, we don&#8217;t need to load it, since we already have the unique identifier. This is a nice optimisation. However, this is not always desirable.</p>
<p>Sometimes you want to guarantee that objects exist, by forcing any objects touched via the mapper to be loaded from the database. This is where the verification feature comes in.</p>
<h3>Overrides</h3>
<p>We can tweak the final representation by adding overrides. These are a way of saying &#8220;please ignore any value that could be extracted from the object and use this object/callback instead&#8221;. One interesting way I have used these is to attach URL properties to domain objects. A URL is a property that does not usually belong in the problem domain, but rather is concerned with a specific representation scheme &#8211; HTTP. Therefore the overrides can be added within the web service layer to allow us to map a URL for a domain object within a specific context. One interesting point to note, required by the URL example, is that the override doesn&#8217;t actually have to override anything. The original object does not necessarily have to have this property in the first place.</p>
<h3>Example</h3>
<p>All these examples use the domain objects from my <a href="http://www.davegardner.me.uk/experience/glastofinder/">GlastoFinder</a> project. You can <a href="#glastofinder_object_reference">check out the interfaces for these</a> in the last section of this post.</p>
<h4>Mapping the list of places to JSON representation</h4>
<pre class="code">$schema = array(
    service_mapper::LOOP_ITEMS =&gt; array(
        'key',
        'title',
        'category' =&gt; array(
            'key',
            'title'
            ),
        'location' =&gt; array(
            'latitude',
            'longitude'
            ),
        'icon',
        'hashTag',
        'details'
        )
    );
$mapper = $this-&gt;diContainer-&gt;getInstance(
        'service_mapper_objectToArray',
        $schema
        );
$json = $mapper-&gt;map($list);</pre>
<p>Sample output:</p>
<pre class="code">[
    {
        "key": "0f45b80d-96d5-546d-bade-c2e583489783",
        "title": "Poetry &amp; Words",
        "category": {
            "key": "other_venues",
            "title": "Other Venues"
        },
        "location": {
            "latitude": 51.149364968572,
            "longitude": -2.5799948897032
        },
        "icon": "/i/otherstage-sm.png",
        "hashTag": "poetryandwords",
        "details": null
    },
    {
        "key": "10b11ab0-e164-5ac7-8ea4-ea5670cdf54e",
        "title": "Pedestrian Gate E",
        "category": {
            "key": "gates",
            "title": "Gates"
        },
        "location": {
            "latitude": 51.147641084707,
            "longitude": -2.6001275707455
        },
        "icon": "/i/gate-sm.png",
        "hashTag": "gatee",
        "details": null
    }
]</pre>
<h2 id="representation_to_object_mapper">Representation to object mapper</h2>
<p>This is the complete opposite of the object to representation mapper. We take some representation and then turn this into an object graph; potentially constructed of many related objects. An example:</p>
<p>Representation:</p>
<pre class="code">{"username":"davegardnerisme","name":"Dave Gardner","registeredTimestamp":1284850800}</pre>
<p>Will be able to yield us a user object with the following properties:</p>
<pre class="code">class user {
/**
 * Constructor
 *
 * @param string $username The user's username - an alphanumeric (a-zA-Z0-9) string, unique
 * @param string $name The user's full name, forename plus surname, or however they want to name themselves
 * @param integer $registeredTimestamp The date this user was registered
 */
public function __construct(
    $username,
    $name,
    $registeredTimestamp
    ) {
    // store the arguments on the object here
}
}</pre>
<p>The mapper, when asked to construct a user object, will examine the values needed by the object (by looking at its constructor) and then extract these properties from the representation. Unlike the object to representation mapper, no schema is needed. The schema is inherent in the object definitions themselves.</p>
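<p>To illustrate the &#8220;look at the constructor&#8221; step, here is a minimal sketch using PHP&#8217;s reflection API. It only handles a flat representation, and <code>example_user</code> and <code>buildFromRepresentation</code> are hypothetical names invented for the example. A real implementation would ideally use static analysis instead of reflection.</p>
<pre class="code">class example_user
{
    private $username;
    private $name;
    private $registeredTimestamp;

    public function __construct($username, $name, $registeredTimestamp)
    {
        $this-&gt;username = $username;
        $this-&gt;name = $name;
        $this-&gt;registeredTimestamp = $registeredTimestamp;
    }

    public function getUsername() { return $this-&gt;username; }
    public function getRegisteredTimestamp() { return $this-&gt;registeredTimestamp; }
}

function buildFromRepresentation($className, array $representation)
{
    $class = new ReflectionClass($className);
    $arguments = array();

    // The constructor signature acts as the schema
    foreach ($class-&gt;getConstructor()-&gt;getParameters() as $parameter) {
        $name = $parameter-&gt;getName();
        if (!array_key_exists($name, $representation)) {
            throw new InvalidArgumentException("Missing value for '$name'");
        }
        $arguments[] = $representation[$name];
    }

    return $class-&gt;newInstanceArgs($arguments);
}

$json = '{"username":"davegardnerisme","name":"Dave Gardner","registeredTimestamp":1284850800}';
$user = buildFromRepresentation('example_user', json_decode($json, true));
$username = $user-&gt;getUsername(); // 'davegardnerisme'</pre>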
<h3>How it works</h3>
<p>The implementation of the representation-to-object concept has a number of key elements:</p>
<ol>
<li><strong>Code analysis tool</strong><br />
Ideally a static analysis tool to avoid the cost of reflection. This would know what parameters are needed to construct each domain object.</li>
<li><strong>Recursive object-building code</strong><br />
Able to determine what type of thing to build (depends on what value is available in the representation) and then build it.</li>
</ol>
<h3>Pseudo code</h3>
<p>We run this code passing in the representation to map from plus the type of domain object to build. The code is structured to be run recursively, building the object graph as it goes.</p>
<pre class="code">decide which value to use by looking at representation, defaults and overrides
if (value to use is a scalar [string, int, float])
{
    build placeholder domain object
}
else
{
    build full domain object
}

if (thing to build is list)
{
    foreach (item within representation)
    {
        call function recursively to build item, then add to list
    }
}

verify all built objects, if required</pre>
<h3>Overrides and defaults</h3>
<p>When we map from a representation to an object, it&#8217;s often useful to be able to tweak the process, supplying default values where necessary and overrides in certain situations. To understand a use-case, it&#8217;s important to understand the principle of “fully formed objects”. This means that whenever we create an object, we always supply every piece of information to the object constructor, meaning that if ever we have an object instance, we know that all the data is present and correct. So for example a user object may have a createdTimestamp; this should be supplied, as a valid value, whenever we construct the object. To adhere to this, we could not pass in a NULL value and leave it to the object to provide a default value (eg: now). Instead we should use a Factory for this, or we could use a mapper default!</p>
<p>If we had a web service end point that created a new user, we may require a representation (eg: JSON) to be PUT. However what about the createdTimestamp? Should this be in the PUT representation? My thinking is that no, it shouldn&#8217;t. Instead we use the mapper override feature to <strong>force</strong> the createdTimestamp to be exactly the time that the user was created, according to the server processing the request.</p>
<p>The following illustrates defining an override callback to force a createdTimestamp to be defined at time of mapping. This makes use of <a href="http://php.net/manual/en/functions.anonymous.php" target="_blank">PHP 5.3 anonymous functions</a>.</p>
<pre class="code">$mapper = $diContainer-&gt;getInstance('service_mapper_jsonToObject');
$mapper-&gt;addOverride(
    array('createdTimestamp'),
    function () { return time(); }
    );</pre>
<h3>Rules for building domain objects</h3>
<h4>Placeholders (virtual-proxy) vs full domain objects</h4>
<p>The mapping algorithm ultimately boils down to a situation where we need to build some object, based on some value. This value could be a scalar (string, integer, float) or an array. We need some rules to determine how we go about building a domain object based on the situation we find, specifically what type of value we have.</p>
<ul>
<li><strong>Scalar &#8211; string, integer or float</strong><br />
When we are asked to build a domain object and we find a scalar in the representation, we make the assumption that this value refers to some unique identifier for the object and therefore build a placeholder object (virtual-proxy) instead.</li>
<li><strong>Associative array</strong><br />
When we are asked to build a domain object and we are presented with an array of values, we make the assumption that all of the necessary constructor properties are present in the representation. We then dig through these values and match them up with constructor arguments, before finally constructing the fully-formed domain object.</li>
</ul>
<h4>Collections &#8211; recursion</h4>
<p>Whenever we are building a collection object, we look for a numerically-indexed array of items within the representation. We then call the “turn a value into a domain object” method recursively to yield us objects to push onto our list. The assumption here is that collections are always represented as iterable objects within the domain layer.</p>
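<p>These rules can be sketched as follows. The class and function names (<code>example_venueCategory</code>, <code>example_venueCategoryPlaceholder</code>, <code>buildVenueCategory</code>, <code>buildVenueCategoryCollection</code>) are hypothetical, and PHP&#8217;s built-in <code>ArrayIterator</code> stands in for a proper domain collection object.</p>
<pre class="code">// Full domain object: needs every constructor value
class example_venueCategory
{
    private $key;
    private $title;

    public function __construct($key, $title)
    {
        $this-&gt;key = $key;
        $this-&gt;title = $title;
    }

    public function getKey() { return $this-&gt;key; }
}

// Placeholder (virtual-proxy): only knows its unique identifier
class example_venueCategoryPlaceholder
{
    private $key;

    public function __construct($key) { $this-&gt;key = $key; }
    public function getKey() { return $this-&gt;key; }
}

function buildVenueCategory($value)
{
    if (is_scalar($value)) {
        // Scalar: assume it is the unique identifier
        return new example_venueCategoryPlaceholder($value);
    }
    // Associative array: assume all constructor values are present
    return new example_venueCategory($value['key'], $value['title']);
}

function buildVenueCategoryCollection(array $items)
{
    // Numerically-indexed array: build each item in turn
    $list = new ArrayIterator();
    foreach ($items as $item) {
        $list-&gt;append(buildVenueCategory($item));
    }
    return $list;
}

$representation = json_decode(
    '["gates", {"key": "other_venues", "title": "Other Venues"}]',
    true
    );
$collection = buildVenueCategoryCollection($representation);
// $collection[0] is a placeholder, $collection[1] is a full domain object</pre>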
<h3>Verification</h3>
<p>When we carry out the mapping, we may create any number of placeholder (virtual-proxy) objects in place of real domain objects. These will then be lazy-loaded on-demand. This is all fine and good, but not if you want to ensure that all the objects are valid. Luckily it is trivial for the mapper to keep track of any placeholders it mints during its mapping job. With the verification flag set, this list of placeholders can then be force-loaded to ensure that they actually exist. Whether or not this is desirable depends on the situation; when accepting PUT or POSTed representations via a web service the safety net is useful, but when mapping from a database representation we often don&#8217;t want the overhead.</p>
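<p>A sketch of how the verification flag might work, under some loud assumptions: <code>example_verifiablePlaceholder</code> and <code>example_verifyingMapper</code> are hypothetical classes, and a plain array of known keys stands in for the database that a real placeholder would load itself from.</p>
<pre class="code">class example_verifiablePlaceholder
{
    private $key;
    private $knownKeys;

    public function __construct($key, array $knownKeys)
    {
        $this-&gt;key = $key;
        $this-&gt;knownKeys = $knownKeys; // stands in for the database
    }

    // Force-load: fails if the object does not actually exist
    public function load()
    {
        if (!in_array($this-&gt;key, $this-&gt;knownKeys, true)) {
            throw new RuntimeException("No object with key '{$this-&gt;key}'");
        }
    }
}

class example_verifyingMapper
{
    private $knownKeys;
    private $verify;

    public function __construct(array $knownKeys, $verify)
    {
        $this-&gt;knownKeys = $knownKeys;
        $this-&gt;verify = $verify;
    }

    public function map(array $keys)
    {
        // Keep track of every placeholder minted during this mapping job
        $minted = array();
        foreach ($keys as $key) {
            $minted[] = new example_verifiablePlaceholder($key, $this-&gt;knownKeys);
        }

        if ($this-&gt;verify) {
            // Verification flag set: force-load them all
            foreach ($minted as $placeholder) {
                $placeholder-&gt;load();
            }
        }
        return $minted;
    }
}

$known = array('gates', 'other_venues');

// Without verification the unknown key goes unnoticed until lazy-load time
$lazy = new example_verifyingMapper($known, false);
$objects = $lazy-&gt;map(array('gates', 'nope'));

// With verification the same input fails during mapping
$strict = new example_verifyingMapper($known, true);
try {
    $strict-&gt;map(array('gates', 'nope'));
    $failed = false;
} catch (RuntimeException $e) {
    $failed = true;
}</pre>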
<h2 id="glastofinder_object_reference">GlastoFinder domain object reference</h2>
<h4>Location interface</h4>
<pre class="code">interface domain_location_interface
{
    /**
     * Get latitude
     *
     * @return float
     */
    public function getLatitude();

    /**
     * Get longitude
     *
     * @return float
     */
    public function getLongitude();

    /**
     * Get distance to another location
     *
     * Uses "haversine" formula
     * @see http://www.movable-type.co.uk/scripts/latlong.html
     *
     * @param mz_domain_location $otherLocation The other location
     *
     * @return float The distance measured in kilometres
     */
    public function getDistanceTo(mz_domain_location $otherLocation);
}</pre>
<h4>Place interface</h4>
<pre class="code">interface domain_place_interface
        extends     domain_keyed_interface,
                    domain_hasLocation_interface
{
    /**
     * Get title
     *
     * @return string
     */
    public function getTitle();

    /**
     * Get category
     *
     * @return domain_place_category_interface
     */
    public function getCategory();

    /**
     * Get icon
     *
     * @return string
     */
    public function getIcon();

    /**
     * Get hash tag
     *
     * @return domain_hashTag_interface
     */
    public function getHashTag();

    /**
     * Get details
     *
     * @return string
     */
    public function getDetails();
}</pre>
]]></content:encoded>
			<wfw:commentRss>http://www.davegardner.me.uk/blog/2011/07/27/a-mapper-pattern-for-php/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>GlastoFinder &#8211; writing a mobile application</title>
		<link>http://www.davegardner.me.uk/blog/2011/07/25/glastofinder-writing-a-mobile-application/</link>
		<comments>http://www.davegardner.me.uk/blog/2011/07/25/glastofinder-writing-a-mobile-application/#comments</comments>
		<pubDate>Mon, 25 Jul 2011 16:56:18 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[geolocation]]></category>
		<category><![CDATA[glastofinder]]></category>
		<category><![CDATA[glastonbury]]></category>
		<category><![CDATA[html5]]></category>
		<category><![CDATA[jquery]]></category>
		<category><![CDATA[mobile]]></category>

		<guid isPermaLink="false">http://www.davegardner.me.uk/blog/?p=189</guid>
		<description><![CDATA[Have you ever been to Glastonbury? It's massive. Seriously huge. This is a fantastic feature - there's loads to do and you can wander aimlessly for hours and hours. However! If you're trying to find your friends it's a problem. Figuring out where everyone is can be tricky. The solution, to a web technologist like myself, was obvious - build a website! As they say, when all you have is a hammer… And so GlastoFinder was born.]]></description>
			<content:encoded><![CDATA[<p>Have you ever been to Glastonbury? It&#8217;s massive. Seriously huge. This is a fantastic feature &#8211; there&#8217;s loads to do and you can wander aimlessly for hours and hours. However! If you&#8217;re trying to find your friends it&#8217;s a problem. Figuring out where everyone is can be tricky.</p>
<p>The solution, to a web technologist like myself, was obvious &#8211; build a website! As they say, when all you have is a hammer… And so <a href="http://www.davegardner.me.uk/experience/glastofinder/">GlastoFinder</a> was born.</p>
<p style="text-align: center;"><a href="http://www.davegardner.me.uk/blog/wp-content/uploads/2011/07/P1110132.JPG"><img class="size-large wp-image-190   aligncenter" title="Looking down on Glastonbury" src="http://www.davegardner.me.uk/blog/wp-content/uploads/2011/07/P1110132-1024x439.jpg" alt="Looking down on Glastonbury" width="590" /></a></p>
<h2>Interesting features of the build</h2>
<p>This project was completed quickly. It leaned on a ton of features and patterns I have been developing over a number of years, plus some new tricks (EC2)!</p>
<h3>EC2</h3>
<p>This was my first personal project that made use of <a href="http://aws.amazon.com/ec2/" target="_blank">Amazon&#8217;s Elastic Compute Cloud (EC2)</a>. The process felt good. I created an instance and then did (roughly) this:</p>
<pre class="code">svn co http://svn.davegardner.me.uk/path/to/repo glastofinder_com
cd glastofinder_com
./scripts/install-build-dependencies
phing build</pre>
<p>This feels good. Everything is scripted. Once I finished with the project, I simply shut down the instance. This makes use of the <a href="http://www.phing.info/trac/" target="_blank">Phing</a> build tool, <a href="http://www.davegardner.me.uk/blog/2009/11/09/continuous-integration-for-php-using-hudson-and-phing/">which is mentioned in one of my previous posts</a>.</p>
<h3>Building on top of Twitter</h3>
<p>This is the first time I&#8217;ve built an application on top of Twitter. What I mean by that is that the application relies on Twitter to operate and to supply some of its features (Tweet-to-check-in). My experience is broadly positive. Leveraging an existing network removes the need to build boring sign up, sign in and “follow” functionality. My only grumble is that I think the Twitter API could be less restrictive.</p>
<h3>Using jQuery mobile for the frontend</h3>
<p>This was picked at the last minute to get a vaguely professional frontend in a short amount of time. Whether it achieved this lofty goal is up for debate.</p>
<h3>Solid architecture principles</h3>
<p>The backend (PHP) makes use of a number of nice patterns that are worth a mention.</p>
<ul>
<li>Layered architecture &#8211; web service, domain, DAO, service</li>
<li><a href="http://www.davegardner.me.uk/blog/2009/11/23/php-dependency-strategies-dependency-injection-and-service-locator/">Dependency Injection container</a></li>
<li><a href="http://www.davegardner.me.uk/blog/tag/lazy-load/">Lazy-loading (virtual proxy) pattern</a> for domain objects</li>
<li>Strong OO principles including using <a href="http://www.davegardner.me.uk/blog/2011/03/04/mocking-iterator-with-phpunit/">iterator</a> objects for collections</li>
<li>Mapping pattern (blog post to follow on this)</li>
</ul>
<p>In particular, the interplay between a <strong>strong and pure domain layer</strong>, the use of <strong>Data Access Objects (DAOs)</strong> for persistence and the <strong>mapper pattern</strong> to generate Web Service representations worked very well indeed.</p>
<h2>Field testing</h2>
<p>There&#8217;s nothing like a baptism of fire. I discovered at 10pm on the Tuesday before Glastonbury (I had to leave my house at 7am the next morning) that Twitter search did not include results from new users (most of my friends). There was much last-minute hackery to implement a workaround. Testing is always an important activity. Mimicking the conditions that the application should operate under is especially important to ensure things work smoothly “in the field”.</p>
<p>I would say that overall the app worked “acceptably”. There were a lot of lessons learned from actually being at the festival!</p>
<h3>Things that were a problem</h3>
<ul>
<li><strong>Patchy network</strong><br/>This probably had the biggest impact. The network was not unusable, merely patchy and often slow. This meant that the more advanced features (Google Map) were much less useful than the light-weight features (timeline / check-in via Twitter).</li>
<li><strong>Setup process difficult</strong><br/>We didn&#8217;t invest an enormous amount of energy trying to make this smooth. The idea was that you would choose which of your Twitter friends could track your location. This avoided a blanket approach but was fiddly to set up and the UX was less than obvious.</li>
<li><strong>Twitter search is heavy cruft</strong><br/>It&#8217;s not possible to build an app that relies on Twitter search. You are not guaranteed to get a message appearing in Twitter search and new users will be absent completely for a few days. Therefore you cannot simply sign up to the app as a new user and start using it straight away. This would be a massive problem. There are other issues with Twitter search <a href="http://code.google.com/p/twitter-api/issues/detail?id=214" target="_blank">like the fact that the from_user_id is not actually a valid Twitter user ID.</a></li>
</ul>
<h3>Things that worked well</h3>
<ul>
<li><strong>Timeline</strong><br/>The timeline view was great. Firstly because you could see quickly both where people were and also who was actively using the app. Secondly because of the amount of information crammed into each entry (where someone was, when, where they were heading off to, what they were up to). Thirdly because it was fast!</li>
<li><strong>Check-in and check-out via Twitter</strong><br/>Despite the failings of Twitter search, this still turned out to be an amazing feature. It was super fast to check-in via Twitter at one of the many hash-tagged venues &#8211; especially at busy times when 3G didn&#8217;t work (Tweet via SMS).</li>
</ul>
<h3>Potential improvements</h3>
<ul>
<li><strong>Crowd-sourced locations</strong><br/>At each Glastonbury, things pop up that you didn&#8217;t know about beforehand. Epic food stalls set up shop and become instant favourites (<a href="http://bookhams.com/cheese-on-toast/" target="_blank">like the cheese on toast stand</a>). Allowing users to add and share places would be a good addition.</li>
<li><strong>Use DataSift!</strong><br/>I know these guys from <a href="http://meetup.com/Cassandra-London" target="_blank">Cassandra London</a> and <a href="http://datasift.net/a/platform/" target="_blank">their product efficiently processes the Twitter “firehose”</a>. Using this instead of the Twitter search API would save lots of effort.</li>
<li><strong>Build a native app</strong><br/>This is something I was against when I started the project. However a native app would have a few key advantages over a website-based implementation. The major advantage would be that the map could be cached within the app itself, making it much more usable at the festival. This would not be without its drawbacks. The current implementation (based around Google Maps) lends itself to scalability &#8211; it would be relatively simple to add lots of other events, simply using Google Maps and a crowd-sourced locations database.</li>
</ul>
<h2>The results</h2>
<p>I decided to analyse my movements at the festival based on my check-ins and check-outs. As a starter for 10, I&#8217;ve just pulled out all the check-ins by day. If I get time I&#8217;ll mash this up with a map so you can see how much ground I&#8217;ve covered and/or if the geo-location was accurate!</p>
<h4>Wednesday</h4>
<ul>
<li><strong>It&#8217;s 6:59 and the computer is going off now. Leaving the house in ~ 40 minutes. Any bugs are now classified as &quot;features&quot;.</strong> <span style="color:#555;">159km from the festival at 7:00am</span>
</li>
<li><strong>At Waterloo!</strong> <span style="color:#555;">176km from the festival at 8:30am</span>
</li>
<li><strong>Slow progress. </strong> <span style="color:#555;">171km from the festival at 10:29am</span>
</li>
<li><strong>Services!</strong> <span style="color:#555;">85km from the festival at 11:45am</span>
</li>
<li><strong>Holy moly I&#8217;m getting near. Last 3G signal?</strong> <span style="color:#555;">44km from the festival at 1:07pm</span>
</li>
<li><strong>Frome now. The end is in sight. And it has a huge black cloud over it. </strong> <span style="color:#555;">25km from the festival at 1:41pm</span>
</li>
<li><strong>Epic queues at the gate. FFS!</strong> <span style="color:#555;">Very near Bus Station at 2:22pm</span>
</li>
<li><strong>Oh yeah! Arrived at #busstation #glasto #checkin /cc @glastonick @sdiddy</strong> <span style="color:#555;">At Bus Station at 2:25pm</span>
</li>
<li><strong>I can _see_ gate a. But I can also see a lot of people between me and gate a.</strong> <span style="color:#555;">Very near Bus Station at 3:02pm</span>
</li>
<li><strong>#checkin #glasto #gatea Finally!</strong> <span style="color:#555;">At Pedestrian Gate A at 3:30pm</span>
</li>
<li><strong>On way shopping passing brielfy #glade #checkin #glasto.</strong> <span style="color:#555;">At The Glade at 5:20pm</span>
</li>
<li><strong>Passing #tinyteatent on way for food. #checkin #glasto</strong> <span style="color:#555;">At The Tiny Tea Tent at 8:55pm</span>
</li>
<li><strong>Cider, but not at a bus. Rather Brothers cider. </strong> <span style="color:#555;">Very near Pie Minister at 9:10pm</span>
</li>
<li><strong>Brother&#8217;s. Oh yeah. </strong> <span style="color:#555;">Very near Pie Minister at 9:29pm</span>
</li>
<li><strong>This is all good. Fire time at glasto. And the Internet work. </strong> <span style="color:#555;">170m from HMS Sweet Charity at 10:56pm</span>
</li>
<li><strong>This is the camp site. </strong> <span style="color:#555;">170m from HMS Sweet Charity at 11:14pm</span>
</li>
</ul>
<h4>Thursday</h4>
<ul>
<li><strong>Where am I? Campsite. Late-o-clock with BHL</strong> <span style="color:#555;">Very near Le Grand Bouffe at 12:48am</span>
</li>
<li><strong>Breakfast with BHL and friends. Very much the raw prawn this morning. </strong> <span style="color:#555;">Very near Le Grand Bouffe at 11:08am</span>
</li>
<li><strong>Milling about. Wondering without aim!</strong> <span style="color:#555;">Very near Reggae Delights (TBC) at 12:23pm</span>
</li>
<li><strong>Sunbathing at #pyramid stage. #glasto #checkin</strong> <span style="color:#555;">At Pyramid at 1:40pm</span>
</li>
<li><strong>Still hanging out at #pyramid waiting for @sdiddy to get the hell out of his van! #checkin #glasto</strong> <span style="color:#555;">At Pyramid at 2:20pm</span>
</li>
<li><strong>Watching first live music at the bandstand! Yay!</strong> <span style="color:#555;">170m from HMS Sweet Charity at 4:10pm</span>
</li>
<li><strong>Having tea at the #tinyteatent &#8211; will be here for a bit. #glasto #checkin</strong> <span style="color:#555;">At The Tiny Tea Tent at 5:00pm</span>
</li>
<li><strong>Near the #glade watching the band who opened first #glasto at spirit of 71 stage. #checkin</strong> <span style="color:#555;">At The Glade at 6:05pm</span>
</li>
<li><strong>Still at #parkstage but heading off soon. #glasto #checkin</strong> <span style="color:#555;">At The Park Stage at 10:25pm</span>
</li>
</ul>
<h4>Friday</h4>
<ul>
<li><strong>Wakey wakey rise and shine! At #pyramid with @terrahawkes for first band of the day. #glasto #checkin</strong> <span style="color:#555;">At Pyramid at 11:00am</span>
</li>
<li><strong>At #westholts watching 30 drummers tearing it up. #glasto #checkin</strong> <span style="color:#555;">At West Holts at 12:30pm</span>
</li>
<li><strong>Passing #tinyteatent on way to Avalon. #glasto #checkin</strong> <span style="color:#555;">At The Tiny Tea Tent at 1:40pm</span>
</li>
<li><strong>Found real ale. At a bar. Neat Avalon I think. Hanging here. Is good. I have zero faith in GPS location ability!</strong> <span style="color:#555;">170m from HMS Sweet Charity at 2:09pm</span>
</li>
<li><strong>Listening to the mad hatters (i think) tear it up at #avalon. #glasto #checkin</strong> <span style="color:#555;">At Avalon at 2:45pm</span>
</li>
<li><strong>It&#8217;s official. @terrahawkes is a bad influence. At #avalon still. Good music here. #glasto #checkin</strong> <span style="color:#555;">At Avalon at 3:45pm</span>
</li>
<li><strong>Watching BB King at #pyramid #glasto #checkin</strong> <span style="color:#555;">At Pyramid at 4:55pm</span>
</li>
</ul>
<h4>Saturday</h4>
<ul>
<li><strong>Gentle introduction to the day at #tinyteatent. #glasto #checkin</strong> <span style="color:#555;">At The Tiny Tea Tent at 11:00am</span>
</li>
<li><strong>Managed to watch a band twice. #checkin #glade #glasto</strong> <span style="color:#555;">At The Glade at 12:35pm</span>
</li>
<li><strong>At #dance (west) for fujiya miyagi. #checkin #glasto</strong> <span style="color:#555;">At Dance Village at 1:05pm</span>
</li>
<li><strong>At #Avalon watching someone who is the champion of the world. #checkin #glasto</strong> <span style="color:#555;">At Avalon at 2:25pm</span>
</li>
<li><strong>In #avalon cafe with a coffee and the paper. Hedonistic! #glasto #checkin</strong> <span style="color:#555;">At Avalon at 2:40pm</span>
</li>
<li><strong>At #westholts with @sdiddy going to get a cider. The sun is out again! #glasto #checkin</strong> <span style="color:#555;">At West Holts at 3:15pm</span>
</li>
<li><strong>Still at #westholts. #checkin #glasto</strong> <span style="color:#555;">At West Holts at 4:20pm</span>
</li>
<li><strong>I appear to have not left #westholts yet. I blame @sdiddy. #checkin #glasto</strong> <span style="color:#555;">At West Holts at 5:15pm</span>
</li>
<li><strong>_still_ at #westholts. The sun is hot. I have a ukekele. All is good. /cc @Pikesley (start a band?) #glasto #checkin</strong> <span style="color:#555;">At West Holts at 5:30pm</span>
</li>
<li><strong>At #pyramid to see some band @sdiddy wants to see! #checkin #glasto</strong> <span style="color:#555;">At Pyramid at 6:05pm</span>
</li>
<li><strong>About to peak? #pyramid #glasto #checkin</strong> <span style="color:#555;">At Pyramid at 7:25pm</span>
</li>
<li><strong>Holy moly, it&#8217;s Elbow at #pyramid and I&#8217;ve just sung happy birthday! #checkin #glasto</strong> <span style="color:#555;">At Pyramid at 9:10pm</span>
</li>
<li><strong>Tell a lie. I mean #Avalon. #checkin #glasto</strong> <span style="color:#555;">At Avalon at 10:55pm</span>
</li>
</ul>
<h4>Sunday</h4>
<ul>
<li><strong>At #pyramid, so early the litter-pickers are still going and there&#8217;s no music. #checkin #glasto</strong> <span style="color:#555;">At Pyramid at 10:00am</span>
</li>
<li><strong>Watching fisherman&#8217;s friends. #checkin #glasto #pyramid</strong> <span style="color:#555;">At Pyramid at 11:05am</span>
</li>
<li><strong>Still at #pyramid. Staying for Paul Simon. #glasto #checkin /cc @sdiddy</strong> <span style="color:#555;">At Pyramid at 2:20pm</span>
</li>
<li><strong>Still at #pyramid. Paul Simon on next. #glasto #checkin</strong> <span style="color:#555;">At Pyramid at 3:55pm</span>
</li>
<li><strong>At #Avalon for Show of Hands. #glasto #checkin</strong> <span style="color:#555;">At Avalon at 6:05pm</span>
</li>
<li><strong>At #pyramid waiting for Beyoncé to start. #checkin #glasto</strong> <span style="color:#555;">At Pyramid at 9:55pm</span>
</li>
</ul>
<h4>Monday</h4>
<ul>
<li><strong>Heading out of #gatea. #checkin #glasto</strong> <span style="color:#555;">At Pedestrian Gate A at 7:40am</span>
</li>
<li> <span style="color:#555;">176km from the festival at 12:27pm</span>
</li>
</ul>
<h2>A final word</h2>
<p>It was great fun building a mobile app. I&#8217;m a firm believer that you should try to work on new things all the time to build your knowledge. A big thank you to my colleagues for the endless discussions and in particular <a href="http://twitter.com/oksannnna" target="_blank">@oksannnna</a> for the pin icons and <a href="http://twitter.com/paugay" target="_blank">@paugay</a> for all the frontend work.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.davegardner.me.uk/blog/2011/07/25/glastofinder-writing-a-mobile-application/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cassandra + Hadoop = Brisk</title>
		<link>http://www.davegardner.me.uk/blog/2011/05/17/cassandra-hadoop-brisk/</link>
		<comments>http://www.davegardner.me.uk/blog/2011/05/17/cassandra-hadoop-brisk/#comments</comments>
		<pubDate>Tue, 17 May 2011 15:42:38 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[brisk]]></category>
		<category><![CDATA[datastax]]></category>
		<category><![CDATA[hadoop]]></category>

		<guid isPermaLink="false">http://www.davegardner.me.uk/blog/?p=182</guid>
		<description><![CDATA[Slides and details from my talk at Cassandra London on DataStax's Brisk - a distribution of Hadoop, Hive and Cassandra that is suitable for workloads of real-time access plus batch analytics.]]></description>
			<content:encoded><![CDATA[<p>The Cassandra London meetup group has recently celebrated its six-month anniversary and after a string of fantastic speakers it was left to me to follow up my talk at the <a href="http://www.davegardner.me.uk/blog/2010/12/01/running-cassandra-on-ec2/">first ever meetup</a> with another talk.</p>
<p>I decided to start out giving a brief history of my work with Cassandra; starting when I joined VisualDNA, through the hard times struggling with GC issues up until the present &#8211; successfully running a 16-node Cassandra cluster on EC2. Looking back, working with Cassandra has been a very positive experience, but the analytics side of things (carrying out complex analysis of data stored in Cassandra) seemed harder than it could be.</p>
<p>This is where <a href="http://www.datastax.com/products/brisk" target="_blank">DataStax&#8217;s Brisk</a> comes into play.</p>
<blockquote><p>DataStax’ Brisk is an enhanced open-source Apache Hadoop and Hive distribution that utilizes Apache Cassandra for many of its core services.</p></blockquote>
<p>Put simply, Brisk gives you the real-time capabilities of Cassandra combined with an easy interface to MapReduce via Hive, in an easy-to-use bundle.</p>
<h3>Case study &#8211; segmenting users</h3>
<p>As a case study I built <a href="https://github.com/davegardnerisme/we-have-your-kidneys" target="_blank">a very simple system for segmenting users into buckets using PHP</a>. The key idea is to have a pixel that can be included on a website to track users (via a cookie) and put them into various buckets. This demonstrates the key features of Brisk:</p>
<p><strong>Real-time API access</strong></p>
<ul>
<li><a target="_blank" href="http://wehaveyourkidneys.com/show.php">An API to show a user which segments they are in</a></li>
<li><a target="_blank" href="http://wehaveyourkidneys.com/add.php?segment=bloggage&amp;expires=2592000">An API to add a user to a segment</a></li>
</ul>
<p><strong>Batch analytics</strong></p>
<ul>
<li>A Hive query to find out how many users are in each segment</li>
<li>A Hive query to calculate the average and standard deviation of the number of groups that each user is part of</li>
</ul>
<h3>The talk</h3>
<p><a href="http://skillsmatter.com/podcast/nosql/cassandra-may-meetup/js-1775" target="_blank">You can watch a podcast of the talk on the SkillsMatter website.</a></p>
<div style="width:425px" id="__ss_7992824"> <strong style="display:block;margin:12px 0 4px"><a href="http://www.slideshare.net/davegardnerisme/cassandra-hadoop-brisk" title="Cassandra + Hadoop = Brisk">Cassandra + Hadoop = Brisk</a></strong> <iframe src="http://www.slideshare.net/slideshow/embed_code/7992824" width="425" height="355" frameborder="0" marginwidth="0" marginheight="0" scrolling="no"></iframe>
<div style="padding:5px 0 12px"> View more <a href="http://www.slideshare.net/">presentations</a> from <a href="http://www.slideshare.net/davegardnerisme">Dave Gardner</a> </div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.davegardner.me.uk/blog/2011/05/17/cassandra-hadoop-brisk/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mocking Iterator with PHPUnit</title>
		<link>http://www.davegardner.me.uk/blog/2011/03/04/mocking-iterator-with-phpunit/</link>
		<comments>http://www.davegardner.me.uk/blog/2011/03/04/mocking-iterator-with-phpunit/#comments</comments>
		<pubDate>Fri, 04 Mar 2011 17:52:34 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[iterator]]></category>
		<category><![CDATA[mock]]></category>
		<category><![CDATA[phpunit]]></category>
		<category><![CDATA[testing]]></category>

		<guid isPermaLink="false">http://www.davegardner.me.uk/blog/?p=173</guid>
		<description><![CDATA[When writing unit tests, it is important that you isolate the system under test from any dependencies. Put simply, this means you have to mock any other objects that your system under test interacts with. This can be tricky when a dependency implements the Iterator interface as you have to carefully mock all calls to [...]]]></description>
			<content:encoded><![CDATA[<p>When writing <a href="http://en.wikipedia.org/wiki/Unit_testing" target="_blank">unit tests</a>, it is important that you isolate the system under test from any dependencies. Put simply, this means you have to mock any other objects that your system under test interacts with. This can be tricky when a dependency implements the <a href="http://php.net/manual/en/class.iterator.php" target="_blank">Iterator interface</a> as you have to carefully mock all calls to five methods in the correct order.</p>
<p>This blog post provides an example test showing how to mock an Iterator in PHPUnit as well as a helper class to make this process straightforward. You can <a target="_blank" href="http://pastebin.com/FVxNf6zq">view the source code for the helper on PasteBin</a> or <a href="/iterator.tar.gz">download the entire source code as a gzipped TAR</a>.</p>
<h3>Simple Iterator class</h3>
<p>To demonstrate, consider the following simple iterator class.</p>
<pre class="code">/**
 * Example list class
 *
 * @author Dave Gardner &lt;dave@davegardner.me.uk&gt;
 */
class exampleList implements Iterator
{
    /**
     * Our items
     *
     * @var array
     */
    private $items = array(
        'item1' =&gt; 'This is the first item',
        'item2' =&gt; 'This is the second item',
        'item3' =&gt; 'This is the third item',
        'item4' =&gt; 'This is the fourth item',
        'item5' =&gt; 'This is the fifth item'
    );

    /**
     * Get key of current item as string
     *
     * @return string
     */
    public function key()
    {
        echo "\033[36m" . __METHOD__ . "\033[0m\n";
        return key($this-&gt;items);
    }

    /**
     * Test if current item valid
     *
     * @return boolean
     */
    public function valid()
    {
        echo "\033[36m" . __METHOD__ . "\033[0m\n";
        // compare the key rather than the value, so an item whose value is FALSE is still valid
        return key($this-&gt;items) !== NULL;
    }

    /**
     * Fetch current value
     *
     * @return string
     */
    public function current()
    {
        echo "\033[36m" . __METHOD__ . "\033[0m\n";
        return current($this-&gt;items);
    }

    /**
     * Go to next item
     */
    public function next()
    {
        echo "\033[36m" . __METHOD__ . "\033[0m\n";
        next($this-&gt;items);
    }

    /**
     * Rewind to start
     */
    public function rewind()
    {
        echo "\033[36m" . __METHOD__ . "\033[0m\n";
        reset($this-&gt;items);
    }
}</pre>
<p>Ignore the strange ANSI escape codes (I can't help myself when writing CLI scripts). Instantiated objects of this class will loop through their five internal items (when asked) and echo out the methods being called in cyan! We can see this in action via the following code:</p>
<pre class="code">echo "\n\033[44;37;01mTest 1: foreach key =&gt; value\033[0m\n\n";

$list = new exampleList();
foreach ($list as $key =&gt; $value)
{
    echo "\033[01m$key = $value\033[0m\n";
}

echo "\n\033[44;37;01mTest 2: foreach value\033[0m\n\n";

$list = new exampleList();
foreach ($list as $value)
{
    echo "\033[01m$value\033[0m\n";
}</pre>
<p>When we run this, we get the following.</p>
<p><a href="http://www.davegardner.me.uk/blog/wp-content/uploads/2011/03/runningIterator.jpg"><img class="aligncenter size-full wp-image-175" title="Running the iterator" src="http://www.davegardner.me.uk/blog/wp-content/uploads/2011/03/runningIterator.jpg" alt="Running the iterator" width="422" height="792" /></a></p>
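<p>If the screenshot is hard to read, the same call order can be captured as plain text. The following is only a minimal sketch and not part of the downloadable source: recordingList and its $calls property are names made up for illustration, and the return type declarations assume PHP 8.0 or later.</p>
<pre class="code">/**
 * Hypothetical variant of exampleList that records calls instead of echoing
 */
class recordingList implements Iterator
{
    public $calls = array();
    private $items;

    public function __construct(array $items)
    {
        $this-&gt;items = $items;
    }

    public function rewind(): void
    {
        $this-&gt;calls[] = 'rewind';
        reset($this-&gt;items);
    }

    public function valid(): bool
    {
        $this-&gt;calls[] = 'valid';
        return key($this-&gt;items) !== NULL;
    }

    public function current(): mixed
    {
        $this-&gt;calls[] = 'current';
        return current($this-&gt;items);
    }

    public function key(): mixed
    {
        $this-&gt;calls[] = 'key';
        return key($this-&gt;items);
    }

    public function next(): void
    {
        $this-&gt;calls[] = 'next';
        next($this-&gt;items);
    }
}

$items = array('item1' =&gt; 'first', 'item2' =&gt; 'second');

// foreach with key =&gt; value
$list = new recordingList($items);
foreach ($list as $key =&gt; $value)
{
    // body intentionally empty; only the recorded calls matter
}
echo implode(' ', $list-&gt;calls) . "\n";
// rewind valid current key next valid current key next valid

// foreach with value only: key() is never called
$list = new recordingList($items);
foreach ($list as $value)
{
}
echo implode(' ', $list-&gt;calls) . "\n";
// rewind valid current next valid current next valid</pre>
<p>The extra valid() at the end is the call that returns FALSE and stops the loop, which is why a mock needs one more expectation than three calls per item.</p>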
<h3>Mocking an Iterator for testing</h3>
<p>This shows us exactly which methods are called and in what order. We can now use this to write some tests for a <em>mocked</em> iterator. All we have to do with our mocked object is set up PHPUnit expectations; the key point being the use of the <a href="http://www.phpunit.de/manual/3.6/en/test-doubles.html#test-doubles.mock-objects.tables.matchers" target="_blank">at() matcher</a> to specify the exact sequence of calls. The following test applies this to allow us to iterate through three mocked items. You can <a target="_blank" href="http://pastebin.com/uCdwSK4i">view the full source on PasteBin</a>.</p>
<pre class="code">    public function testWhenMockThreeIterationWithNoKey()
    {
        $list = $this-&gt;buildSystemUnderTest();

        $list-&gt;expects($this-&gt;at(0))
             -&gt;method('rewind');

        // iteration 1
        $list-&gt;expects($this-&gt;at(1))
             -&gt;method('valid')
             -&gt;will($this-&gt;returnValue(TRUE));
        $list-&gt;expects($this-&gt;at(2))
             -&gt;method('current')
             -&gt;will($this-&gt;returnValue('This is the first item'));
        $list-&gt;expects($this-&gt;at(3))
             -&gt;method('next');

        // iteration 2
        $list-&gt;expects($this-&gt;at(4))
             -&gt;method('valid')
             -&gt;will($this-&gt;returnValue(TRUE));
        $list-&gt;expects($this-&gt;at(5))
             -&gt;method('current')
             -&gt;will($this-&gt;returnValue('This is the second item'));
        $list-&gt;expects($this-&gt;at(6))
             -&gt;method('next');

        // iteration 3
        $list-&gt;expects($this-&gt;at(7))
             -&gt;method('valid')
             -&gt;will($this-&gt;returnValue(TRUE));
        $list-&gt;expects($this-&gt;at(8))
             -&gt;method('current')
             -&gt;will($this-&gt;returnValue('And the final item'));
        $list-&gt;expects($this-&gt;at(9))
             -&gt;method('next');

        $list-&gt;expects($this-&gt;at(10))
             -&gt;method('valid')
             -&gt;will($this-&gt;returnValue(FALSE));

        $counter = 0;
        $values = array();
        foreach ($list as $value)
        {
            $values[] = $value;
            $counter++;
        }
        $this-&gt;assertEquals(3, $counter);

        $expectedValues = array(
            'This is the first item',
            'This is the second item',
            'And the final item'
        );
        $this-&gt;assertEquals($expectedValues, $values);
    }</pre>
<h3>Making this process simple via a helper</h3>
<p>There are a bunch of other tests within the full source code; I&#8217;ve left them out here to spare you a huge code block! This is actually a win. We have mocked an iterator and it works as expected. The only downside is that there&#8217;s a lot of stuff to repeat each time. To avoid this we can simply make a helper method to do the job for us. You can <a target="_blank" href="http://pastebin.com/FVxNf6zq">view the source code for this on PasteBin</a>.</p>
<pre class="code">    /**
     * Mock iterator
     *
     * This attaches all the required expectations in the right order so that
     * our iterator will act like an iterator!
     *
     * @param Iterator $iterator The iterator object; this is what we attach
     *      all the expectations to
     * @param array $items An array of items that we will mock up, we will use the
     *      keys (if needed) and values of this array to return
     * @param boolean $includeCallsToKey Whether we want to mock up the calls
     *      to "key"; only needed if you are doing foreach ($foo as $k =&gt; $v)
     *      as opposed to foreach ($foo as $v)
     */
    private function mockIterator(
            Iterator $iterator,
            array $items,
            $includeCallsToKey = FALSE
            )
    {
        $iterator-&gt;expects($this-&gt;at(0))
                 -&gt;method('rewind');
        $counter = 1;
        foreach ($items as $k =&gt; $v)
        {
            $iterator-&gt;expects($this-&gt;at($counter++))
                     -&gt;method('valid')
                     -&gt;will($this-&gt;returnValue(TRUE));
            $iterator-&gt;expects($this-&gt;at($counter++))
                     -&gt;method('current')
                     -&gt;will($this-&gt;returnValue($v));
            if ($includeCallsToKey)
            {
                $iterator-&gt;expects($this-&gt;at($counter++))
                         -&gt;method('key')
                         -&gt;will($this-&gt;returnValue($k));
            }
            $iterator-&gt;expects($this-&gt;at($counter++))
                     -&gt;method('next');
        }
        $iterator-&gt;expects($this-&gt;at($counter))
                 -&gt;method('valid')
                 -&gt;will($this-&gt;returnValue(FALSE));
    }</pre>
<p>Now we can repeat our test using the more succinct:</p>
<pre class="code">    public function testWhenMockThreeIterationWithNoKey()
    {
        $list = $this-&gt;buildSystemUnderTest();

        $expectedValues = array(
            'This is the first item',
            'This is the second item',
            'And the final item'
        );
        $this-&gt;mockIterator($list, $expectedValues);

        $counter = 0;
        $values = array();
        foreach ($list as $value)
        {
            $values[] = $value;
            $counter++;
        }
        $this-&gt;assertEquals(3, $counter);

        $this-&gt;assertEquals($expectedValues, $values);
    }</pre>
]]></content:encoded>
			<wfw:commentRss>http://www.davegardner.me.uk/blog/2011/03/04/mocking-iterator-with-phpunit/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Applying collective intelligence to PHP UK Conference 2011</title>
		<link>http://www.davegardner.me.uk/blog/2011/02/27/applying-collective-intelligence-to-php-uk-conference-2011/</link>
		<comments>http://www.davegardner.me.uk/blog/2011/02/27/applying-collective-intelligence-to-php-uk-conference-2011/#comments</comments>
		<pubDate>Sun, 27 Feb 2011 20:34:56 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[collective intelligence]]></category>
		<category><![CDATA[jaccard]]></category>
		<category><![CDATA[joind.in]]></category>
		<category><![CDATA[pearson correlation]]></category>
		<category><![CDATA[phpuk2011]]></category>
		<category><![CDATA[recommendation]]></category>

		<guid isPermaLink="false">http://www.davegardner.me.uk/blog/?p=163</guid>
		<description><![CDATA[This post covers the use of basic collective intelligence techniques to analyse talk ratings on joind.in from the PHP UK Conference 2011. It explains how to use a similarity algorithm to power a "users like me like" recommendation system.]]></description>
			<content:encoded><![CDATA[<p>I had a cracking time at the <a target="_blank" href="http://www.phpconference.co.uk/">PHP UK Conference</a> this year. It&#8217;s usually pretty good, but this year I thought the talks were slightly better than normal. I think the free beer at the end always helps!</p>
<p>This got me wondering.</p>
<blockquote><p>&ldquo;What talks did I miss out on that I would have liked?&rdquo;</p></blockquote>
<p>As you may be aware, many delegates have been using <a target="_blank" href="http://joind.in/event/view/506">joind.in</a> to provide feedback on the talks. It turns out that <a target="_blank" href="http://joind.in/api">joind.in have an API</a>, and this, in turn, means we can carry out some basic <a target="_blank" href="http://en.wikipedia.org/wiki/Collective_intelligence">collective intelligence</a> techniques to provide &ldquo;recommendations&rdquo; on what other talks would have been of interest.</p>
<p>The term &ldquo;<a target="_blank" href="http://en.wikipedia.org/wiki/Collective_intelligence">collective intelligence</a>&rdquo; refers to intelligence that emerges from the collaboration of a group. In this case, we can leverage the data within joind.in and make &ldquo;intelligent&rdquo; recommendations.</p>
<p>This post looks at building a simple recommendation engine using the data from joind.in. You can <a href="/phpuk2011.php.gz">download the entire source code here (gzipped)</a> or <a target="_blank" href="http://pastebin.com/6pyFrNyA">view via PasteBin here</a> and try it out for yourself.</p>
<h3>The joind.in API</h3>
<p>The API is not entirely simple to understand, and examples are fairly thin on the ground within the documentation. The main thing to figure out is that you have to POST data to the appropriate API end point, where the POST data itself contains the &ldquo;action&rdquo; to carry out.</p>
<p>This PHP function uses cURL to POST a JSON-encoded request to the API and returns the decoded JSON response.</p>
<pre class="code">
/**
 * Hit the Joind.in API
 *
 * @param string $endPoint API end point, eg: "event" to hit event API
 * @param string $action The desired action, eg: "gettalks"
 * @param array $params Any params to send
 *
 * @return array Decoded JSON data
 */
function joindInApi($endPoint, $action, array $params = array())
{
    $requestData = array(
        'request' => array(
            'action' => array(
                'type' => $action,
                'data' => $params
            )
        )
    );
    $options = array(
        CURLOPT_RETURNTRANSFER => TRUE,     // return web page
        CURLOPT_HEADER         => FALSE,    // don't return headers
        CURLOPT_FOLLOWLOCATION => TRUE,     // follow redirects
        CURLOPT_ENCODING       => '',       // handle all encodings
        CURLOPT_USERAGENT      => 'DAVE!',  // who am i
        CURLOPT_AUTOREFERER    => TRUE,     // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,      // timeout on connect
        CURLOPT_TIMEOUT        => 120,      // timeout on response
        CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects
        CURLOPT_HTTPHEADER     => array('Content-Type: application/json'),
        CURLOPT_POSTFIELDS     => json_encode($requestData)
    );

    $ch = curl_init('http://joind.in/api/' . $endPoint);
    curl_setopt_array($ch, $options);
    $content = curl_exec($ch);
    $err = curl_errno($ch);
    $errmsg = curl_error($ch);
    $header = curl_getinfo($ch);
    curl_close($ch);

    return json_decode($content, TRUE);
}
</pre>
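<p>To make the request format concrete, this small sketch prints the JSON body that the function above would POST for the &ldquo;gettalks&rdquo; action. It only builds the payload and makes no HTTP call; the structure is copied from joindInApi(), and 506 is the ID of the PHP UK Conference 2011 event on joind.in.</p>
<pre class="code">
// Build the same nested structure as joindInApi() and show it as JSON
$requestData = array(
    'request' => array(
        'action' => array(
            'type' => 'gettalks',
            'data' => array('event_id' => 506)
        )
    )
);
echo json_encode($requestData) . "\n";
// {"request":{"action":{"type":"gettalks","data":{"event_id":506}}}}
</pre>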
<h3>Grab talks and ratings</h3>
<p>The first thing we need to do is fetch all the talks for the conference along with any user ratings. We can do this via the <a target="_blank" href="http://joind.in/api#get_evt_talks">Event API &ldquo;gettalks&rdquo; action</a> followed by the <a target="_blank" href="http://joind.in/api#get_talk_comments">Talk API &ldquo;getcomments&rdquo; action.</a></p>
<pre class="code">
// Phase 1: grab ratings via the Joind.in API

$userRatings = array();     // [userId][talkId] = rating
$talkTitles = array();      // we'll store these for later

$talks = joindInApi('event', 'gettalks', array('event_id' => 506));
foreach ($talks as $talk)
{
    $talkTitles[$talk['ID']] = $talk['talk_title'];
    echo $talk['ID'] . "\t" . $talk['talk_title'] . "\n";
    $comments = joindInApi('talk', 'getcomments', array('talk_id' => $talk['ID']));
    foreach ($comments as $comment)
    {
        echo ' -> ' . $comment['uname'] . "\t" . $comment['rating'] . "\n";
        $userRatings[$comment['uname']][$talk['ID']] = $comment['rating'];
    }
}
</pre>
<h3>Calculating similar users</h3>
<p>To work out recommendations we&#8217;ll use the classic &ldquo;people like me like&rdquo; method (a type of <a target="_blank" href="http://en.wikipedia.org/wiki/Collaborative_filtering">collaborative filter</a>). This works by calculating a similarity score between a user and every other user. This is easy to implement and works well for small user sets. Companies with a <em>lot</em> of users, for example Amazon, usually use item-based collaborative filtering instead of user-based, due to the difficulty in calculating similarity between every user at this scale.</p>
<p>There are many different algorithms that will score how similar two users are, based on a set of data. Examples include <a target="_blank" href="http://en.wikipedia.org/wiki/Euclidean_distance">Euclidean distance</a>, <a target="_blank" href="http://en.wikipedia.org/wiki/Jaccard_index">Jaccard index</a> and <a target="_blank" href="http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient">Pearson correlation</a>.</p>
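<p>As a rough illustration of two of these alternatives, here is a sketch of a Jaccard index and a Euclidean-distance-based score. The function names jaccardSimilarity() and euclideanSimilarity() are hypothetical and not part of the downloadable source; both take two [talkId] = rating arrays. Turning a distance d into a similarity via 1 / (1 + d) is one common convention and is an assumption here.</p>
<pre class="code">
// Jaccard: talks both users rated, divided by talks either user rated
function jaccardSimilarity(array $ratings1, array $ratings2)
{
    $both = count(array_intersect_key($ratings1, $ratings2));
    $either = count($ratings1 + $ratings2);     // array union by key
    return $either === 0 ? 0 : $both / $either;
}

// Euclidean: distance between ratings on shared talks, mapped to 0..1
function euclideanSimilarity(array $ratings1, array $ratings2)
{
    $shared = array_intersect_key($ratings1, $ratings2);
    if (count($shared) === 0)
    {
        return 0;
    }
    $sumOfSquares = 0;
    foreach ($shared as $talkId => $rating)
    {
        $sumOfSquares += pow($rating - $ratings2[$talkId], 2);
    }
    return 1 / (1 + sqrt($sumOfSquares));
}

// Made-up ratings for two imaginary users
$alice = array(2511 => 5, 2512 => 3);
$bob   = array(2512 => 4, 2520 => 1);

echo jaccardSimilarity($alice, $bob) . "\n";    // one shared talk out of three, so roughly 0.33
echo euclideanSimilarity($alice, $bob) . "\n";  // ratings differ by 1, so 1 / (1 + 1) = 0.5
</pre>
<p>Note that this Jaccard score ignores the ratings themselves and only looks at which talks were rated, so it answers a slightly different question from the other two measures.</p>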
<p>It is often very difficult to know which similarity algorithm will give the best results and therefore the best advice is to try them all out! We will use the Pearson correlation in this example.</p>
<p>The following is a PHP implementation of Pearson, borrowing heavily from the excellent beginner&#8217;s book <a target="_blank" href="http://oreilly.com/catalog/9780596529321">Programming Collective Intelligence</a>.</p>
<pre class="code">
/**
 * Calculate Pearson correlation
 *
 * This calculates the pearson correlation between user1 and user2; a measure
 * of how similar users are.
 *
 * @param array $userRatings Our array of user ratings; [userId][talkId] = rating
 * @param string $user1 The first userId
 * @param string $user2 The second userId
 *
 * @return integer|float A number between -1 and 1, where -1 indicates very
 *      dissimilar, and 1 indicates very similar
 */
function calculatePearson($userRatings, $user1, $user2)
{
    // get list of talks both have rated
    $talks = array_keys(array_intersect_key(
            $userRatings[$user1],
            $userRatings[$user2]
            ));
    $numBothHaveRated = count($talks);
    if ($numBothHaveRated === 0)
    {
        $pearson = 0;
    }
    else
    {
        $sumOfRatingsUser1 = 0;
        $sumOfSquareOfRatingsUser1 = 0;
        $sumOfRatingsUser2 = 0;
        $sumOfSquareOfRatingsUser2 = 0;
        $sumOfProducts = 0;

        foreach ($talks as $talkId)
        {
            $sumOfRatingsUser1 += $userRatings[$user1][$talkId];
            $sumOfSquareOfRatingsUser1 += pow($userRatings[$user1][$talkId], 2);
            $sumOfRatingsUser2 += $userRatings[$user2][$talkId];
            $sumOfSquareOfRatingsUser2 += pow($userRatings[$user2][$talkId], 2);
            $sumOfProducts += $userRatings[$user1][$talkId] * $userRatings[$user2][$talkId];
        }

        // calculate pearson
        $numerator = $sumOfProducts - ($sumOfRatingsUser1 * $sumOfRatingsUser2 / $numBothHaveRated);
        $denominator = sqrt(
                ($sumOfSquareOfRatingsUser1 - pow($sumOfRatingsUser1, 2) / $numBothHaveRated)
              * ($sumOfSquareOfRatingsUser2 - pow($sumOfRatingsUser2, 2) / $numBothHaveRated)
                );
        if ($denominator == 0)
        {
            $pearson = 0;
        }
        else
        {
            $pearson = $numerator / $denominator;
        }
    }

    return $pearson;
}
</pre>
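<p>A few tiny inputs help to sanity check the maths. The sketch below uses pearsonSketch(), a hypothetical condensed version of calculatePearson() that takes the two users&#8217; rating arrays directly so that it can run on its own; the ratings are made up.</p>
<pre class="code">
// Hypothetical condensed version of calculatePearson() for two rating arrays
function pearsonSketch(array $ratings1, array $ratings2)
{
    $talks = array_keys(array_intersect_key($ratings1, $ratings2));
    $n = count($talks);
    if ($n === 0)
    {
        return 0;
    }
    $sum1 = $sum2 = $sumSq1 = $sumSq2 = $sumOfProducts = 0;
    foreach ($talks as $talkId)
    {
        $sum1 += $ratings1[$talkId];
        $sumSq1 += pow($ratings1[$talkId], 2);
        $sum2 += $ratings2[$talkId];
        $sumSq2 += pow($ratings2[$talkId], 2);
        $sumOfProducts += $ratings1[$talkId] * $ratings2[$talkId];
    }
    $numerator = $sumOfProducts - ($sum1 * $sum2 / $n);
    $denominator = sqrt(
            ($sumSq1 - pow($sum1, 2) / $n)
          * ($sumSq2 - pow($sum2, 2) / $n)
            );
    return $denominator == 0 ? 0 : $numerator / $denominator;
}

$me = array('t1' => 1, 't2' => 2, 't3' => 3);

// ratings rise and fall together: prints 1
echo pearsonSketch($me, array('t1' => 2, 't2' => 4, 't3' => 6)) . "\n";
// ratings move in opposite directions: prints -1
echo pearsonSketch($me, array('t1' => 3, 't2' => 2, 't3' => 1)) . "\n";
// only one shared talk: the denominator is zero, so prints 0
echo pearsonSketch($me, array('t1' => 5)) . "\n";
</pre>
<p>One consequence worth knowing: when two users share exactly two talks, the result can only be 1, -1 or 0, so scores of exactly 1 are not unusual on sparse data like this.</p>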
<p>We can now run through all the users we found (who had provided comments!) and work out their similarity with every other user.</p>
<pre class="code">
// Phase 2: Calculate user similarity (via Pearson correlation)

$pearson = array();

$users = array_keys($userRatings);
foreach ($users as $user1)
{
    foreach ($users as $user2)
    {
        if ($user1 !== $user2 &#038;&#038; !isset($pearson[$user1][$user2]))
        {
            $value = calculatePearson(
                    $userRatings,
                    $user1,
                    $user2
                    );
            $pearson[$user1][$user2] = $value;
            $pearson[$user2][$user1] = $value;
            echo $user1 . "\t" . $user2 . "\t" . $value . "\n";
        }
    }
}

echo "\nLike me:\n";

arsort($pearson[WHO_AM_I]);
foreach ($pearson[WHO_AM_I] as $user => $value)
{
    echo $user . "\t" . $value . "\n";
}
</pre>
<p>So who is like me? Turns out it&#8217;s these guys:</p>
<ul>
<li>welworthy = 1</li>
<li>ianb = 1</li>
<li>manarth = 1</li>
<li>m.whitby@gmail.com = 0.99999999999999</li>
<li>rowan_m = 0.5</li>
</ul>
<h3>Providing recommendations</h3>
<p>Now I know the users who are most similar to me, I can see which talks they liked. The following recommendation algorithm does just this, weighting each rating by how similar I am to the user who gave it and then normalising by the total similarity of everyone who rated that talk.</p>
<pre class="code">
/**
 * Get recommendations
 *
 * Return recommendations on talks I _should_ have seen (if I could have!)
 *
 * @param array $userRatings Our user ratings; [userId][talkId] = rating
 * @param string $user The user to get recommendations for
 * @param array $similarities The similarities of all users; [user1][user2] = #
 *
 * @return array [talkId] = &lt;how much you should have seen it!&gt;
 */
function getRecommendations(array $userRatings, $user, array $similarities)
{
    $totals = array();
    $similaritySums = array();

    foreach ($userRatings as $compareWithUser => $talksWithRatings)
    {
        // don't compare against self
        if ($user === $compareWithUser)
        {
            continue;
        }

        // how similar?
        $similarity = $similarities[$user][$compareWithUser];
        // ignore users if they aren't similar (&lt;=0)
        if ($similarity &lt;= 0)
        {
            continue;
        }

        foreach ($talksWithRatings as $talkId =&gt; $rating)
        {
            // skip if I saw this talk
            if (isset($userRatings[$user][$talkId]))
            {
                continue;
            }
            if (!isset($totals[$talkId]))
            {
                $totals[$talkId] = 0;
            }
            $totals[$talkId] += $rating * $similarity;
            if (!isset($similaritySums[$talkId]))
            {
                $similaritySums[$talkId] = 0;
            }
            $similaritySums[$talkId] += $similarity;
        } // end foreach talks
    } // end foreach users

    // generate normalised list
    foreach ($totals as $talkId =&gt; &#038;$score)
    {
        $score /= $similaritySums[$talkId];
    }

    arsort($totals);

    return $totals;
}
</pre>
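<p>The normalisation step is easiest to see with numbers. This is a worked sketch for a single talk that I did not see, using two made-up users; it mirrors the $totals and $similaritySums bookkeeping above but is not taken from the original source.</p>
<pre class="code">
// Two hypothetical similar users who rated a talk I missed
$ratings = array(
    array('similarity' => 1.0, 'rating' => 5),
    array('similarity' => 0.5, 'rating' => 2)
);

$total = 0;
$similaritySum = 0;
foreach ($ratings as $r)
{
    $total += $r['rating'] * $r['similarity'];
    $similaritySum += $r['similarity'];
}

// (5 * 1.0 + 2 * 0.5) / (1.0 + 0.5) = 6 / 1.5
echo $total / $similaritySum . "\n";   // prints 4
</pre>
<p>Dividing by the sum of similarities keeps the score on the same scale as the original ratings, so a talk does not rank higher simply because more people rated it.</p>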
<p>The final stage is to run this through for me!</p>
<pre class="code">
// Phase 3: Get recommendations

echo "\nRecommended talks:\n";

$recommendations = getRecommendations($userRatings, WHO_AM_I, $pearson);
foreach ($recommendations as $talkId =&gt; $recommendation)
{
    echo $talkId . "\t" . $talkTitles[$talkId] . " ($recommendation)\n";
}
</pre>
<p>So my final recommendations are (with a rating in brackets):</p>
<ul>
<li>2514: Beyond Frameworks (5)</li>
<li>2511: 99 Problems, But The Search Ain&#8217;t One (5)</li>
<li>2512: Advanced OO Patterns (5)</li>
<li>2521: Varnish in Action (4)</li>
<li>2520: Running on Amazon EC2 (4)</li>
<li>2513: Agility and Quality (3)</li>
</ul>
<h3>Conclusion</h3>
<p>When I first ran this through on Saturday evening, my recommendations did not include &ldquo;Beyond Frameworks&rdquo; nor &ldquo;Agility and Quality&rdquo;. Now, on Sunday evening, there is more data and these have popped up. I think I prefer my Saturday evening list, but it&#8217;s not too far off.</p>
<p>It would be interesting to experiment with different similarity algorithms to see what impact this has. It would also be cool to use the joind.in API to look at <em>other</em> talks that my similar users have rated positively, outside of this conference. These are left as exercises for the reader!</p>
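<p>As a starting point for that first experiment, here is a minimal sketch of one alternative: a similarity score based on Euclidean distance. The <em>euclideanSimilarity</em> function is a hypothetical helper, not part of the code above, and it assumes each user&#8217;s ratings are held as an array of talk ID to rating, as in <em>$userRatings</em>.</p>
<pre class="code">
// hypothetical helper: Euclidean-distance based similarity between two users,
// where each argument is an array of talkId =&gt; rating
function euclideanSimilarity(array $ratingsA, array $ratingsB)
{
    // only talks rated by both users can be compared
    $shared = array_intersect_key($ratingsA, $ratingsB);
    if (count($shared) === 0)
    {
        return 0;
    }

    $sumOfSquares = 0;
    foreach ($shared as $talkId =&gt; $rating)
    {
        $sumOfSquares += pow($rating - $ratingsB[$talkId], 2);
    }

    // identical ratings give 1; the score tends towards 0 as ratings diverge
    return 1 / (1 + sqrt($sumOfSquares));
}
</pre>
<p>Unlike a correlation score, this one is never negative, so the &ldquo;ignore users if they aren&#8217;t similar&rdquo; check in <em>getRecommendations</em> would only filter out users with no talks in common.</p>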
<p>If you&#8217;re interested in learning more I&#8217;d recommend starting with the O&#8217;Reilly book, <a target="_blank" href="http://oreilly.com/catalog/9780596529321">Programming Collective Intelligence</a>. The examples take a bit of work to fully understand, but the book shields you from the maths.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.davegardner.me.uk/blog/2011/02/27/applying-collective-intelligence-to-php-uk-conference-2011/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Running Cassandra on EC2</title>
		<link>http://www.davegardner.me.uk/blog/2010/12/01/running-cassandra-on-ec2/</link>
		<comments>http://www.davegardner.me.uk/blog/2010/12/01/running-cassandra-on-ec2/#comments</comments>
		<pubDate>Wed, 01 Dec 2010 12:05:00 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[amazon]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[netflix]]></category>

		<guid isPermaLink="false">http://www.davegardner.me.uk/blog/?p=153</guid>
		<description><![CDATA[This talk covers the advantages and disadvantages of running Cassandra on EC2 and includes some I/O benchmarks - including some excellent work from Corey Hulen. There is also a basic overview of what actually happens when Cassandra reads and writes (although this is simplified to a single node).]]></description>
			<content:encoded><![CDATA[<p>As the founder of <a href="http://www.meetup.com/Cassandra-London/" target="_blank">Cassandra London</a> it was left to me to provide the first talk; hopefully this won&#8217;t be necessary <em>every</em> month! To kick things off I talked about running Cassandra on Amazon EC2. At <a href="http://www.visualdna.com/" target="_blank">VisualDNA</a> we run a production cluster on EC2; but this hasn&#8217;t been without its difficulties!</p>
<p>This talk covers the advantages and disadvantages of running Cassandra on EC2 and includes some I/O benchmarks &#8211; including some <a href="http://www.coreyhulen.org/?p=326" target="_blank">excellent work from Corey Hulen</a>. There is also a basic overview of what actually happens when Cassandra reads and writes (although this is simplified to a single node).</p>
<p>The main reason that EC2 could be problematic really comes down to I/O performance, and perhaps more importantly the predictability of I/O performance. This aside, there are many reasons why you may want to use EC2. <a href="http://cloudscaling.com/blog/cloud-computing/cloud-innovators-netflix-strategy-reflects-google-philosophy?utm_source=feedburner&#038;utm_medium=feed&#038;utm_campaign=Feed%3A+neoTactics+%28Cloudscaling%29">This interview with Adrian Cockcroft</a> looks at why Netflix chose to go down the EC2 route and is a recommended read.</p>
<div style="width:425px" id="__ss_5794808"><strong style="display:block;margin:12px 0 4px"><a href="http://www.slideshare.net/davegardnerisme/running-cassandra-on-amazon-ec2" title="Running Cassandra on Amazon EC2">Running Cassandra on Amazon EC2</a></strong><object id="__sse5794808" width="425" height="355"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=cassldn-101116044342-phpapp01&#038;stripped_title=running-cassandra-on-amazon-ec2&#038;userName=davegardnerisme" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed name="__sse5794808" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=cassldn-101116044342-phpapp01&#038;stripped_title=running-cassandra-on-amazon-ec2&#038;userName=davegardnerisme" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed></object>
<div style="padding:5px 0 12px">View more <a href="http://www.slideshare.net/">presentations</a> from <a href="http://www.slideshare.net/davegardnerisme">Dave Gardner</a>.</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.davegardner.me.uk/blog/2010/12/01/running-cassandra-on-ec2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Why you should always use PHP interfaces</title>
		<link>http://www.davegardner.me.uk/blog/2010/11/21/why-you-should-always-use-php-interfaces/</link>
		<comments>http://www.davegardner.me.uk/blog/2010/11/21/why-you-should-always-use-php-interfaces/#comments</comments>
		<pubDate>Sun, 21 Nov 2010 16:06:17 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[crc]]></category>
		<category><![CDATA[design patterns]]></category>
		<category><![CDATA[interfaces]]></category>
		<category><![CDATA[lazy load]]></category>
		<category><![CDATA[virtual proxy]]></category>

		<guid isPermaLink="false">http://www.davegardner.me.uk/blog/?p=142</guid>
		<description><![CDATA[This post was sparked by a very simple question from an ex-colleague "Why bother with PHP interfaces?" It gives two reasons - firstly as a way of making you think in the right way when conducting Object Oriented design; secondly because interfaces are essential for flexible code, as demonstrated via the Virtual Proxy pattern.]]></description>
			<content:encoded><![CDATA[<p>This post was sparked by a very simple question from an ex-colleague:</p>
<blockquote><p>Why bother with PHP interfaces?</p></blockquote>
<p>The subtext here was that the classes themselves contain the interface definition (through the methods they define) and hence interfaces are not needed, particularly where there is only one class that implements a given interface. One response to this question would be &ldquo;hang on, <a href="http://lmgtfy.com/?q=why+use+php+interfaces">let me Google that for you</a>&rdquo;. However the question got me thinking about how best to demonstrate the purpose and power of interfaces. Plus there isn&#8217;t that much good stuff out there about the use of PHP interfaces.</p>
<h3>Why interfaces?</h3>
<h4>1. It helps you think about things in the right way</h4>
<p>Object oriented design is hard. It&#8217;s usually harder for programmers, like myself, who started out in a procedural world. Allen Holub sums it up nicely in the primer section of his book <a href="http://www.davegardner.me.uk/reading/holub-on-patterns/">Holub on Patterns</a> (which includes much of <a href="http://www.javaworld.com/javaworld/jw-07-1999/jw-07-toolbox.html?page=2">this article from Javaworld</a>) when he explains how people often think that an &#8220;object is a datastructure of some sort combined with a set of functions&#8221;. As Holub points out &#8211; this is incorrect.</p>
<blockquote><p>First and foremost, an object is a collection of <em>capabilities</em>. An object is defined by what it can do, not by how it does it &#8212; and the data is part of &#8220;how it does it.&#8221; In practical terms, this means that an object is defined by the messages it can receive and send; the methods that handle these messages comprise the object&#8217;s sole interface to the outer world.</p></blockquote>
<p>It&#8217;s a subtle distinction, especially for those of us with a procedural background. Thinking in terms of these messages is hard, which is why <a href="http://php.net/manual/en/language.oop5.static.php">static methods in PHP</a> and <a href="http://en.wikipedia.org/wiki/God_object">&#8220;God&#8221; objects</a> seem so much more familiar. Making wide use of interfaces when building an application is an easy way of getting out of these bad habits.</p>
<p>When I&#8217;m trying to design an OO system I usually start with <a href="http://en.wikipedia.org/wiki/Class-responsibility-collaboration_card">Cunningham and Beck&#8217;s CRC cards (Class, Responsibility, Collaboration)</a>. This approach involves writing down on a separate index card (or small piece of paper) the name of each class, what its responsibility is and which other classes it collaborates with. From here, I then define the public interface of each class (the methods that handle the messages it can send and receive).</p>
<p>Thinking in the right way is the biggest advantage of interfaces, because it is the hardest thing to get right.</p>
<h4>2. It makes for more future-proof code</h4>
<p>Having a separate interface for every domain object may at first seem like overkill. Why bother? One reason is the flexibility it gives you going forward. An example of this is the <a href="http://martinfowler.com/eaaCatalog/lazyLoad.html">Virtual Proxy lazy-load pattern</a>. This relies on a placeholder object having exactly the same <strong>interface</strong> as a real domain object. Whenever one of the Virtual Proxy&#8217;s methods is called, it will lazy-load the actual object itself and then pass the message through. Stefan Priebsch recently gave a talk that includes this pattern entitled <a href="http://www.slideshare.net/spriebsch/a-new-approach-to-object-persistence">&#8220;A new approach to object persistence&#8221;</a> which is well worth a read.</p>
<p>The interesting thing here is that by making the classes that collaborate with a given class rely only on its interface rather than the concrete class, we can implement a Virtual Proxy pattern <em>at any time</em>, safe in the knowledge that none of our application will break!</p>
<pre class="code">// this is good - userInterface is an interface name
public function sendReminder(userInterface $user)
{
    // do something
}
// this is less good! user is a specific class implementation
public function sendReminder(user $user)
{
    // do something
}</pre>
<p>The following simple example shows the Virtual Proxy pattern in action.</p>
<pre class="code">// our interface
interface userInterface
{
    public function isMember();
}

// our concrete user class
class user implements userInterface
{
    private $paidUntil;

    public function __construct($paidUntil)
    {
        $this-&gt;paidUntil = $paidUntil;
    }
    public function isMember()
    {
        return $this-&gt;paidUntil &gt; time();
    }
}

// our proxy
class userProxy implements userInterface
{
    private $dao;
    private $user;
    private $userId;

    public function __construct($dao, $userId)
    {
        $this-&gt;dao = $dao;
        // remember which user to load when we are first asked about it
        $this-&gt;userId = $userId;
        // set user to NULL to indicate we haven't loaded yet
        $this-&gt;user = NULL;
    }

    public function isMember()
    {
        if ($this-&gt;user === NULL)
        {
            $this-&gt;lazyLoad();
        }
        return $this-&gt;user-&gt;isMember();
    }

    private function lazyLoad()
    {
        $this-&gt;user = $this-&gt;dao-&gt;getById($this-&gt;userId);
    }
}</pre>
<p>Now let&#8217;s assume we have some other part of our code that collaborates with the user object to find out if someone is a member.</p>
<pre class="code">class inviteList
{
    public function add(userInterface $user)
    {
        if (!$user-&gt;isMember())
        {
            throw new youMustPayException('You must be a member to get an invite.');
        }
        // do something else
    }
}</pre>
<p>Our <em>inviteList</em> class does not care what type of object it receives (proxy or real), it simply requires something that implements <em>userInterface</em>.</p>
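<p>To see the lazy loading happen, the sketch below wires the pieces together. It is self-contained, so it repeats a trimmed-down version of the classes above; the <em>arrayUserDao</em> class, its <em>loads</em> counter and the <em>describe</em> function are hypothetical stand-ins, added only to show when the real object gets loaded.</p>
<pre class="code">
interface userInterface
{
    public function isMember();
}

class user implements userInterface
{
    private $paidUntil;

    public function __construct($paidUntil)
    {
        $this-&gt;paidUntil = $paidUntil;
    }

    public function isMember()
    {
        return $this-&gt;paidUntil &gt; time();
    }
}

// hypothetical in-memory DAO that counts how often it is asked to load
class arrayUserDao
{
    public $loads = 0;
    private $rows;

    public function __construct(array $rows)
    {
        $this-&gt;rows = $rows;
    }

    public function getById($userId)
    {
        $this-&gt;loads++;
        return new user($this-&gt;rows[$userId]);
    }
}

class userProxy implements userInterface
{
    private $dao;
    private $user = NULL;
    private $userId;

    public function __construct($dao, $userId)
    {
        $this-&gt;dao = $dao;
        $this-&gt;userId = $userId;
    }

    public function isMember()
    {
        if ($this-&gt;user === NULL)
        {
            $this-&gt;user = $this-&gt;dao-&gt;getById($this-&gt;userId);
        }
        return $this-&gt;user-&gt;isMember();
    }
}

// a collaborator that only knows about the interface
function describe(userInterface $user)
{
    return $user-&gt;isMember() ? 'member' : 'not a member';
}

$dao = new arrayUserDao(array(42 =&gt; time() + 3600));
$proxy = new userProxy($dao, 42);

echo $dao-&gt;loads . "\n";                        // 0: nothing loaded yet
echo describe($proxy) . "\n";                   // member
echo describe($proxy) . "\n";                   // member (no second load)
echo $dao-&gt;loads . "\n";                        // 1
echo describe(new user(time() - 3600)) . "\n";  // not a member
</pre>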
<p>The same argument holds true for the <a href="http://en.wikipedia.org/wiki/Decorator_pattern">Decorator pattern</a>; by programming to an interface, each collaborator doesn&#8217;t care which concrete implementation (the outer-most decorated layer) it is given, as long as it adheres to the right interface.</p>
<p>In a nutshell: <strong>always program to an interface</strong>.</p>
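<p>As a rough illustration of the Decorator case, here is a minimal self-contained sketch. The <em>trialUser</em> class and its free-trial rule are hypothetical, invented for this example; the point is only that it wraps anything implementing <em>userInterface</em> and can be handed to the same collaborators.</p>
<pre class="code">
interface userInterface
{
    public function isMember();
}

// stand-in for the concrete user class above
class user implements userInterface
{
    private $paidUntil;

    public function __construct($paidUntil)
    {
        $this-&gt;paidUntil = $paidUntil;
    }

    public function isMember()
    {
        return $this-&gt;paidUntil &gt; time();
    }
}

// hypothetical decorator: treats the wrapped user as a member during a free trial
class trialUser implements userInterface
{
    private $inner;
    private $trialEnds;

    public function __construct(userInterface $inner, $trialEnds)
    {
        $this-&gt;inner = $inner;
        $this-&gt;trialEnds = $trialEnds;
    }

    public function isMember()
    {
        // add the trial rule, otherwise defer to the wrapped object
        return $this-&gt;trialEnds &gt; time() || $this-&gt;inner-&gt;isMember();
    }
}

$lapsed = new user(time() - 3600);
$onTrial = new trialUser($lapsed, time() + 86400);

var_dump($lapsed-&gt;isMember());   // bool(false)
var_dump($onTrial-&gt;isMember());  // bool(true)
</pre>
<p>Because <em>trialUser</em> accepts a <em>userInterface</em> itself, it could just as easily wrap a proxy or another decorator.</p>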
]]></content:encoded>
			<wfw:commentRss>http://www.davegardner.me.uk/blog/2010/11/21/why-you-should-always-use-php-interfaces/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>Cassandra: replication and consistency</title>
		<link>http://www.davegardner.me.uk/blog/2010/10/07/cassandra-replication-and-consistency/</link>
		<comments>http://www.davegardner.me.uk/blog/2010/10/07/cassandra-replication-and-consistency/#comments</comments>
		<pubDate>Thu, 07 Oct 2010 08:45:54 +0000</pubDate>
		<dc:creator>Dave</dc:creator>
				<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[consistency]]></category>
		<category><![CDATA[replication]]></category>

		<guid isPermaLink="false">http://www.davegardner.me.uk/blog/?p=139</guid>
		<description><![CDATA[Cassandra can be an unforgiving beast if you don't know what you're doing. I have first hand experience of this! My advice: learn everything you can. This is a good introduction to replication and consistency in Cassandra.]]></description>
			<content:encoded><![CDATA[<p>Cassandra can be an unforgiving beast if you don&#8217;t know what you&#8217;re doing. I have first hand experience of this! My advice: learn everything you can. This is a good introduction to <strong>replication</strong> and <strong>consistency</strong> in Cassandra.</p>
<div id="__ss_3903952" style="width: 425px;"><strong style="display:block;margin:12px 0 4px"><a title="Introduction to Cassandra: Replication and Consistency" href="http://www.slideshare.net/benjaminblack/introduction-to-cassandra-replication-and-consistency">Introduction to Cassandra: Replication and Consistency</a></strong><object id="__sse3903952" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=cassandra2010-04-27-100429121250-phpapp02&amp;stripped_title=introduction-to-cassandra-replication-and-consistency&amp;userName=benjaminblack" /><param name="name" value="__sse3903952" /><param name="allowfullscreen" value="true" /><embed id="__sse3903952" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=cassandra2010-04-27-100429121250-phpapp02&amp;stripped_title=introduction-to-cassandra-replication-and-consistency&amp;userName=benjaminblack" name="__sse3903952" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<div style="padding:5px 0 12px">View more <a href="http://www.slideshare.net/">presentations</a> from <a href="http://www.slideshare.net/benjaminblack">Benjamin Black</a>.</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.davegardner.me.uk/blog/2010/10/07/cassandra-replication-and-consistency/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
