posted on Sunday 21st November 2010 by Dave

Why you should always use PHP interfaces

This post was sparked by a very simple question from an ex-colleague:

Why bother with PHP interfaces?

The subtext here was that the classes themselves contain the interface definition (through the methods they define) and hence interfaces are not needed, particularly where there is only one class that implements a given interface. One response to this question would be “hang on, let me Google that for you”. However the question got me thinking about how best to demonstrate the purpose and power of interfaces. Plus there isn’t that much good stuff out there about the use of PHP interfaces.

Why interfaces?

1. It helps you think about things in the right way

Object oriented design is hard. It’s usually harder for programmers, like myself, who started out in a procedural world. Allen Holub sums it up nicely in the primer section of his book Holub on Patterns (which includes much of this article from Javaworld) when he explains how people often think that an “object is a datastructure of some sort combined with a set of functions”. As Holub points out – this is incorrect.

First and foremost, an object is a collection of capabilities. An object is defined by what it can do, not by how it does it — and the data is part of “how it does it.” In practical terms, this means that an object is defined by the messages it can receive and send; the methods that handle these messages comprise the object’s sole interface to the outer world.

It’s a subtle distinction, especially for those of us with a procedural background. Thinking in terms of these messages is hard; static methods in PHP and “God” objects feel so much more familiar. Making wide use of interfaces when building an application is an easy way of breaking these bad habits.

When I’m trying to design an OO system I usually start with Cunningham and Beck’s CRC cards (Class, Responsibility, Collaboration). This approach involves writing down on a separate index card (or small piece of paper) the name of each class, what its responsibility is and which other classes it collaborates with. From here, I then define the public interface of each class (the methods that handle the messages it can send and receive).

Thinking in the right way is the biggest advantage of interfaces, because it is the hardest thing to get right.

2. It makes for more future-proof code

Having a separate interface for every domain object may at first seem like overkill. Why bother? One reason is the flexibility it gives you going forward. An example of this is the Virtual Proxy lazy-load pattern. This relies on a placeholder object having exactly the same interface as a real domain object. Whenever one of the Virtual Proxy’s methods is called, it lazy-loads the actual object and then passes the message through. Stefan Priebsch recently gave a talk entitled “A new approach to object persistence” that covers this pattern; it is well worth a look.

The interesting thing here is that by making the classes that collaborate with a given class rely only on its interface, rather than on a concrete class, we can introduce a Virtual Proxy at any time, safe in the knowledge that nothing in our application will break!

// this is good - userInterface is an interface name
public function sendReminder(userInterface $user)
{
    // do something
}
// this is less good! user is a specific class implementation
public function sendReminder(user $user)
{
    // do something
}

The following simple example shows the Virtual Proxy pattern in action.

// our interface
interface userInterface
{
    public function isMember();
}

// our concrete user class
class user implements userInterface
{
    private $paidUntil;

    public function __construct($paidUntil)
    {
        $this->paidUntil = $paidUntil;
    }
    public function isMember()
    {
        return $this->paidUntil > time();
    }
}

// our proxy
class userProxy implements userInterface
{
    private $dao;
    private $user;
    private $userId;

    public function __construct($dao, $userId)
    {
        $this->dao = $dao;
        $this->userId = $userId;
        // set user to NULL to indicate we haven't loaded yet
        $this->user = NULL;
    }

    public function isMember()
    {
        if ($this->user === NULL)
        {
            $this->lazyLoad();
        }
        return $this->user->isMember();
    }

    private function lazyLoad()
    {
        $this->user = $this->dao->getById($this->userId);
    }
}

Now let’s assume we have some other part of our code that collaborates with the user object to find out if someone is a member.

class inviteList
{
    public function add(userInterface $user)
    {
        if (!$user->isMember())
        {
            throw new youMustPayException('You must be a member to get an invite.');
        }
        // do something else
    }
}

Our inviteList class does not care what type of object it receives (proxy or real), it simply requires something that implements userInterface.

The same argument holds true for the Decorator pattern; by programming to an interface, each collaborator doesn’t care which concrete implementation it is given (for example the outer-most decorated layer), as long as it adheres to the right interface.
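
As a sketch of how a Decorator slots in behind the same type hint, here is a hypothetical loggingUser decorator (reusing the userInterface and user classes from the example above; loggingUser itself is invented for illustration):

```php
<?php
// Hypothetical decorator: wraps any userInterface implementation and
// records each membership check before delegating. Names are illustrative.

interface userInterface
{
    public function isMember();
}

class user implements userInterface
{
    private $paidUntil;

    public function __construct($paidUntil)
    {
        $this->paidUntil = $paidUntil;
    }

    public function isMember()
    {
        return $this->paidUntil > time();
    }
}

class loggingUser implements userInterface
{
    private $inner;
    public $log = array();

    public function __construct(userInterface $inner)
    {
        $this->inner = $inner;
    }

    public function isMember()
    {
        $this->log[] = 'isMember called';
        return $this->inner->isMember(); // delegate to the wrapped object
    }
}

// Any collaborator type-hinting on userInterface accepts either object.
$plain     = new user(time() + 3600);
$decorated = new loggingUser($plain);
var_dump($decorated->isMember()); // bool(true), and the call was logged
```

The collaborator never knows (or cares) whether it received the plain user or the decorated one.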

In a nutshell: always program to an interface.

posted on Thursday 7th October 2010 by Dave

Cassandra: replication and consistency

Cassandra can be an unforgiving beast if you don’t know what you’re doing. I have first hand experience of this! My advice: learn everything you can. This is a good introduction to replication and consistency in Cassandra.

posted on Thursday 23rd September 2010 by Dave

Effective load testing with Apache JMeter

Load testing is surely one of the most important activities that many developers ignore. I would include myself in that bracket; it is far too often something that gets bounced out of a busy schedule. However load testing, and its cousin stress testing, are absolutely essential when attempting to create a reliable application.

This blog post concentrates on load testing, which Wikipedia defines as:

“Load testing is the process of putting demand on a system or device and measuring its response.”

This is subtly different to stress testing, which aims to test a system “beyond normal operational capacity, often to a breaking point”. Load testing is useful to ensure that your application meets its business requirements, for example “cope with x million unique users per day with a response time of less than 1 second”.

Things to consider when load testing

It’s very easy to carry out useless load testing. I know this from experience. We were testing a system designed to serve 30 API requests per second, each response containing data on a user from a pool of roughly 100,000,000. When we carried out our load testing we designed a test using a pool of 1,000 unique user IDs. Spot the obvious problem! Testing with only 1,000 unique user IDs meant that we quickly reached a point where various layers of caching would be happily dealing with the job of fetching this limited set of data. This includes MySQL’s key cache plus the operating system file system cache.

To avoid making the same mistakes, make sure you consider the following points when load testing.

  1. Use indicative hardware
    When load testing to ensure that an application reaches minimum performance standards, you will need to test on hardware that mirrors your “live” setup as closely as possible. Sometimes this is easy, for example if you are running a live system consisting of three servers. If however you are running a 30 server cluster including a 100TB database, getting an accurate staging system can be hard work.
  2. Use indicative data
    If you have a production system with 100,000,000 rows of data, it’s a good idea not to carry out load testing on a staging system with only 300 rows. Similarly, if your application stores rows that are roughly 100KB each, the load testing system should be designed to replicate these conditions. The closer you can get to your actual data, the more useful the results.
  3. Think about system state
    Layers of caching can make load testing hard work. This includes application caching such as Memcache plus operating system caching and database caching. If your live environment workload can expect warmed caches then it is valid to leave these things warmed! However running a load test over and over again may get your application into an unnatural state of “preparedness”.

Using Apache JMeter

It may be tempting to write your own load-testing tools. Why not indeed? A few curl requests, a few processes; simple. My advice would be this: before you do, try out JMeter. You can download JMeter from this page.

This example job reads in a list of user UUIDs from a text file and then makes an HTTP request for each user.

1. Launch JMeter

sh bin/jmeter.sh &

2. Configure job

  • Right click on Test Plan heading, click Add > Thread Group
  • Right click on Thread Group, click Add > CSV Data Set Config
  • Click on the newly added config element and enter a valid Filename – this should contain one user UUID per line
  • Enter userId under Variable Names (comma-delimited)
  • Click on the Thread Group, configure the Number of Threads (say 5), the Ramp-Up Period (say 180) and then the test Duration (say 600). This tells JMeter to launch up to 5 threads, ramping these up over 3 minutes, with the test running for a total of 10 minutes.

Thread group settings

  • Right click on Thread Group, click Add > Sampler > HTTP Request
  • Click on the newly added sampler element and enter a valid Server Name or IP
  • You should probably add in some Timeouts; depending on your desired performance under load
  • Enter a Path, including the previously defined variable as follows: /1/user/${userId}/data.json

HTTP Request sampler settings

3. Add a listener

This was a stage that took me a while to figure out! You must have at least one listener, otherwise you cannot view any results from the job. You can add listeners to the Thread Group in the same way you added samplers and config elements. The Summary Report is very useful, as is View Results Tree (especially during debugging).

4. Run the job!

It’s usually a good idea to Clear All (from the Run menu) before you fire off the job.

5. Profit


Verifying the details of a response

By default JMeter will work with HTTP response codes – which is handy. So 404 errors and 500 errors etc… will all be dealt with. It’s often useful to look at the actual content of the response to decide on success. This can help catch things like PHP Fatal Errors (where you may have a 200 OK response, but a completely blank document). This is where the Response Assertion comes in.

Right click on any sampler and then Add > Assertion > Response Assertion. You can then define patterns to test against; these can use regular expressions. One other point to note is that if you wish to have HTTP 500 or 400 errors ignored (treated as success) then you should tick the Ignore Status tick box.

Extracting data from a response

Hitting URLs with known UUIDs is handy; however it’s also useful to call a URL to add a user, extract the newly added user’s UUID and then use this in subsequent requests. Fortunately this is also very easy to achieve with JMeter, using the Regular Expression Extractor.

Right click on any sampler and then Add > Post Processors > Regular Expression Extractor. The following settings extract data from a JSONP response which includes a userId definition; this is then assigned to the JMeter variable userId.

  • Reference Name: userId
  • Regular Expression: var userId = '([a-f0-9-]+)';
  • Template: $1$
  • Match No. (0 for Random): 1
  • Default Value: null

Configuring variable extraction

Defining load levels using a constant throughput timer

When load testing I find it useful to run a number of tests at different levels of load – for a very specifically defined number of requests per second. To achieve this level of control, the Constant Throughput Timer element is very handy. This timer allows you to specify a target throughput in samples per minute. JMeter will then throttle back requests, if needed, to attempt to achieve this rate. The only thing you may need to do is ensure you have enough threads available to meet the desired throughput. Remember that the number of threads can be defined within the Thread Group settings.

Timer settings

Enjoy your load testing!

posted on Friday 2nd July 2010 by Dave

PHP and Cassandra

Yesterday (1st July) I presented for the first time at the PHP London user group. It was a gentle introduction; a five minute “lightning” talk slot. I spoke about Cassandra, giving a short introduction to using it with PHP.

To summarise my main points from the talk (perhaps something I should have done in the talk!):

  • Cassandra is a “highly scalable second-generation distributed database”
  • It can be considered a schema-less database insofar as each row can have different columns
  • Cassandra is designed to be both fault tolerant and horizontally scalable – both read and write throughput go up linearly as more boxes are added to the cluster
  • I think the best way of accessing Cassandra from PHP is directly via the Thrift API. This allows a beginner to learn about the core functionality of Cassandra including its limitations
  • Cassandra has Hadoop support which means that Hadoop Map Reduce jobs (a scalable, distributed mechanism for processing data) can read and write to Cassandra*
  • Cassandra does not have any query language (as opposed to MySQL or MongoDB which both allow you to query data in different ways)
  • When designing your data model, I think it’s easiest to try to forget about SQL and concentrate on how Cassandra works (don’t design a relational schema and then “port” it over)

* As of version 0.7!

Overall, I think Cassandra is a very useful tool. Whether it fits your use case or not is another matter!

If you’re interested in learning more about using Cassandra in a PHP project, I recommend the following starting points:

  1. Using Cassandra with PHP
    https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP
  2. WTF is a SuperColumn? An Intro to the Cassandra Data Model
    http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model

posted on Monday 22nd March 2010 by Dave

Caching dependency-injected objects

This blog post talks about caching and retrieving objects in PHP (eg: via Memcache) where the objects themselves have a number of injected dependencies. It covers using the PHP magic methods __sleep and __wakeup to manage serialisation. It also discusses mechanisms for re-injecting dependencies on wakeup via a method that maintains Inversion of Control (IoC).

Sample system

To illustrate the idea, I’ll use a simple domain model where we have a userList object (iterator) containing a number of user objects. Each user has an injected userDao dependency which is used for lazy-loading usageHistory, on request.

class userList implements Iterator, Countable
{
    public function current() { }

    public function key() { }

    public function next() { }

    public function rewind() { }

    public function valid() { }

    public function count() { }
}

class user
{
    private $dao;
    private $usageHistory;

    public function __construct($dao, $userDataRow)
    {
        $this->dao = $dao;
        $this->usageHistory = NULL;
    }

    public function getUsageHistory()
    {
        if ($this->usageHistory === NULL)
        {
            $this->usageHistory = $this->dao->lazyLoadHistory($this);
        }
        return $this->usageHistory;
    }
}

class userDao
{
    public function __construct($database, $cache, $logger) { }

    public function getList() { }

    public function lazyLoadHistory() { }
}

class usageHistory
{
}

Dependency Injection

A sample invocation of this simple system might be to ask the DAO for a user list object. To create a DAO object we will almost certainly need to pass in a bunch of dependencies such as database services, caching services and logging services.

I’m using a DI container to create objects. To get a really quick idea of what these are about you can imagine doing this:

$diContainer = new diContainer();
$userDao = $diContainer->getInstance('userDao');

Instead of this:

$configuration = new systemConfig();

$database = new mysqlDatabaseConnection($configuration);
$cache = new memcacheConnection($configuration);
$logger = new firebugLogger();

$userDao = new userDao($database, $cache, $logger);
$userList = $userDao->getList();

The key idea is that the DI container will build the object graph for you. For each dependency needed it will go away and fetch that, building any other dependencies of those objects and so on recursively up the tree.

I’m using an annotation system to power my own DI container; making the whole process simple and configuration-light.
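
To make the idea concrete, here is a minimal sketch of a constructor-injection container built on PHP’s Reflection API. This is not the author’s annotation-driven container – just an illustration of recursively building an object graph, and it assumes every constructor parameter is type-hinted with a class:

```php
<?php
// Minimal constructor-injection DI container sketch. It reflects on a
// class's constructor and recursively builds each type-hinted dependency.

class diContainer
{
    public function getInstance($className)
    {
        $class = new ReflectionClass($className);
        $constructor = $class->getConstructor();
        if ($constructor === null) {
            return new $className();        // no dependencies to build
        }
        $args = array();
        foreach ($constructor->getParameters() as $param) {
            // Recursively build each type-hinted dependency
            $args[] = $this->getInstance($param->getType()->getName());
        }
        return $class->newInstanceArgs($args);
    }
}

// Illustrative service classes
class systemConfig {}
class mysqlDatabaseConnection
{
    public $config;
    public function __construct(systemConfig $config)
    {
        $this->config = $config;
    }
}

$diContainer = new diContainer();
$database = $diContainer->getInstance('mysqlDatabaseConnection');
var_dump($database->config instanceof systemConfig); // bool(true)
```

A real container would add configuration (interface-to-class bindings, shared instances and so on), but the recursive build is the core idea.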

Putting objects to sleep

Caching is a very handy tool to improve the performance of applications. Storing objects in a cache (for example Memcache) prevents us having to go to the database each time. Memcache is a very simple system: a key-value store. You give it some data (less than 1MB) and it stores it for you until you ask for it again. Storing objects is slightly more complex than storing simple strings; objects need to be serialised. Memcache actually does this for you (you don’t need to call serialize() first).

However caching objects can be problematic. Whenever you start to really use the power of OOP you inevitably end up with complex object graphs. Our user object, for example, contains a userDao object. This in turn contains a database service object, a cache service object and a logging service object. Some of these objects have their own dependencies! For example the database service object contains a configuration object.

The key point here is that by default, when we serialise a user object we will be serialising all the internal properties, including all the dependencies. This is undesirable.

This is where PHP’s built-in magic __sleep method comes to the rescue. Using __sleep we can tell PHP what we do want to store. Let’s assume our user object has the following properties:

class user
{
    private $dao;
    private $name;
    private $age;
    private $email;
    private $phoneNumber;
    private $usageHistory;
}

What we’ll do is tell PHP what we want to save.

class user
{
    public function __sleep()
    {
        return array('name', 'age', 'email', 'phoneNumber');
    }
}

Now we can serialise and/or cache objects without the overhead of complex dependency graphs.
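
To see the effect, here is a cut-down sketch (the dao is a stand-in dependency, and name/age are made public for brevity):

```php
<?php
// Demonstrates __sleep trimming what gets serialised: the injected
// dependency ($dao) is dropped from the serialised form entirely.

class userDao {}

class user
{
    private $dao;
    public $name;
    public $age;

    public function __construct($dao, $name, $age)
    {
        $this->dao = $dao;
        $this->name = $name;
        $this->age = $age;
    }

    public function __sleep()
    {
        // Only these properties are written out; $dao is excluded.
        return array('name', 'age');
    }
}

$user = new user(new userDao(), 'Dave', 30);
$blob = serialize($user);

var_dump(strpos($blob, 'dao')); // bool(false) - no trace of the dependency
$copy = unserialize($blob);
var_dump($copy->name);          // string(4) "Dave"
```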

Waking objects up

When it comes to restoring objects, for example via Memcache::get or via unserialize(), we will end up with a user object that has a valid name, age, email and phoneNumber property. What we won’t have is the DAO dependency or the usageHistory property. It is important to realise that the class constructor will not be called when the object is unserialised.

For pure simplicity we can use PHP’s built-in magic __wakeup method to execute code on unserialisation.

class user
{
    public function __wakeup()
    {
        $this->usageHistory = NULL;
        $diContainer = new diContainer();
        $this->dao = $diContainer->getInstance('userDao');
    }
}

This is handy for ensuring that the usageHistory property is properly set to NULL (so it will lazy-load). The problem with this approach is that we lose the Inversion of Control. Instead of injecting the dependencies, we are instead looking them up; we have a tightly coupled dependency to the DI container. One of the key points of DI is that the objects themselves shouldn’t really know or care about the DI container.

When constructing objects using the DI container we never directly use the “new” keyword – instead we rely on the container to do this for us, supplying all dependencies as constructor parameters. However __wakeup is called by PHP itself, so we have no opportunity to inject dependencies there.

Restoring dependencies

To ensure that dependencies are restored correctly I use a “magic” method __restoreDependencies. Ok so it’s not actually that magic; PHP doesn’t call it automatically! However the serialisation/unserialisation in my application is localised within my cache object. Therefore what I can do is adjust my cache::get method:

class cache
{
    public function get($key)
    {
        $value = $this->memcache->get($key);
        if (is_object($value) && $value instanceof cacheable)
        {
            $this->diContainer->wakeup($value);
        }
        return $value;
    }
}

To make life easy I actually use a “cacheable” interface that objects must implement in order to be stored in cache. This formality really just ensures that no one tries to cache objects without making sure they think of the implications on dependencies. The cacheable interface simply ensures that an object has a __restoreDependencies() method.
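
The post doesn’t show the interface itself; a minimal guess at it (the exact signature in the author’s codebase may differ) would be:

```php
<?php
// A minimal guess at the cacheable interface described above.
interface cacheable
{
    /**
     * Called (indirectly, via the DI container) after unserialisation.
     * May return an array of child objects that also need waking up.
     */
    public function __restoreDependencies();
}

var_dump(interface_exists('cacheable')); // bool(true)
```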

The (bespoke) DI container has a “wakeup” method that will:

1. Call the __restoreDependencies() method injecting any required services (dependency objects)

2. If the __restoreDependencies() method returns an array of other objects, call the wakeup() method on those objects as well. This can repeat recursively if required.

The second point here ensures that we can cache an entire userList object and wake it up effectively. The userList object’s __restoreDependencies() method would return an array of all user objects that need waking up.
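
The two steps above can be sketched as follows. This is a hypothetical simplification of the bespoke container’s wakeup() method – dependency injection is reduced to a plain no-argument call here, where the real container supplies services based on its own configuration:

```php
<?php
// Hypothetical sketch of the container's wakeup() recursion: call
// __restoreDependencies() and recurse into any returned children.

class diContainer
{
    public function wakeup($object)
    {
        $children = $object->__restoreDependencies();
        if (is_array($children)) {
            foreach ($children as $child) {
                $this->wakeup($child);  // recurse into nested objects
            }
        }
    }
}

interface cacheable
{
    public function __restoreDependencies();
}

// A leaf object: just fixes itself up
class item implements cacheable
{
    public $restored = false;
    public function __restoreDependencies()
    {
        $this->restored = true;
    }
}

// A list object: returns its children so they get woken too
class itemList implements cacheable
{
    public $items;
    public function __construct(array $items) { $this->items = $items; }
    public function __restoreDependencies() { return $this->items; }
}

$list = new itemList(array(new item(), new item()));
$container = new diContainer();
$container->wakeup($list);
var_dump($list->items[0]->restored); // bool(true)
```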

The result is that I can cache complex object graphs without dependencies, but have these dependencies automatically “fixed” when objects are retrieved from cache. The objects themselves don’t really know anything about the process. Instead all they need to do is define a simple interface which defines the required dependencies.

posted on Friday 29th January 2010 by Dave

Setting up Git on CentOS 5 server

I’m currently setting up Git for our company. The reason is that Git is better than X. This post is all about how to get Git set up on CentOS 5. There are other posts on this topic, of course, but this one is better!

Two minute intro to Git

I’ve come from an SVN background; you check out a copy of a central repository, make some changes and commit. Git is a slightly different beast in that it is a distributed Source Control Management system. What this means is that you have your own local repository where you can happily commit changes (whether online or offline). To share your changes with others, you can then push them to another repository (either their repository or some central repository if you’d prefer). Similarly, to work on someone else’s code, you can create your own cloned version of their repository and then pull updates as required.
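
That commit/push cycle can be sketched end-to-end on a single machine, with a hypothetical bare “central” repository standing in for a shared server (paths and the commit message are made up):

```shell
# Illustrative local round-trip: a bare "central" repository plus a
# working clone, all on one machine.
set -e
tmp=$(mktemp -d)
git init --bare "$tmp/central.git"        # the shared repository (no working copy)
git clone "$tmp/central.git" "$tmp/work"  # your own full local repository
cd "$tmp/work"
git config user.email dave@example.com    # identity required for commits
git config user.name Dave
echo "hello" > README
git add README
git commit -m "first commit"              # recorded locally - works offline too
git push origin HEAD                      # share the change with the central repo
git log --oneline                         # the history now shows the commit
```

The same push/pull commands work unchanged against a genuinely remote repository.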

The reason I’m switching to Git is all about branching – I find this a real pain in SVN. If you’re not convinced you can click here to find out why Git is better than your current SCM.

Installing Git on CentOS 5

Installing Git on CentOS 5 is easy if you make use of the EPEL (Extra Packages for Enterprise Linux) repository. If the following command fails, you don’t yet have EPEL set up:

yum install git

To setup EPEL all you need to do is create a file /etc/yum.repos.d/epel.repo and then paste in the following:

[epel]
name=Extra Packages for Enterprise Linux 5 - $basearch
#baseurl=http://download.fedoraproject.org/pub/epel/5/$basearch
mirrorlist=http://mirrors.fedoraproject.org/mirrorlist?repo=epel-5&arch=$basearch
failovermethod=priority
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL

[epel-debuginfo]
name=Extra Packages for Enterprise Linux 5 - $basearch - Debug
#baseurl=http://download.fedoraproject.org/pub/epel/5/$basearch/debug
mirrorlist=http://mirrors.fedoraproject.org/mirrorlist?repo=epel-debug-5&arch=$basearch
failovermethod=priority
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL
gpgcheck=1

[epel-source]
name=Extra Packages for Enterprise Linux 5 - $basearch - Source
#baseurl=http://download.fedoraproject.org/pub/epel/5/SRPMS
mirrorlist=http://mirrors.fedoraproject.org/mirrorlist?repo=epel-source-5&arch=$basearch
failovermethod=priority
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL
gpgcheck=1

Now you can install using:

yum install git git-daemon

Creating and sharing a repository

Creating a repository is easy! Simply create a folder and type git init.

mkdir newrepo
cd newrepo
git init

Once created, we can copy/create our files (think svn import) and then do:

git add .
git commit

Once you’ve created a repository, you’ll probably want to share it so that other people can pull and push changes. There are a number of ways of accomplishing this (this blog post lists 8 possibilities). My usual method for sharing SVN repositories is via Apache; Git supports this as well. However I think one of the simplest solutions is to use the Git daemon. To allow others to pull and push, you can share your Git repository using the following command:

git daemon --reuseaddr --base-path=/path/to/repos --export-all --verbose --enable=receive-pack

This command will share all repositories found within the folder /path/to/repos (so we would have created our “newrepo” folder within this location). Once shared you can clone the repository using the git resource locator syntax:

git clone git://remote.computer.hostname/newrepo

Or you can just use the IP address if you’d prefer.

You should now have a repository setup on a CentOS 5 server which you can clone and then pull/push updates.

Tortoise; the familiar client for Windows

When I’ve used Mercurial in the past (another distributed SCM), I’ve actually found the command line tools significantly easier to use than the GUI (Tortoise). However there is a level of familiarity that a TortoiseSVN-like frontend provides.

Tortoise Git

TortoiseGit has all the features you’d be used to from using TortoiseSVN plus Pull, Push and all the other Git-specific stuff.

GitHub

It’s worth making a quick mention of GitHub. According to the website, “GitHub is the easiest (and prettiest) way to participate in that collaboration: fork projects, send pull requests, monitor development, all with ease.”

GitHub provides a handy way of visualising a Git project (listing commits, branches and pretty-printed code). It avoids the need to set up your own central Git repository and mess about configuring a server. A lot of projects seem to be moving this way, for example Symfony.

Finding out more – useful links

http://whygitisbetterthanx.com/

Some well thought out and concise arguments as to why Git is better than other SCM systems.

http://gitready.com/

Excellent site – “learn git one commit at a time”. Lots of help and advice clearly laid out.

http://git.or.cz/course/svn.html

Crash course for SVN users – really good comparison of SVN commands and the equivalent GIT commands.

http://www.jedi.be/blog/2009/05/06/8-ways-to-share-your-git-repository/

8 ways to share your git repository – file share, Git daemon, plain SSH server, SSH server git-shell, Gitosis, Apache http + gitweb, github.

posted on Wednesday 27th January 2010 by Dave

Getting started with Flash AS3 from a PHP developer’s perspective

They say that to become a better programmer, you should learn a new programming language. Exposure to a new language allows you to step outside of the patterns and workflow you’re used to and tackle a problem from a different perspective. There are extremes to this, for instance learning Haskell (an advanced purely functional programming language). I’ve decided to postpone my learning of Haskell, for now, and instead learn Adobe Flash AS3.

This blog post aims to give an introduction to Flash AS3 from the perspective of an experienced PHP programmer; someone who already understands the key ideas of OOP. One good thing about Flash is the large volume of tutorials out there; however this is a double-edged sword, as separating the wheat from the chaff can be hard work. It covers:

  1. Getting started; how to organise code
  2. Adding Movie Clips to the scene
  3. Handling events
  4. The power of arrays
  5. Download source code

Once you’ve put it all together, you’ll have a super-duper rocket-firer (the big white space below is the finished article – hint: click on the milk bottle, bottom left, to launch a rocket)

You’ll need FLASH to see this! Makes sense being that this is a Flash tutorial!

One more thing… a big shout out to other tutorials I’ve found helpful, especially the concise and engaging tutorials from untoldentertainment.com. You can check out all my AS3 links on Delicious.

How to organise code

As Ryan explains in this tutorial, Flash beginners tend to put Actionscript code everywhere.

“On different frames of the timeline, embedded in nested MovieClips – if i can write code on it, i do”

The fast path to Flash Pro is to put all your ActionScript in separate files and break your application down into objects (classes). To get started, fire up Flash (I’m using CS4) and create a new “Flash File (ActionScript 3.0)”. Get the Properties window up (click “Window” > “Properties” if you can’t see it). It should look like this:

Document properties

Type “Main” into the Class box. This is the name of the class that will be attached to the main movie; the entry point of the application. Now we can create this class. Click “File” > “New” and choose “ActionScript File”. Paste in the following code:

package
{
  import flash.display.MovieClip;

  /**
   * Main class - entry point for App
   */
  public class Main extends MovieClip
  {
    /**
     * Constructor
     */
    public function Main()
    {
      trace('It works! - blatantly ripped off from the other tutorial');
    }

    /**
     * Another method
     */
    public function myFirstMethod():void
    {
    }
  }
}

This is a basic “Main” class. We could create other classes in a similar fashion; however we should put each class in its own file. Press CTRL-Enter to compile and run; you should find the OUTPUT tab shows the “It works” message.

package

Namespacing, basically; I’m just leaving this blank for now. It could be package myExcellentPackage

import flash.display.MovieClip

Similar to PHP’s require_once. We need to make sure that everything this class needs is imported. This starts to be a pain when you use various libraries and built-in Flash classes, as you need to know what to import. For example, need to change colour? You’ll need import flash.geom.ColorTransform. The easiest way of knowing what to import is simply to Google it.

public class Main extends MovieClip

The class declaration.

trace(’some text’);

The Flash equivalent of print_r, ish. Useful for outputting console messages for debugging.

public function myFirstMethod():void

Another method declaration. Notice the :void bit, which defines the return type explicitly; I quite like this feature. Another important thing to note is that the constructor Main must not have any return type defined.

Adding Movie Clips to the scene

To move further down the Flash Pro road, we need to be able to set up the scene (Flash canvas) programmatically. We need to be able to add things and remove things as needed, for example adding a high-score table at the end of a game or a splash screen at the beginning. In this example we’ll add everything to the scene using code. It’s dead simple.

For my rocket-firing example, I want to fire the rockets from a milk bottle. We’ll create this as a Symbol. According to the first thing I found on Google (it must be right), a Symbol is simply a reusable object. In a code sense, a Symbol is like a class. We can create copies (instances) of the Symbol and place them in our scene; we can do this from code or from within the Flash GUI. To create the Symbol, from the FLA, click “Insert” > “New Symbol”. I’ve added a JPEG picture of a milk bottle to the Symbol.

Now we need to set it up so we can tie the Symbol to ActionScript. Get the Symbol properties window up. The easiest way is to find the Symbol from the library, right click on it and click “Properties”. Fill out the details, as shown below:

Symbol Properties

The key thing is the Class property. This is the class it will be tied to in ActionScript.

Now we’ll adjust our Main class to get this added to the scene. Adjust Main.as by adding in the property launchpad and adding to the constructor:

  /**
   * Main class - entry point for App
   */
  public class Main extends MovieClip
  {
    /**
     * Milk bottle (launch pad)
     * @var MovieClip
     */
    private var launchPad:MovieClip;

    /**
     * Constructor
     */
    public function Main()
    {
      launchPad = new milkBottle();
      launchPad.x = 50;
      launchPad.y = stage.stageHeight-25;
      launchPad.height = 40;
      launchPad.scaleX = launchPad.scaleY;
      addChild(launchPad);
    }
  }

private var launchPad:MovieClip;

So you’ll probably recognise this as declaring a private class property. The key thing is the :MovieClip, which defines what type of variable this is. You’ll see this a lot in Flash.

launchPad = new milkBottle();

Create an instance of our milkBottle class (Symbol). You’ll notice that we don’t need to bother with this. (the equivalent of PHP’s $this->) to access the member property launchPad.

addChild(launchPad);

This adds the object to the stage. It’s a bit like the JavaScript DOM in a lot of ways. This is the simplest way of adding an object to the scene; we can also use other methods like addChildAt(launchPad, 1).

launchPad.height = 40;

Resize the object. launchPad.scaleX = launchPad.scaleY completes the job – adjusting the width automatically to make the aspect ratio correct.

Handling events

Event handling in AS3 will be instantly familiar to anyone who’s used jQuery, or indeed JavaScript, so long as they’ve stuck to the idea of separating content from functionality (no inline event handlers). For the rocket launching app, we need the launchPad object to handle a mouse click event and fire off a rocket (click on the milk bottle to fire a rocket!). We can attach an event handler by adding the following line to Main().

  launchPad.addEventListener(MouseEvent.CLICK, launchRocket);

To access the event handler functionality, we also have to make sure we import the right libraries.

  import flash.events.Event;
  import flash.events.MouseEvent;

Finally, we need a method to handle this event. We’ll add this to our class:

  /**
   * Launch a new rocket
   * @param Event e
   */
  public function launchRocket(e:Event):void
  {
    trace('Launch!');
  }

The power of arrays

So we’ve made a good start. Next up, we want to launch a rocket whenever the milk bottle is clicked. Somehow we need to keep track of all these rockets. This is where an array proves useful. We can add a new class property.

  /**
   * Rockets
   * @var Array
   */
  private var rockets:Array;

Within the Main class constructor we’ll initialise the rockets array.

  rockets = new Array();

Then we’ll flesh out the event handler to create new rocket objects.

  var rocket = new firework();
  rocket.x = 50;
  rocket.y = stage.stageHeight-25;
  rocket.height = 40;
  rocket.scaleX = rocket.scaleY;
  addChildAt(rocket,1);
  rockets.push(rocket);

Now the rocket class is an interesting one. This is where things start to get a bit more interesting as we explore what Flash can do. The key thing to realise is that the rocket class encapsulates the behaviour of a rocket. It sets itself up (via the constructor) and then adjusts its own behaviour (in this instance, its location on the stage) each frame. How it does this is another interesting part of Flash: the Event.ENTER_FRAME event. Within the firework class constructor we have:

  addEventListener(Event.ENTER_FRAME, updateState);

This fires off an event every time Flash starts a new “frame” – n times a second, where n is defined within the main Flash file’s properties.

If you dig through the firework class (and indeed the smoke class), you’ll see how they set themselves up, update their state once a frame and eventually remove themselves once a predefined lifetime has completed.

Source code

Click here to download the full source code.

posted on Friday 18th December 2009 by Dave

Christmas reading list

It’s that time of year when you’re wondering what Santa will bring you on Christmas day! Taking the ever-popular idea of “n books every programmer should read”, I decided to compile my own list. If you’re lucky, Santa might bring you one of these.

Coders at Work

An absolute barnstorming programming book and exactly the sort of book you could actually read on Christmas day; honest. A series of insightful interviews with top programming minds.
Find out more

Mythical Man Month

Another book that’s light on technical details (reams of code) but heavy on insight. Fred Brooks’ classic essays still resonate today, 34 years after the book first appeared.
Find out more

Patterns of Enterprise Application Architecture

Arguably a bit heavy going for Christmas day but probably the most useful book I’ve read in the last six months. I’m constantly referring back to it for details and have taken to including references to the text within my comments when writing code. Santa wouldn’t disappoint if he gave you this.
Find out more

Have a great Christmas! See you in 2010.

posted on Monday 23rd November 2009 by Dave

PHP dependency strategies: dependency injection and service locator

In this post I’m hoping to answer my own question: what strategy shall I use for handling dependencies in my new project? I’m going to explore three possible strategies that should help create good quality, uncoupled code:

  1. Simple Dependency Injection (DI)
  2. The Service Locator pattern
  3. A DI framework

This includes a bespoke implementation of a DI framework for PHP that automatically creates configuration by conducting a simplistic static analysis of code.

What are dependencies?

Consider the following code. This simple application comprises an event domain object combined with a Data Access Object (DAO) that deals with persistence.

event.class.php
class event
{
    private $name;
    private $cost;
    private $eventDate;

    /**
     * @param array $row Information on this event from DAO
     */
    public function __construct($row)
    {
        $this->name = $row['name'];
        $this->cost = new money($row['cost']);
        $this->eventDate = new date($row['date']);
    }

    public function __toString()
    {
        return "EVENT: {$this->name}\nCOST:  {$this->cost}\nDATE:  {$this->eventDate}\n";
    }
}
eventDao.class.php
class eventDao
{
    public function getById($id)
    {
        $db = new database('localhost','mydb','user','password');

        $row = $db->fetchAll(
           "SELECT name, cost, date FROM events WHERE id = ".$db->quote($id)
        );

        return new event($row);
    }
}

An event object is dependant on a money object and a date object. It needs to create these to function correctly.

An eventDao object is dependant on a database object and an event object. It needs the database object to get the data and it needs to create and return a new event object.

Why are dependencies problematic?

Dependencies are not in themselves problematic; it would be impossible to write any useful code without some dependencies. The problem is how we handle those dependencies. The code example provided above presents the following problems.

1. It makes testing impossible

Writing a unit test for the event object will inevitably end up testing the money and date objects. When we create a new event object we have no control over the creation of those dependent objects. This means our unit test will cross the class boundary and what we end up with is an integration test rather than a unit test. Instead of testing the logic of the specific, isolated “unit” (our event object), we are instead testing the event object works in relation to the rest of the program.

While it may not seem immediately obvious why that’s a problem with fairly trivial dependencies such as a money object, the problem is more obvious when considering the DAO. Here we could not test the getById method without inadvertently testing the database object. Without a fully-functioning database, set up with the expected data, our unit test will fail. Again, this isn’t a unit test; it’s more likely an integration test or possibly even a system test.

2. The objects are tightly coupled

The eventDao class is tightly coupled to the specific concrete classes event and database. What if we want to use a different database object on our test environment? We can’t. Of course there are ways around this immediate problem without changing too much at all. We could use global constants DATABASE_NAME, DATABASE_USER etc. Don’t even go there! Tight coupling makes for brittle code. If you’re not convinced you can read this article or spend 10 minutes with Google.

3. It goes against the Don’t Repeat Yourself (DRY) principle

If we imagine adding some other domain objects and DAOs to our system we will end up repeating the new database() line again and again. The same goes for other domain objects that want to represent information internally as a date object.

An alternative coupling

Let’s say you’ve got this in your DAO instead:

        $db = database::getInstance();

Same problems! We probably still can’t test it (unless we have some kind of database::setInstance() method) and it’s certainly still tightly coupled regardless.
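To make that concrete, a singleton with a setInstance() test seam might look like the following. This is a minimal sketch; the class shape and method names here are my illustration, not code from the original application.

```php
<?php
// Hypothetical singleton with a test seam: setInstance() lets a test
// swap in a fake before the code under test calls getInstance().
class database
{
    private static $instance;

    public static function getInstance()
    {
        if (self::$instance === null) {
            self::$instance = new self();
        }
        return self::$instance;
    }

    // Test seam: overwrite the shared instance with a stub or mock
    public static function setInstance($db)
    {
        self::$instance = $db;
    }
}

$fake = new stdClass();
database::setInstance($fake);
$db = database::getInstance(); // now returns the fake
```

Even with this seam, every caller of database::getInstance() remains coupled to the database class itself, which is the point being made above.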

Strategy 1: Dependency Injection

Dependency Injection is very straightforward. In fact it’s so straightforward you’ve almost certainly already done it, even if you didn’t refer to it as DI. Fabien Potencier of Symfony fame explains it expertly in these slides.

To make use of dependency injection, our eventDao can be updated to:

eventDao.class.php
class eventDao
{
    private $db;

    public function __construct($db)
    {
       $this->db = $db;
    }

    public function getById($id)
    {

        $row = $this->db->fetchAll(
           "SELECT name, cost, date FROM events WHERE id = ".$this->db->quote($id)
        );

        return new event($row);
    }
}
$db = new database('localhost','mydb','user','password');
$dao = new eventDao($db);
$event = $dao->getById(1);

We can now test the getById method using a mock database object because we are injecting the dependency into the DAO object.
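The mock needn’t be anything fancy. A hand-rolled fake that honours the same interface as the real database class (here assumed to be just fetchAll() and quote()) is enough; the fake below records the SQL it receives and returns a canned row. The class name and internals are illustrative only.

```php
<?php
// A hand-rolled fake standing in for the real database class (illustrative).
// It satisfies the same duck-typed interface: quote() and fetchAll().
class fakeDatabase
{
    public $lastSql;        // records the query so a test can assert on it
    private $cannedRow;

    public function __construct($cannedRow)
    {
        $this->cannedRow = $cannedRow;
    }

    public function quote($value)
    {
        return "'" . addslashes($value) . "'";
    }

    public function fetchAll($sql)
    {
        $this->lastSql = $sql;
        return $this->cannedRow;
    }
}

// In a test we would inject it: $dao = new eventDao(new fakeDatabase($row));
$fake = new fakeDatabase(array('name' => 'Gig', 'cost' => 10, 'date' => '2009-11-23'));
$row  = $fake->fetchAll("SELECT name, cost, date FROM events WHERE id = " . $fake->quote(1));
```

A test can now verify the DAO’s logic (the SQL it builds, what it does with the row) without a real database anywhere in sight.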

We can’t, however, isolate testing of the DAO completely because of the event dependency. This can be fixed by delegating responsibility for event creation to a Factory.

eventFactory.class.php
class eventFactory
{
    public function create($row)
    {
        return new event($row);
    }
}
eventDao.class.php
class eventDao
{
    private $db;
    private $eventFactory;

    public function __construct($db, $eventFactory)
    {
       $this->db = $db;
       $this->eventFactory = $eventFactory;
    }

    public function getById($id)
    {

        $row = $this->db->fetchAll(
           "SELECT name, cost, date FROM events WHERE id = ".$this->db->quote($id)
        );

        return $this->eventFactory->create($row);
    }
}

Our code is now loosely coupled and we are programming to interface (albeit that I haven’t actually put any interfaces into the code at this stage!) To quote kdgregory from StackOverflow:

Programming to an interface is saying “I need this functionality and I don’t care where it comes from.”

By putting in place a factory for creating objects we can program only to the interface we require. In our event class, this means we don’t have to rely on specific concrete implementations for date and money, instead we merely require some object that implements iDate and iMoney, and we can use a factory to make us one of those objects.
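A sketch of what that could look like for money, assuming an iMoney interface with a single getAmount() method (the interface shape and factory internals here are my assumptions, not code from the post):

```php
<?php
// Programming to interface: callers depend on iMoney, not on the
// concrete money class, and a factory hands out instances.
interface iMoney
{
    public function getAmount();
}

class money implements iMoney
{
    private $amount;

    public function __construct($amount)
    {
        $this->amount = $amount;
    }

    public function getAmount()
    {
        return $this->amount;
    }
}

class moneyFactory
{
    public function create($amount)
    {
        // Callers only care that they get back *an* iMoney;
        // the concrete class can be swapped here in one place.
        return new money($amount);
    }
}

$factory = new moneyFactory();
$cost = $factory->create(25);
```

If we later wanted a differently-implemented money object, only the factory changes; every consumer of iMoney is untouched.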

Inversion of Control (IoC)

It’s worth noting that by injecting dependencies into the objects we have inverted control: rather than the procedural/linear style of setting up an object and then doing something with it, we have passed in an object and are executing what almost amounts to a ‘callback’ on it. The term “Inversion of Control” seems to come up frequently when reading about DI, although I’m not sure that it’s always that clearly explained. Fowler explains it in this article. There are also some other interesting blog posts on the subject. If you want the short answer on what IoC is, check out this definition.

Object graphs

Using DI, what we inevitably end up with is a complex object graph. An object graph is simply a set of interconnected objects. In the case of DI, we have lots of interconnected objects since we are passing all our dependencies around as objects – so we end up with a lot of objects related to a lot of other objects at run time!

Strengths

This form of dependency injection is easy to understand. We avoid tight coupling, we can test our code and we are programming to interface. All is good!

Weaknesses

One of the web apps I work on uses a Front Controller pattern to handle incoming requests. The bootstrap code looks a bit like this:

$fc = new frontController();
$fc->dispatch();

If I take dependency injection to its extreme, slightly insane, but on some level undeniably logical, conclusion, I would have to inject all dependencies needed by the entire application into the constructor of the front controller! This post by Ben Scheirman on StackOverflow gives another example:

var svc = new ShippingService(new ProductLocator(),
   new PricingService(), new InventoryService(),
   new TrackingRepository(new ConfigProvider()),
   new Logger(new EmailLogger(new ConfigProvider())));

You get the idea.

We could be pragmatic about this and suggest that individual page controllers are allowed to be tightly coupled to domain objects. However it merely defers the inevitable.

The problem with this strategy is that if you add a dependency to an object, you then have to add it to all parent objects that use that object. This becomes a recursive task, so the change causes a ripple effect through other code. The Google Guice documentation explains it quite well (scroll down to the “Dependency Injection” section – due to their clever reader thing I can’t get an anchor link straight to it!) This problem relates to the inherent complexity involved in creating a large object graph.

Unfortunately, now the clients of BillingService need to lookup its dependencies. We can fix some of these by applying the pattern again! Classes that depend on it can accept a BillingService in their constructor. For top-level classes, it’s useful to have a framework. Otherwise you’ll need to construct dependencies recursively when you need to use a service

Strategy 2: Service Locator

Martin Fowler explains the idea of a service locator in detail in his article on the subject of DI. I’m going to explain it in the context of a PHP application.

A service locator is a straightforward system whereby objects can “look up” any dependencies they need from a central source. This gives the following advantages:

  • It is easy to add a dependency to any object
  • It is easy to replace which dependency is provided project wide, so we are adhering to the DRY principle
  • It removes tight coupling between objects

The simplest service locator may look like:

serviceLocator.class.php
class serviceLocator
{
    public static function getDatabase()
    {
        return new database();
    }

    public static function getDateFactory()
    {
        return new dateFactory();
    }

    public static function getMoneyFactory()
    {
        return new moneyFactory();
    }

    public static function getEventFactory()
    {
        return new eventFactory();
    }
}

Our DAO now becomes:

eventDao.class.php
class eventDao
{
    public function getById($id)
    {

        $db = serviceLocator::getDatabase();

        $row = $db->fetchAll(
           "SELECT name, cost, date FROM events WHERE id = ".$db->quote($id)
        );

        return serviceLocator::getEventFactory()->create($row);
    }
}

For testing we’d need to add in equivalent methods like serviceLocator::setMoneyFactory and serviceLocator::setDatabase.

We can simplify (or complicate depending on your point of view) our service locator by replacing methods like serviceLocator::getMoneyFactory() with a more generic serviceLocator::getService($serviceName). We could then configure the service locator in our bootstrap with calls to serviceLocator::registerService($serviceName, $object). If we really wanted to go to town we could use an XML or YAML file to store the details of the dependencies that the service locator provided. For a working system, we probably would want to go this far.
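A minimal sketch of that generic form, replacing the named getters with a registry (the error handling and the decision to store ready-made objects rather than factories/closures are my simplifications):

```php
<?php
// A generic service locator: services are registered by name at
// bootstrap time and looked up by name anywhere in the application.
class serviceLocator
{
    private static $services = array();

    public static function registerService($serviceName, $object)
    {
        self::$services[$serviceName] = $object;
    }

    public static function getService($serviceName)
    {
        if (!isset(self::$services[$serviceName])) {
            throw new Exception("Unknown service: $serviceName");
        }
        return self::$services[$serviceName];
    }
}

// Bootstrap wiring (a test would register fakes here instead):
serviceLocator::registerService('database', new stdClass());

$db = serviceLocator::getService('database');
```

Testing falls out for free: the test bootstrap simply registers stubs under the same names.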

In terms of coupling, we have replaced the coupling of objects from our very first example (where eventDao was tightly coupled to database and event) with equally tight coupling, albeit this time to a single object – the service locator object. Whether this is desirable will come down to the details of the application. As Fowler points out in his discussion of locator vs injector:

The key difference is that with a Service Locator every user of a service has a dependency to the locator. The locator can hide dependencies to other implementations, but you do need to see the locator. So the decision between locator and injector depends on whether that dependency is a problem.

In terms of practical implementations, Mutant PHP has published an article on this subject which includes a sample service locator class.

The fairly new Symfony Dependency Injection Container appears to be based around the idea of a service locator. I say this because it doesn’t implement an inversion of control mechanism – as covered in strategy 3.

Strengths

The service locator provides a simple strategy for managing dependencies that is easily understood. It allows for testing and it avoids tight coupling between classes.

Weaknesses

The use of a service locator leads to a tight coupling between classes and the service locator itself.

Strategy 3: DI Framework

A dependency injection “framework” is an alternative strategy for dealing with dependencies to the arguably simpler service locator. The key idea is to stick with dependency injection (either into the constructor or via a setter), but have some external object (Fowler calls this an “assembler” in his article on the subject) actually deal with managing the dependencies, injecting them into objects as required, without the user having to worry about it.

Now coming from a PHP background, I’ve searched about for PHP-specific information on DI frameworks. So far, I haven’t managed to find anything that I feel explains the concept as well as the Guice documentation does. In terms of responsibility-driven design, Guice outlines the role of the “injector” (Fowler’s “assembler”):

The injector’s job is to assemble graphs of objects. You request an instance of a given type, and it figures out what to build, resolves dependencies, and wires everything together.

This sounds promising; although I’m not 100% convinced I need a DI framework, I’m starting to see some advantages. There is an interesting discussion on StackOverflow (again!) about the need for a DI framework.

A bespoke DI Framework

To help understand the advantages of a DI framework I built my own, which I’ve rather confusingly called a “Service Injector”. As Benjamin Eberlei explains in a blog post on the subject of DI:

My subjective feeling tells me there are now more PHP DI containers out there than CMS or ORMs implemented in PHP, including two written by myself (an overengineered and a useful one).

Having recently read the excellent Coders at Work (go and buy it now if you haven’t read it), I took some advice from Donald Knuth who said:

The problem is that coding isn’t fun if all you can do is call things out of a library, if you can’t write the library yourself.

So I decided to write my own.

Design motivations

In Benjamin’s post, he goes on to say:

Its an awesome pattern if used on a larger scale and can (re-) wire a complex business application according to a clients needs without having to change much of the domain code.

I think my motivations for a DI framework are somewhat different. I don’t see myself wanting to “re-wire” an application at a later date. Ideally I want my logic and wiring to remain clear at the code level; I personally don’t want to delegate all the wiring to some configurator – I can see that making any debugging task harder. What I want from a framework is something that will do the hard work for me; something that will supply actual dependencies automatically.

This led me to make the following decisions:

  • I wanted an automated builder; something that would look at the code and get the DI framework setup ready to go – based on class names and interfaces
  • I wanted to keep Factory classes; I think it makes logical sense to have a class whose responsibility is to create new objects of type blah

A sample application

At the top level, I can ask the DI framework to create me an object:

// setup the service injector
include APP_PATH.'siConfig.php';
$serviceInjector = new serviceInjector();

// ----

// for our test app we'll just pretend we're looking at the details
// of event #1:

$oDao = $serviceInjector->getInstance('eventDao');
$oEvent = $oDao->getById(1);

The DAO object is created, along with its dependencies; this all happens simply by annotating the code within the PHPDocumentor style comment blocks:

    /**
     * Constructor
     * @param iDatabase $database A service that will allow us to execute SQL
     * @param iEventFactory $eventFactory A service that will create event objects for us
     * @inject This informs the DI builder to inject constructor parameters on object creation
     */
    public function __construct(iDatabase $database, iEventFactory $eventFactory)

The service injector will find a class that implements iDatabase and iEventFactory and automatically inject these on object creation. The interesting thing is that either of these two services can have their own dependencies. For example, my eventFactory class declaration looks like this:

class eventFactory extends factory implements iEventFactory

It extends the Layer Supertype factory. The factory base class has a method to set its own dependency, again specified via annotation:

    /**
     * Set service injector
     * @inject This informs the DI builder to inject method parameters immediately after object creation
     */
    public function setServiceInjector(iServiceInjector $serviceInjector)
    {
        $this->serviceInjector = $serviceInjector;
    }

The service injector will happily go away and recursively assemble the required objects and their dependencies.
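The recursive assembly idea can be sketched in a few lines using PHP’s reflection API. This is not my service injector’s actual code: the bindings array stands in for the generated configuration, the demo classes are hypothetical, and it handles only constructor injection with no sharing of instances.

```php
<?php
// Demo classes (hypothetical): app depends on iLogger via its constructor.
interface iLogger {}

class fileLogger implements iLogger {}

class app
{
    public $logger;

    public function __construct(iLogger $logger)
    {
        $this->logger = $logger;
    }
}

// Minimal recursive injector: map interface names to concrete classes,
// then build constructor dependencies depth-first via reflection.
class miniInjector
{
    private $bindings;

    public function __construct(array $bindings)
    {
        $this->bindings = $bindings; // e.g. 'iLogger' => 'fileLogger'
    }

    public function getInstance($name)
    {
        $class = isset($this->bindings[$name]) ? $this->bindings[$name] : $name;
        $reflection  = new ReflectionClass($class);
        $constructor = $reflection->getConstructor();
        if ($constructor === null) {
            return new $class();
        }
        $args = array();
        foreach ($constructor->getParameters() as $param) {
            // Recurse: each type-hinted parameter is itself resolved
            $args[] = $this->getInstance($param->getType()->getName());
        }
        return $reflection->newInstanceArgs($args);
    }
}

$injector = new miniInjector(array('iLogger' => 'fileLogger'));
$app = $injector->getInstance('app');
```

The caller asks for 'app' and gets back a fully wired object graph; the recursion is what saves you from the nested-new construction shown in the StackOverflow example earlier.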

The builder

I have a script that can be executed as part of an automated build process (see my other post) that will create a pure-PHP configuration file for my service injector. It works by conducting a somewhat crude static analysis of the code you tell it to examine. It then works out which classes wire up to which interfaces, what extends what, which methods need parameters injecting and which classes should be shared (rather than a new instance created on every request).

Right now, it works as well as it needs to for the sample application. However it doesn’t do very well if you have more than one class that implements a given interface, and then you ask the service injector to build you a blah interface – in this situation it will fail. You’ll notice that although I’ve got a lot of interfaces in the sample application, they all have one class that implements the interface. I think this is a worthwhile exercise because it gets you into the mindset that you are programming to interface and thinking about messages that the objects send other objects.

Pros and cons

I like how my implementation creates the wiring-up configuration automatically based on the actual code. I also like how the service injector is really focussed on programming to interface: a service is simply some object that will provide a set of capabilities and the service injector’s only job is to inject these objects at run time. It does not deal with injecting strings and other configuration parameters; I think that’s OK since a string is not a service – and I set out to build something that would only do that job.

I guess that’s where my service injector differs from other implementations of dependency injection containers – I have focussed purely on something that will provide services, not any other type of dependency, such as configuration strings. Perhaps this could be considered a con!

The static analysis in this simple version is fairly rudimentary, although that said it will quite happily analyse the Zend framework source code. I tried this out and then made my date factory ask the service injector for a new Zend_Date object. This all worked fine – simply by changing one line of code.

The source code

So I’ve written this tool purely as a way to learn about the ideas involved and also to see if I could find a structure that I thought was useful for my application. It’s been done pretty quickly but if you’d like to have a closer look you can browse the source code here.

Other implementations for PHP

Conclusions

Through this process of research I have come to the following conclusions:

  • I prefer DI over a service locator because the individual modules are cleaner; dependencies are passed in rather than the object itself going and asking for them
  • A DI framework seems the way to go, simply to reduce the complexity involved in manually creating complex object graphs
  • I like Factory classes because they serve a clear purpose and make code easy to understand
  • I want a DI framework to be able to work (almost) completely from the source code

My next step is to look more closely at existing implementations to see if they could work in a production project.

posted on Monday 9th November 2009 by Dave

Setting up continuous integration for PHP using Hudson and Phing

In this, my first post, I’m going to write about the benefits of Unit Testing and how Continuous Integration (CI) can be used to get the best out of Unit Testing. This will include details of how I set up a CI system using the Hudson CI server and the Phing build tool, combined with various other analysis tools (including PHPUnit).

One of the best explanations of Unit Testing I’ve read was posted by benzado on Stack Overflow.

Unit testing is a lot like going to the gym. You know it is good for you, all the arguments make sense, so you start working out. There’s an initial rush, which is great, but after a few days you start to wonder if it is worth the trouble.

The difficulty with Unit Testing is keeping it up. It is very easy to slip into poor habits and before you know it there’s a huge chunk of code with no tests. Possibly a huge, badly designed chunk of code, that didn’t benefit from having tests written before it was coded. Before you know what’s going on, you end up with a project that you really can’t write tests for, because retrofitting the tests is near impossible.

For me, there are two critical reasons for Unit Testing:

  1. Enforcing good design
    To be able to write tests, you need to be able to zero in on a “unit” of code, isolating it from all the rest of your 1,000,000 lines of web application. Writing Unit Tests forces you to design systems that have loose coupling because otherwise it is impossible to test.
  2. Allowing changes to be made in confidence
    Without Unit Tests, you get to the point where no one really wants to make any changes to the code. This is especially true in a commercial environment, where many people have worked on the code, including some key team member who has since left. Unit Tests allow you to make changes to one part of the code and be pretty convinced you haven’t messed up something else.

Continuous integration

Martin Fowler describes the process of Continuous Integration in detail. He suggests:

Continuous Integration is a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily – leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible. Many teams find that this approach leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly. This article is a quick overview of Continuous Integration summarizing the technique and its current usage.

The key idea behind CI is to do what is most painful often, namely “building” everyone’s code from source and making sure it all works.

A CI system usually consists of the following key elements:

Continuous integration

  • Developers commit code
  • CI server detects changes
  • CI server checks out code, runs tests, analyses code
  • CI server feeds back to development team

If you want to find out more about CI, I recommend the excellent book Continuous Integration: Improving Software Quality and Reducing Risk. There is an excerpt published on JavaWorld which covers a lot of the key advantages. In particular, it highlights:

1. Reduce risks
2. Reduce repetitive manual processes
3. Generate deployable software at any time and at any place
4. Enable better project visibility
5. Establish greater confidence in the software product from the development team

CI gets the most out of Unit Tests by forcing them to be run after every change. Not only that, but with a good CI setup, developers instantly know if they haven’t written enough tests. It avoids the situation where Joe Bloggs has added in a huge chunk of code with zero tests.

Setting up CI for a PHP project

To get my environment setup, I consulted the following blog posts which are worth a read:

  1. http://blog.jepamedia.org/2009/10/28/continuous-integration-for-php-with-hudson/
  2. http://toptopic.wordpress.com/2009/02/26/php-and-hudson/

I’m assuming you’re using a CentOS 5 server (or I guess RHEL5). If not, you may still find various parts of this useful.

1. Install JDK

EPEL provides a set of CentOS packages, including a package for OpenJDK. This is the easiest way of installing Java.

Firstly, set up EPEL by installing the epel-release RPM for CentOS 5 (check the EPEL wiki for the current package URL):

rpm -Uvh http://download.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm

Next install OpenJDK:

yum install java-1.6.0-openjdk

2. Install Hudson

Download and install the CentOS RPM for Hudson:

wget -O /etc/yum.repos.d/hudson.repo http://hudson-ci.org/redhat/hudson.repo
rpm --import http://hudson-ci.org/redhat/hudson-ci.org.key
yum install hudson

Now Hudson is installed, we can start using the standard CentOS “service” command.

service hudson start

We can check Hudson is working by pointing the browser at port 8080 (the default Hudson port). Hudson will work “out of the box” and give you a web interface immediately. This is the primary reason I decided to go with Hudson over the other possibilities, eg: CruiseControl and phpUnderControl. Although I didn’t do an exhaustive analysis before I decided on Hudson, it just seemed right to me.

To get the graphing engine working for Hudson, you may need to install the X libraries.

yum groupinstall base-x

3. Install phing

Phing is a PHP project build system or build tool based on Apache Ant. A build tool ensures that the process of creating your working web application from source code happens in a structured and repeatable way. This helps reduce the possibility of errors caused by simply uploading files via FTP or some other simple method.

Make sure PEAR is installed for PHP (this is the easiest way of installing phing):

yum install php-pear

Then install the PEAR phing package:

pear channel-discover pear.phing.info
pear install phing/phing

4. Setup SVN

If you haven’t got a Subversion repository, you’re going to need one (or some other SCM tool such as CVS, Git or Mercurial).

yum install mod_dav_svn

The simplest setup involves creating a repo in /var/www/svn/<my repo>

mkdir -v /var/www/svn/test
svnadmin create --fs-type fsfs /var/www/svn/test
chown -R apache:apache /var/www/svn/test

Set up Apache by uncommenting the relevant lines in /etc/httpd/conf.d/subversion.conf. Once Apache is restarted, you’ll be able to reach the repository at /repos/test, assuming you’re using the default settings (which serve SVN under /repos). I haven’t gone into the details of getting SVN up and running; there are plenty of resources out there that cover it.
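If the repository is brand new, it’s worth creating the conventional trunk/branches/tags layout before the first commit. A sketch (myproject is a placeholder name; the svn import line is commented out since it needs the repository created above):

```shell
# Build the conventional Subversion layout in a scratch directory
proj=$(mktemp -d)/myproject
mkdir -p "$proj/trunk" "$proj/branches" "$proj/tags"
ls "$proj"
# Then import it into the repository, e.g.:
# svn import "$proj" file:///var/www/svn/test -m "Initial layout"
```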

5. Install PHP tools

PHPDocumentor – to generate documentation automatically from code

pear install PhpDocumentor

PHP CPD – “copy and paste detector” for PHP

This requires PHP 5.2. At the time of writing, this wasn’t standard with CentOS 5, but it is part of the CentOS “test” repo. This can be set up by creating a yum repo file, eg: /etc/yum.repos.d/centos-test.repo, and populating it with:

[c5-testing]
name=CentOS-5 Testing
baseurl=http://dev.centos.org/centos/5/testing/$basearch/
enabled=1
gpgcheck=1
gpgkey=http://dev.centos.org/centos/RPM-GPG-KEY-CentOS-testing

Then you can do:

yum update php

You may also need to upgrade PEAR if the install of phpcpd (below) fails. To do this, try:

pear upgrade pear

or, if it needs to be forced, and you think it’s a good idea (I did):

pear upgrade --force pear

Finally we can install phpcpd!

pear channel-discover pear.phpunit.de
pear install phpunit/phpcpd

PHP Depend – helps analyse the quality of the codebase

Note that you may have to update PHP to include the DOM module (first line below).

yum install php-dom
pear channel-discover pear.pdepend.org
pear install pdepend/PHP_Depend-beta

PHP CodeSniffer – analyse code for adherence to style/standards

pear install PHP_CodeSniffer-1.2.0

PHPUnit – unit test framework for PHP

pear channel-discover pear.phpunit.de
pear install phpunit/PHPUnit

To get code coverage out of PHPUnit, we need Xdebug, the PHP debugging and profiling extension, installed.

yum install php-devel gcc
pecl install xdebug
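Note that pecl compiles the extension but doesn’t necessarily enable it. If php -m doesn’t list xdebug afterwards, add a line like the following to php.ini (the module path here is an assumption; check where pecl actually put xdebug.so, e.g. under /usr/lib64 on 64-bit systems):

```ini
; Xdebug must be loaded as a Zend extension, not via extension=
zend_extension=/usr/lib/php/modules/xdebug.so
```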

6. Install Hudson plugins

Use the web interface to install the following plugins (Manage Hudson -> Plugins).

  • Checkstyle
  • Clover
  • DRY
  • Green Balls (handy because it shows successful builds as green circles rather than blue)
  • JDepend
  • xUnit (will handle the output of PHPUnit test results XML)

7. Setup the phing build script

The Phing build script defines what steps will be taken to “build” the application.

Hudson itself works by placing our code into a project workspace. It will check out the code from Subversion and place it in the following location, where “Test” is the name of our project.

/var/lib/hudson/jobs/Test/workspace/

We can then use the Phing build script to carry out a number of processes on this code. When we talk about “building”, what we actually do is place the code where it needs to be to run the website (we’ll keep this within the workspace), plus run the tests and other checks.
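To make the paths concrete, the following sketch recreates the directory layout the job ends up with, using a temporary directory in place of /var/lib/hudson/jobs/Test/workspace: the Subversion checkout lands in source/ and the build script creates build/ alongside it.

```shell
# Recreate the Hudson workspace layout in a scratch directory
ws=$(mktemp -d)                      # stands in for the Hudson workspace
mkdir -p "$ws/source"                # the SVN checkout ("Local module directory")
mkdir -p "$ws/build/logs/coverage"   # created by the build script's prepare target
mkdir -p "$ws/build/docs" "$ws/build/app"
find "$ws" -mindepth 1 -type d | sed "s|$ws/||" | sort
```

The build/docs and build/logs paths entered into Hudson in step 8 are relative to this workspace root.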

We’ll keep the build script in the Subversion repository, so it is effectively updated from SVN on each build. For this approach to work, the following XML needs to be saved in a file named build.xml in the project root folder (within trunk), eg: /trunk/build.xml

<?xml version="1.0" encoding="UTF-8"?>
 <project name="test" basedir="." default="app">
    <property name="builddir" value="${ws}/build" />

    <target name="clean">
        <echo msg="Clean..." />
        <delete dir="${builddir}" />
    </target>

    <target name="prepare">
        <echo msg="Prepare..." />
        <mkdir dir="${builddir}" />
        <mkdir dir="${builddir}/logs" />
        <mkdir dir="${builddir}/logs/coverage" />
        <mkdir dir="${builddir}/docs" />
        <mkdir dir="${builddir}/app" />
    </target>

    <!-- Deploy app -->
    <target name="app">
        <echo msg="We do nothing yet!" />
    </target>

    <!-- PHP API Documentation -->
    <target name="phpdoc">
        <echo msg="PHP Documentor..." />
        <phpdoc title="API Documentation"
                destdir="${builddir}/docs"
                sourcecode="yes"
                defaultpackagename="MHTest"
                output="HTML:Smarty:PHP">
            <fileset dir="./app">
                <include name="**/*.php" />
            </fileset>
        </phpdoc>
    </target>

    <!-- PHP copy/paste analysis -->
    <target name="phpcpd">
        <echo msg="PHP Copy/Paste..." />
        <exec command="phpcpd --log-pmd=${builddir}/logs/pmd.xml ${ws}/source" escape="false" />
    </target>

    <!-- PHP dependency checker -->
    <target name="pdepend">
        <echo msg="PHP Depend..." />
        <exec command="pdepend --jdepend-xml=${builddir}/logs/jdepend.xml ${ws}/source" escape="false" />
    </target>

    <!-- PHP CodeSniffer -->
    <target name="phpcs">
        <echo msg="PHP CodeSniffer..." />
        <exec command="phpcs --standard=ZEND --report=checkstyle ${ws}/source > ${builddir}/logs/checkstyle.xml" escape="false" />
    </target>

    <!-- Unit Tests & coverage analysis -->
    <target name="phpunit">
        <echo msg="PHP Unit..." />
        <exec command="phpunit --log-junit ${builddir}/logs/phpunit.xml --log-pmd ${builddir}/logs/phpunit.pmd.xml --coverage-clover ${builddir}/logs/coverage/clover.xml --coverage-html ${builddir}/logs/coverage/ ${ws}/source/tests"/>
    </target>
</project>

8. Setup Hudson

The first step is to create a new job.

  • From the Hudson homepage, click New Job.
  • Enter a Job name, for example “Dave’s Product Build” and choose “Build a free-style software project”. Click OK.

Now you need to configure the job; the configuration form should be displayed immediately after the job is added.

Under Source Code Management choose Subversion and enter:

  • Repository URL: http://www.myrepo.com/path/to/repo
  • Local module directory: source
  • Check “Use update”, which speeds up checkout

Under Build Triggers select Poll SCM and enter the following schedule:

*/5 * * * *

Note that this will poll for changes to the repository every 5 minutes and rebuild if any changes are detected.

Under Build click the button to Add build step and choose Execute shell, enter the command:

phing -f $WORKSPACE/source/build.xml prepare app phpdoc phpcs phpcpd pdepend phpunit -Dws=$WORKSPACE

Under Post-build Actions choose:

  • Check Publish Javadoc and then enter:
    Javadoc directory = build/docs/
  • Check Publish testing tools result report and then click Add and pick PHP Unit, enter:
    + PHPUnit Pattern = build/logs/phpunit.xml
  • Check Publish Clover Coverage Report and enter:
    + Clover report directory = build/logs/coverage
    + Clover report file name = clover.xml
  • Check Publish duplicate code analysis results and enter:
    + Duplicate code results = build/logs/pmd.xml
  • Check Publish Checkstyle analysis results and enter:
    + Checkstyle results = build/logs/checkstyle.xml

Finally, click Build Now to test it all works.