Programming Collective Intelligence

Toby Segaran

Making sense of information is something that Amazon understands. According to James Marcus (hired by Amazon in 1996), Jeff Bezos began as a “firm believer in the power of content” but gradually came to believe in a “culture of metrics”. The “Customers Who Bought X also bought Y” feature is arguably Amazon’s most famous, and it perfectly illustrates the company’s move towards data-driven techniques.

Programming Collective Intelligence is a book that introduces many of these techniques, albeit in a basic form. The term “collective intelligence” refers to “intelligence that emerges from the collaboration of a group.” In other words, the book explains algorithms that can be used to arrive at specific results from a large set of seemingly disordered input data. One example is Amazon’s “Customers Who Bought X” feature. Another is grouping users into clusters based on their behaviour.
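To give a flavour of what such an algorithm looks like, here is a minimal sketch of a “customers who bought X” feature in Python (the language the book uses). It is not taken from the book and is certainly not how Amazon does it: the function names and the tiny `baskets` data set are made up for illustration, and it simply counts how often two items turn up in the same order.

```python
from collections import Counter
from itertools import permutations


def build_co_purchase_counts(baskets):
    """Count, for every item, how often each other item shares a basket with it."""
    counts = {}
    for basket in baskets:
        # set() so that a duplicated item in one basket is only counted once
        for item, other in permutations(set(basket), 2):
            counts.setdefault(item, Counter())[other] += 1
    return counts


def also_bought(counts, item, top_n=3):
    """Return the items most often bought alongside `item`, most frequent first."""
    neighbours = counts.get(item, Counter())
    # Sort by descending count, then by name so that ties are deterministic
    ranked = sorted(neighbours.items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:top_n]]


# Hypothetical purchase history: each inner list is one customer's order
baskets = [
    ["book", "lamp", "pen"],
    ["book", "pen"],
    ["book", "lamp"],
    ["pen", "ink"],
]

counts = build_co_purchase_counts(baskets)
print(also_bought(counts, "book"))
```

Raw co-occurrence counts favour items that are simply popular, so a real system would need to normalise them in some way; the sketch only shows the basic idea of turning unordered purchase data into a ranked suggestion.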

The strength of the book is the simplicity with which it explains concepts, using practical examples that can be recreated (often involving various APIs to demonstrate particular ideas). It covers a lot of ground, some of it in sparse detail, with really just enough to introduce the concepts (genetic programming springs to mind). The topics covered include:

  • Recommendations (collaborative filter - item and user based)
  • Discovering Groups (clustering, k-means)
  • Searching and ranking
  • Optimisation (using a cost function, hill climbing, genetic algorithms)
  • Document filtering (classification, training, Fisher method)
  • Decision trees
  • Building price models (k-nearest neighbours, cross-validation)
  • Advanced classification (kernel methods, SVMs)

One weakness of the book, I feel, is that the code samples are split across pages and are difficult to follow. There are no comments (probably for brevity), which is fine when reading the book but makes life hard when actually trying to write an implementation. In particular, I think the variables could have been named better, and the code would have been easier to follow with less global state. That said, it is incredibly useful to have the practical examples at all. I have managed to write my own implementations of a collaborative filter and a neural network based on the examples and information from this book.
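As an illustration of the style I would have preferred, here is a rough sketch of a user-based collaborative filter written with descriptive names and no global state. It is neither the book's code nor my own implementation mentioned above. The function names, the sample ratings and the choice of an inverse Euclidean distance as the similarity measure are all assumptions made for the example.

```python
from math import sqrt


def similarity(ratings_a, ratings_b):
    """Score two users between 0 and 1 based on the items both have rated."""
    shared_items = set(ratings_a) & set(ratings_b)
    if not shared_items:
        return 0.0
    distance = sqrt(sum((ratings_a[item] - ratings_b[item]) ** 2
                        for item in shared_items))
    # Identical ratings give 1.0; larger distances approach 0
    return 1.0 / (1.0 + distance)


def recommend(all_ratings, user):
    """Predict ratings for items `user` has not rated, highest prediction first."""
    own_ratings = all_ratings[user]
    weighted_totals = {}
    similarity_totals = {}
    for other_user, other_ratings in all_ratings.items():
        if other_user == user:
            continue
        weight = similarity(own_ratings, other_ratings)
        if weight == 0.0:
            continue
        for item, rating in other_ratings.items():
            if item in own_ratings:
                continue  # only predict items the user has not rated yet
            weighted_totals[item] = weighted_totals.get(item, 0.0) + weight * rating
            similarity_totals[item] = similarity_totals.get(item, 0.0) + weight
    # Similarity-weighted average of the other users' ratings
    predictions = {item: weighted_totals[item] / similarity_totals[item]
                   for item in weighted_totals}
    return sorted(predictions.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical ratings on a 1 to 5 scale
ratings = {
    "ann": {"A": 5, "B": 1},
    "bob": {"A": 5, "B": 1, "C": 4, "D": 2},
    "cat": {"A": 1, "B": 5, "C": 1, "D": 5},
}

for item, predicted in recommend(ratings, "ann"):
    print(item, round(predicted, 2))
```

Everything the functions need is passed in as an argument, so each one can be tested on its own. Here "ann" agrees with "bob" and disagrees with "cat", so bob's opinions dominate the predictions.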

If you want to get an introduction to these incredibly powerful and pervasive techniques, this is a great book. There are plenty of other books that cover the topics in more detail, but few that offer such an accessible introduction.

7.5/10
