Monthly Archives: August 2011

Analyzing Apache Logs with Riak

This article shows how to do some Apache log analysis using Riak and MapReduce. Specifically, it gives an example of how to extract URLs from Apache logs stored in Riak (the map phase) and count how many times each URL was requested (the reduce phase).
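To make the two phases concrete, here is a minimal sketch in plain Python that runs locally rather than inside Riak. It assumes each stored object holds lines in Apache Common (or Combined) Log Format. The function names map_extract_urls and reduce_count_urls are hypothetical, and this is not Riak's MapReduce API; Riak map and reduce phases are normally written in JavaScript or Erlang. Because Riak may run a reduce phase more than once over partial results, the reduce function below returns data in the same shape it accepts, so it can safely be applied to its own output.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Matches the quoted request field of an Apache log line,
# e.g. "GET /index.html HTTP/1.1", and captures the requested path.
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) [^"]*"')


def map_extract_urls(log_text: str) -> List[Dict[str, int]]:
    """Map phase (hypothetical): emit {url: 1} for every request line in one log object."""
    results = []
    for line in log_text.splitlines():
        match = REQUEST_RE.search(line)
        if match:
            results.append({match.group(1): 1})
    return results


def reduce_count_urls(values: Iterable[Dict[str, int]]) -> List[Dict[str, int]]:
    """Reduce phase (hypothetical): merge partial {url: count} dicts into one.

    Input and output share the same shape, so the function can be re-run
    on its own output (a "re-reduce") without changing the totals.
    """
    totals = Counter()
    for partial in values:
        totals.update(partial)  # Counter.update adds counts rather than replacing them
    return [dict(totals)]


# Two stored "log objects"; the second one holds two lines.
logs = [
    '127.0.0.1 - - [10/Aug/2011:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326',
    '127.0.0.1 - - [10/Aug/2011:13:56:01 -0700] "GET /about.html HTTP/1.1" 200 1045\n'
    '10.0.0.5 - - [10/Aug/2011:13:57:12 -0700] "GET /index.html HTTP/1.1" 200 2326',
]

# Run the map phase over every object, then reduce all emitted values.
mapped = [pair for obj in logs for pair in map_extract_urls(obj)]
print(reduce_count_urls(mapped))  # [{'/index.html': 2, '/about.html': 1}]
```

Emitting a small {url: 1} object per request, instead of a bare URL string, keeps the map output and the reduce output the same type, which is what lets the reduce step be repeated on partial results.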

So what is Riak? According to Wikipedia, it’s “a NoSQL database implementing the principles from Amazon’s Dynamo paper”. Or, put another way, it’s a distributed key-value store with built-in support for MapReduce. If you aren’t familiar with MapReduce, a good starting point is Google’s MapReduce paper. I won’t cover how to install Riak; there’s a good tutorial for that on the Riak website. Riak also has many other features that won’t be covered here.