Using Hadoop to Analyze Apache Log Files

After my post a few days ago about analyzing Apache log files with Riak, I thought I would follow that up by showing how to do the same thing using Hadoop. I am not going to cover how to install Hadoop; I am going to assume you already have it installed. What is it they say about assumptions? Also, any Hadoop commands are executed relative to the directory where Hadoop is installed ($HADOOP_HOME).

Analyzing Apache Logs with Riak

This article will show you how to do some Apache log analysis using Riak and MapReduce. Specifically it will give an example of how to extract URLs from Apache logs stored in Riak (the map phase) and provide a count of how many times each URL was requested (the reduce phase).
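To make the two phases concrete, here is a rough sketch in plain Java. It does not use Riak or its MapReduce API at all; the class name UrlCountSketch, the regular expression and the sample log lines are assumptions made purely for illustration, and the log lines are assumed to be in Common Log Format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlCountSketch {
    // Matches the quoted request part of a log line, e.g. "GET /index.html HTTP/1.1"
    private static final Pattern REQUEST = Pattern.compile("\"[A-Z]+ (\\S+) [^\"]*\"");

    // Map phase: emit the requested URL for every log line that parses
    static List<String> map(List<String> logLines) {
        List<String> urls = new ArrayList<String>();
        for (String line : logLines) {
            Matcher m = REQUEST.matcher(line);
            if (m.find()) {
                urls.add(m.group(1));
            }
        }
        return urls;
    }

    // Reduce phase: count how many times each URL was emitted
    static Map<String, Integer> reduce(List<String> urls) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String url : urls) {
            Integer count = counts.get(url);
            counts.put(url, count == null ? 1 : count + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines, not real log data
        List<String> lines = Arrays.asList(
            "127.0.0.1 - - [10/Oct/2010:13:55:36 +0000] \"GET /index.html HTTP/1.1\" 200 2326",
            "127.0.0.1 - - [10/Oct/2010:13:55:40 +0000] \"GET /about.html HTTP/1.1\" 200 512",
            "127.0.0.1 - - [10/Oct/2010:13:56:01 +0000] \"GET /index.html HTTP/1.1\" 200 2326");
        System.out.println(reduce(map(lines)));
    }
}
```

In a real MapReduce system the map calls run in parallel across the nodes holding the data and the reduce step merges their partial results; the sketch above only shows the shape of the two functions.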

So what is Riak? According to Wikipedia it’s “a NoSQL database implementing the principles from Amazon’s Dynamo paper”. Or, put another way, it’s a distributed key-value store that has built-in support for MapReduce. If you aren’t familiar with MapReduce a good starting point would be to read Google’s MapReduce paper. I am not going to go over how to install Riak; there’s a good tutorial for that on the Riak website. Riak also has a lot of other features that won’t be covered here.

Picture Perfect

This post kind of follows on from my previous post about creating thumbnails from a video. It describes how to take a snapshot with your webcam using Flex and then save the picture to your server. I decided to create my own little app to do this, originally because I wanted to take a snapshot of something and create a blog post from it from within the WordPress editor – it appears that out of the box you can only select and upload an image from your hard drive. There is an option in Facebook to take a picture when sending a message. I wanted something similar for WordPress. I am guessing there is probably a plugin somewhere out there that does this, which would be nice as it would save me from having to write my own.

So, on with the solution. I came across a number of solutions – like this one – that appeared to involve creating an HTTP request then manually creating a multipart message, which seemed overly complex to me! Anyway, I vaguely remembered writing some Flex code a while back that did what I was looking for and after spending far too long searching for it on my laptop (don’t ask!) the solution turns out to be much simpler than the ones I had come across out there in Internet land.

Basically, you dump the data from the webcam into a bitmap, convert the bitmap into a JPEG, encode the bytes using base64 and then send the data (as a string) in an HTTP POST. I am not going to replicate the code here to do this as you can view it by taking a look at the webcam snapshot example. To view the source, right click on the page then select “View Source”. The example code relies on as3corelib – I use one of the classes in the library (JPGEncoder) to convert the bitmap data into a JPEG.
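For readers who don't have Flex to hand, the same four steps can be sketched in plain Java. This is not the Flex code from the example: the class name SnapshotEncoder and the blank image standing in for a webcam frame are assumptions for illustration, and it needs Java 8 or later for java.util.Base64. The form field is called image because that is what the PHP script below reads.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.URLEncoder;
import java.util.Base64;
import javax.imageio.ImageIO;

public class SnapshotEncoder {
    // Bitmap -> JPEG bytes -> base64 string -> form-encoded POST body
    static String toFormBody(BufferedImage bitmap) throws IOException {
        ByteArrayOutputStream jpeg = new ByteArrayOutputStream();
        if (!ImageIO.write(bitmap, "jpg", jpeg)) {
            throw new IOException("No JPEG writer available");
        }
        String base64 = Base64.getEncoder().encodeToString(jpeg.toByteArray());
        // Base64 output can contain '+', '/' and '=', so URL-encode it for a form body
        return "image=" + URLEncoder.encode(base64, "UTF-8");
    }

    public static void main(String[] args) throws IOException {
        // A blank 320x240 image stands in for the frame grabbed from the webcam
        BufferedImage frame = new BufferedImage(320, 240, BufferedImage.TYPE_INT_RGB);
        String body = toFormBody(frame);
        System.out.println(body.length() + " characters to POST");
    }
}
```

The resulting string would be sent as the body of a POST with the content type application/x-www-form-urlencoded, which is what lets the server read it as an ordinary form field.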

The example does an HTTP POST to a simple PHP script that just displays the snapshot you took. The code looks like this:

<?php

// Decode the base64 string posted in the "image" field and send it back as a JPEG
header("Content-type: image/jpeg");
echo base64_decode($_POST['image']);

?>

 

Simple eh?

So that is how to take a picture with your webcam and save it to your server using Flex.

YUI Compressor Ant Task

I was working on a project recently that contained a number of JavaScript files. As part of the build process, I wanted to minimise/compress all of the JavaScript files. The first thing that came to mind was to use YUI Compressor.

I was using Ant at the time so I needed to create a target to compress the JavaScript files and then output them to wherever they needed to go. Problem: By default, YUI Compressor can only handle one file at a time; you can’t pass it a list of files to process. Damn! Anyway, to cut a long story short, I created a custom Ant task to do that for me. Hoorah!

I’ve set up a repository on Google Code where you can download the code that generates the custom Ant task. Yes, I know it’s fashionable to create repositories on GitHub but I mostly use Subversion and Mercurial so there!

Alternatively you can check out the project directly by issuing the following command:

hg clone https://yui-compressor-ant-task.googlecode.com/hg/ yui-compressor-ant-task

Technically, it is possible to get the desired behaviour using Ant’s apply task; see the post here and the example under the heading, “Minify your JavaScript and CSS files”. I must admit, I only came across this after I had already developed my version; however, there are still reasons to use it. YUI Compressor supports a number of different options and to include a sub-set of these using apply would start to get ugly and (possibly) cumbersome. My Ant task is a bit neater. Well, I think so anyway.

Body Recomposition Experiment

The other day my copy of The 4 Hour Body arrived. I read the author’s other book – The 4 Hour Work Week – a while ago so I thought this would be fun to read, especially as I recently joined a gym after about 1.5 years of doing little in terms of exercise. Years ago I used to do a lot of weight lifting so I have always been interested in the science of exercise despite my recent hiatus. As I now have the time I thought I would try out some of the experiments in the book for myself.

First experiment: The Slow-Carb Diet or, as the book claims, “How to Lose 1.4 stone (9kg) in 30 Days Without Exercise!” I am going to start this tomorrow (Tuesday). Basically, none of the following are allowed: pasta, potatoes, bread, rice, cereal, milk, fruit etc, except for one day a week when you can eat whatever you like and have as much of it as you want. Sounds good to me!

Here’s an example of some of the meals I will be having:

  • Breakfast: Eggs, lentils, spinach
  • Lunch: Chicken, black beans, mixed veg, guacamole
  • Dinner: Fish, borlotti beans, more spinach, mixed veg

The only other thing I will be taking is a calcium and magnesium supplement (1 capsule a day); I intend on getting all the potassium I need from the copious amount of spinach I will be eating. I drink about 1.5 litres of water a day anyway so won’t need to change anything there. I intend to continue with my 3 gym sessions a week but I’ll probably keep it fairly light while I’m on this diet.

Judging by some of the body fat example pictures from the book (here and here) I reckon I am probably somewhere between 12-15% body fat. I still need to find a tape measure as I haven’t done any of the measurements, e.g. waist, bicep, thigh etc, as the book instructs you to do (twice!) before starting.

I am going to give it a go for at least two weeks but probably not much more than that. I don’t really need to lose weight but as I intend on going away somewhere warm for a few days soon it won’t hurt to try and get a bit more definition for the beach 🙂

Update: I’ve added a food diary page to record everything I eat

It’s Quitting Time!

One week ago today (last Friday) I left the DMI project at the BBC. It was time for a change, among other things that I am not going to go into.

So what’s next? No idea. Downtime. Vacation. Catch up on sleep.

I’ve got a backlog of books to read and a few ideas for some projects that I have been meaning to start for a long time now, including a lot more writing.

In one word: Freedom

Creating Thumbnails From A Video With Flex

This technical note shows how to create a simple application in Flex to create thumbnails from a video as it plays. Unfortunately it’s not as simple as just writing the client code to generate a thumbnail from a video! Presumably because of DRM, changes must be made on the server to enable the client to access the raw video data.

What follows are the changes that must be made to both Flash Media Server (FMS) and Wowza Media Server and some sample client code. Restart the server once the changes have been made.

Flash Media Server

This section assumes that the application’s root directory is located at: /opt/adobe/fms/applications/vod/. If you are running FMS on Windows (why?) then the correct location will be wherever you installed it; the relative directory structure will be the same.

Either of the following solutions will work:

1) In the application’s root directory, modify Application.xml and add the following:

<Application>
  <Client>
    <Access>
       <AudioSampleAccess enabled="true">/</AudioSampleAccess>
       <VideoSampleAccess enabled="true">/</VideoSampleAccess>
    </Access>
  </Client>
</Application>

2) Create a file main.asc in the application’s root directory then add the following:

application.onConnect = function (client) {
	client.videoSampleAccess = "/";
	client.audioSampleAccess = "/";
	application.acceptConnection(client);
};

Wowza Media Server

This section assumes that the application’s root directory is: /Library/WowzaMediaServer/conf/vod/.
The changes are similar to the ones above for FMS. Again, either of the following will get the job done:

1) Open Application.xml and add the following:

<Client>
  <IdleFrequency>-1</IdleFrequency>
  <Access>
    <StreamReadAccess>*</StreamReadAccess>
    <StreamWriteAccess>*</StreamWriteAccess>
    <StreamAudioSampleAccess>*</StreamAudioSampleAccess>
    <StreamVideoSampleAccess>*</StreamVideoSampleAccess>
    <SharedObjectReadAccess>*</SharedObjectReadAccess>
    <SharedObjectWriteAccess>*</SharedObjectWriteAccess>
  </Access>
</Client>

The important bits are the StreamAudioSampleAccess and StreamVideoSampleAccess elements.

2) Create a custom Wowza module, which means some Java coding. There are instructions on the Wowza site for how to do this. The relevant bit of code looks like this:

public void onConnect(IClient client, RequestFunction function, AMFDataList params) {
    client.setStreamAudioSampleAccess(IClient.AUDIOSAMPLE_ACCESS_ALL);
    client.setStreamVideoSampleAccess(IClient.VIDEOSAMPLE_ACCESS_ALL);
}

Client

Once the changes have been made to the server all you have to do now is write the client bit. Fortunately, here’s one I made earlier:

Unfortunately I don’t have access to a publicly available streaming server so you’ll just have to take my word for it that the example works. I did test it locally first 🙂

The “interesting” bit is the function createThumbnail() in Main.mxml; the rest is just boiler-plate.

And that ladies and gentlemen is how to create thumbnails from a video in Flex. Interestingly enough you can do the same thing using the HTML5 video tag and the canvas API but I’ll leave that for another day.

Order, Order

This post is about how to impose order on instances of Java classes. The context of the post will be centred around reading in some text and then counting the number of times each word appears in the text. The output will then be sorted either alphabetically or in order of the number of times each word appears etc. Links to the source code are provided if you want to take a look.

Some instances of classes have an implicit order – a natural ordering. For example, in Java, String objects are ordered lexicographically; Integer objects are ordered numerically. If you find you are writing a value class whose instances have an obvious natural order, you should consider implementing the Comparable interface:

public interface Comparable<T> {
  int compareTo(T t);
}
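As a rough illustration of a value class with an obvious natural order, here is a sketch of a class that implements Comparable. The Version class is hypothetical and is not part of the source code for this post; it orders instances by major number and then by minor number.

```java
import java.util.Arrays;
import java.util.TreeSet;

public class Version implements Comparable<Version> {
    private final int major;
    private final int minor;

    public Version(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    // Natural order: compare major numbers first, then minor numbers
    @Override
    public int compareTo(Version other) {
        if (major != other.major) {
            return major < other.major ? -1 : 1;
        }
        if (minor != other.minor) {
            return minor < other.minor ? -1 : 1;
        }
        return 0;
    }

    // Keep equals and hashCode consistent with compareTo
    @Override
    public boolean equals(Object obj) {
        return obj instanceof Version && compareTo((Version) obj) == 0;
    }

    @Override
    public int hashCode() {
        return 31 * major + minor;
    }

    @Override
    public String toString() {
        return major + "." + minor;
    }

    public static void main(String[] args) {
        TreeSet<Version> sorted = new TreeSet<Version>(Arrays.asList(
            new Version(1, 10), new Version(1, 2), new Version(0, 9)));
        System.out.println(sorted); // prints [0.9, 1.2, 1.10]
    }
}
```

Note that 1.10 sorts after 1.2 because the comparison is numeric on each part, which is exactly the kind of ordering you would not get by comparing the string forms.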

The String class is an example of a class that implements Comparable; instances of it are ordered lexicographically. If you choose to implement Comparable, your class will be able to take advantage of many of the generic algorithms and collections available in the Java APIs, assuming you need to of course! The following example reads in some text and counts the instances of each word:

import java.io.InputStreamReader;
import java.util.Map;
import java.util.TreeMap;

public class WordCountTest {
   public static void main(String[] args) throws Exception {
	// WordReader comes from the source code linked from this post
	WordReader reader = new WordReader(new InputStreamReader(System.in));
	Map<String, Integer> wordMap = new TreeMap<String, Integer>();
	String word;
	while ((word = reader.readWord()) != null) {
		Integer count = wordMap.containsKey(word) ? wordMap.get(word) : 0;
		wordMap.put(word, ++count);
	}
	// Print out using the natural order of the key
	for (Map.Entry<String, Integer> entry : wordMap.entrySet()) {
		System.out.println(entry.getKey() + " : " + entry.getValue());
	}
   }
}

A TreeMap sorts entries based on the natural order of the key, in this case a string. But what if you want to order your instances in an unnatural order, e.g. ordering integers in decreasing order from largest to smallest? Well, in Java, you would use a Comparator:

public interface Comparator<T> {
    int compare(T o1, T o2);
    boolean equals(Object obj);
}

Comparators also allow you to provide an ordering for objects that don’t have a natural ordering, e.g. classes that don’t implement Comparable. Let’s take a look at an example that uses a comparator – as defined in the Order class – that sorts the words based on the number of times they appear:

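import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class WordCountTest2 {
   public static void main(String[] args) throws Exception {
	// WordReader, WordCount and Order come from the source code linked from this post
	WordReader reader = new WordReader(new InputStreamReader(System.in));
	Map<String, Integer> wordMap = new HashMap<String, Integer>();
	String word;
	while ((word = reader.readWord()) != null) {
		Integer count = wordMap.containsKey(word) ? wordMap.get(word) : 0;
		wordMap.put(word, ++count);
	}

	// Sort by word count, using the comparator defined in the Order class
	Set<WordCount> entries = new TreeSet<WordCount>(Order.INCREASING_COUNT_COMPARATOR);
	for (Map.Entry<String, Integer> entry : wordMap.entrySet()) {
		entries.add(new WordCount(entry.getKey(), entry.getValue()));
	}

	// Print
	Iterator<WordCount> result = entries.iterator();
	while (result.hasNext()) {
		WordCount e = result.next();
		System.out.println(e.getWord() + " " + e.getCount());
	}
   }
}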

It’s also quite common to instantiate a Comparator as an anonymous class and pass it in to the constructor of a sorted collection.
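A minimal sketch of that pattern, using the decreasing-integers case mentioned earlier, might look like this. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

public class DecreasingOrderExample {
    // Build a sorted set whose ordering is supplied by an anonymous Comparator
    static SortedSet<Integer> sortDecreasing(Integer... values) {
        SortedSet<Integer> sorted = new TreeSet<Integer>(new Comparator<Integer>() {
            @Override
            public int compare(Integer a, Integer b) {
                // Swap the arguments to reverse the natural (increasing) order
                return b.compareTo(a);
            }
        });
        sorted.addAll(Arrays.asList(values));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortDecreasing(3, 10, 1, 7)); // prints [10, 7, 3, 1]
    }
}
```

For this particular case the standard library already offers Collections.reverseOrder(), so the anonymous class is only worth writing when the ordering is something more specific.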

It’s worth noting that in the compare method in the Comparator used in the example, we first compare the word counts and if they are equal, we then return the result of comparing the value of the words. This is essential because sorted collections in Java use the compareTo method – or in this case, the compare method of the Comparator – in place of equals. What this means is that if we were to compare just the counts and return zero because they are equal – even though the words may be different – the WordCount instance would NOT get added to the (sorted) collection if an instance already exists in the collection with the same count. There’s a whole section in Joshua Bloch’s book – Effective Java (2nd edition) – that discusses this in greater detail (Item 12) for those that are interested.
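The effect is easy to reproduce. The sketch below uses a cut-down, hypothetical WordCount class and two made-up comparators (they are not the ones in the Order class): one compares counts only, the other falls back to the word when the counts are equal.

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

public class TieBreakDemo {
    // Hypothetical stand-in for the WordCount class used in the post
    static final class WordCount {
        final String word;
        final int count;

        WordCount(String word, int count) {
            this.word = word;
            this.count = count;
        }
    }

    // Compares counts only: two different words with the same count compare as equal
    static final Comparator<WordCount> COUNT_ONLY = new Comparator<WordCount>() {
        @Override
        public int compare(WordCount a, WordCount b) {
            return Integer.compare(b.count, a.count);
        }
    };

    // Compares counts first, then breaks ties on the word itself
    static final Comparator<WordCount> COUNT_THEN_WORD = new Comparator<WordCount>() {
        @Override
        public int compare(WordCount a, WordCount b) {
            int byCount = Integer.compare(b.count, a.count);
            return byCount != 0 ? byCount : a.word.compareTo(b.word);
        }
    };

    // Add three word counts, two of which share a count, and report the set size
    static int sizeAfterAdding(Comparator<WordCount> order) {
        Set<WordCount> set = new TreeSet<WordCount>(order);
        set.add(new WordCount("apple", 2));
        set.add(new WordCount("pear", 2));
        set.add(new WordCount("fig", 1));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterAdding(COUNT_ONLY));      // prints 2: "pear" was silently dropped
        System.out.println(sizeAfterAdding(COUNT_THEN_WORD)); // prints 3
    }
}
```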

That’s it really. There’s not much to it but it is worth taking the time to understand the difference(s) between implementing Comparable and creating a custom Comparator. Happy sorting!

By the way, if anybody has any better/alternative ideas for how to implement this, let me know. For example, my initial implementation used a TreeMap but that only allows you to sort on the key and not on the value(s) so wanting to sort in order of greatest number of occurrences won’t work with this particular data structure. In the second example, after constructing the hash table, I then add each entry to a (sorted) set then print out each value. Is there a way of doing this and avoiding the second step? I can’t think of one but maybe I’m missing something 🙂

Reading List

At the beginning of last year (2009) I planned to maintain a list of books that I had read during the course of the year. Of course, true to form, I didn’t! Anyway, this year I plan to do the same thing; however, unlike last year I have actually created a page for the list of books that I have read so far in 2010. I’ll update it as I go along. If you have any suggestions for books that I should read, leave a comment. Happy reading!