Monthly Archives: June 2009

Flushing The Document Early

This post is a note to myself about the Transfer-Encoding header field, defined in the HTTP spec. I was reminded of its use while reading Even Faster Web Sites.

First, some assumptions for the example that follows. Let’s say that the header of your page contains some annoying Flash banner advert that is downloaded from a different host and that the body of the page takes a few seconds to generate – in the example below I suspend the current thread for 10 seconds.

The page will be generated on the server and then served back to the browser. The browser will then parse the HTML and proceed to fetch the banner ad, etc. In the meantime the user will be sitting there wondering what, if anything, is going on! So how can we present a more user-friendly page – something that feels more responsive to the user? Enter the Transfer-Encoding header. By setting it to chunked we can serve the header part of the page – the first chunk – to the browser while the server works on generating the body; in other words, we don’t need to generate the page all in one go and then serve it up. The Transfer-Encoding header informs the browser that the content for the current page is going to come down the pipe in pieces (“chunks”) rather than all at once. It has the added benefit that as soon as the browser retrieves the first part of the page (the “header”) it can start to download the banner ad in parallel* while it waits for the rest of the page from the server. Overall the page should feel more responsive.

So, how to do it in a Java servlet. You would think you would just call the setHeader method on the servlet response object but you don’t – what were you thinking? Turns out it’s even easier than that! An example is given below:


import java.io.IOException;
import java.io.PrintWriter;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class EarlyFlushServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest request,
                         HttpServletResponse response)
            throws ServletException, IOException {

        response.setContentType("text/plain");
        PrintWriter out = response.getWriter();
        out.println("The start of the page.");
        // Send the first chunk now; the container switches the
        // response to Transfer-Encoding: chunked
        out.flush();
        // Wait 10 seconds for no reason whatsoever
        try {
            Thread.sleep(10000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore interrupt status
        }
        out.println("The rest of the page ...");
    }
}

Basically, if you write something to the output buffer and then call flush(), the servlet container will automatically set the Transfer-Encoding header for you. If you remove the call to flush() then all the output will be buffered before it is sent back to the browser. If you fire up the example code in Tomcat (or something similar) and then look at it in a browser, the first line will be returned immediately, followed 10 seconds later by the rest of the page; if you remove the call to flush(), the page content will be returned all at once after about 10 seconds.
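For what it’s worth, the raw response from the example above looks something like the following. The chunk sizes are hexadecimal byte counts and a zero-length chunk marks the end of the response; the exact sizes depend on your platform’s line separator and the container, so treat the details as illustrative:

HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

17
The start of the page.
19
The rest of the page ...
0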

End of note.

(*) Most browsers open up a limited number of connections to a given host. For example, Firefox 3 opens up a maximum of 6 connections at any one time to a given host.

URL Shorteners

Yes, another post about URL shorteners. Recently, apart from complaining about them, I have been thinking about how URL shortening services work; services such as bit.ly and tinyurl.

Many of these services reduce a URL to a small string; typically a length of 3 to 6 characters. As a result, they can’t be simply hashing the URL. For example, if you use MD5 to hash the domain name of this site you get (in hexadecimal): 4302e8ae08795f0c67c932338f516e2f. The resulting hash value is longer than the URL itself! Not very useful for a URL shortening service.
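If you want to reproduce that sort of digest yourself, something along these lines should do it. This is a quick sketch: I am guessing at the exact input string (with or without the “www.”), so your output may differ:

import java.math.BigInteger;
import java.security.MessageDigest;

public class Md5Demo {
    public static void main(String[] args) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        // The exact input string is an assumption on my part
        byte[] digest = md5.digest("simonbuckle.com".getBytes("UTF-8"));
        // A 128-bit digest rendered as 32 hex characters
        System.out.println(String.format("%032x", new BigInteger(1, digest)));
    }
}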

So how do they work? I still don’t know but here’s one approach that I took:

Let’s say we want to produce a code using characters from the following alphabet [a-zA-Z0-9]; that gives a total of 62 different alphanumeric characters. For a 5-character code there are 62*62*62*62*62 (=916,132,832) possible combinations. If we associate each code with a given URL – a simple one-to-one mapping – then that’s a lot of URLs! The key point is that, unlike a hash function, I don’t think the URL is used as input to determine what comes out of the other end; a “random” character code is generated and then simply stored with the URL so it can be retrieved with a table look-up.
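To make that counting concrete, a code over that alphabet can be read as a number in base 62, one “digit” per character. A throwaway sketch (the class and helper names are mine, purely for illustration):

public class Base62 {

    static final String ALPHABET =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    // Interpret a code as a base-62 number, one digit per character
    static long codeToNumber(String code) {
        long n = 0;
        for (char c : code.toCharArray()) {
            n = n * 62 + ALPHABET.indexOf(c);
        }
        return n;
    }

    public static void main(String[] args) {
        // "99999" is the largest 5-character code: 62^5 - 1 = 916,132,831
        System.out.println(codeToNumber("99999"));
    }
}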

I came up with a probabilistic approach to generating these “hash” codes. I say probabilistic as it just generates a code at random; if there is a collision, it just tries again and generates another one. So how likely are collisions to occur? Well, according to the birthday paradox, for a space of 2^n possible values we should expect to see a collision after generating roughly 2^(n/2) items. The example code packs each character into 6 bits, so a 5-character code gives a 30-bit space and a collision is expected approximately every 2^15 (≈ 33,000) codes – assuming, of course, that all generated codes are equally likely to occur. It’s an example, it will do!

You can look at the source code here. In the example I use a bit vector to record which codes have already been generated; the bit vector is limited to representing 2^31 - 1 different values, therefore the example code is restricted to generating a maximum of 5-character codes; each character requires 6 bits. I’ll let you do the math as I have already done it 🙂
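For illustration, here is a minimal sketch of that approach – not the actual source, and the class and method names are mine: pick five random characters, pack their 6-bit values into an index into the bit vector, and retry on collision.

import java.util.BitSet;
import java.util.Random;

// A rough sketch of the collision-retry approach; names are illustrative
public class ShortCodeGenerator {

    static final String ALPHABET =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    static final int CODE_LENGTH = 5;

    // One bit per possible code: 5 characters x 6 bits = 30 bits,
    // comfortably within BitSet's limit of 2^31 - 1 bits
    private final BitSet seen = new BitSet(1 << (6 * CODE_LENGTH));
    private final Random random = new Random();

    public String next() {
        while (true) {
            int[] digits = new int[CODE_LENGTH];
            int index = 0;
            for (int i = 0; i < CODE_LENGTH; i++) {
                digits[i] = random.nextInt(ALPHABET.length()); // 0..61
                index |= digits[i] << (6 * i); // pack 6 bits per character
            }
            if (seen.get(index)) {
                continue; // collision: generate another code
            }
            seen.set(index);
            StringBuilder code = new StringBuilder(CODE_LENGTH);
            for (int d : digits) {
                code.append(ALPHABET.charAt(d));
            }
            return code.toString();
        }
    }

    public static void main(String[] args) {
        ShortCodeGenerator generator = new ShortCodeGenerator();
        for (int i = 0; i < 5; i++) {
            System.out.println(generator.next());
        }
    }
}

Note that a bit vector covering all 2^30 indices occupies 128 MB on its own, which goes some way to explaining the heap-size note below.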

If you do run it you may have to increase the maximum heap size, e.g. -Xmx256m. I ran out of heap space the first time I ran it from Eclipse!

It’s a first attempt, so there is likely some room for improvement, but it’s a start. It would be great to hear any alternative thoughts on how these things work.

Alternative App Store?

I am about to embark* on developing an application for the iPhone, which I am going to sell thousands of copies of before retiring to the Caribbean (just like all iPhone developers, right?). But I have a few concerns, primarily with Apple’s App Store policy about what qualifies (and disqualifies) an application from being sold – or given away – on the App Store. Now, I don’t know much about Apple’s policy, but I have followed the various discussions about it in the “media”, so I know, for example, that if your application is deemed to compete in some way with Apple then it will be rejected. It doesn’t help when I keep reading articles like this. From a business perspective I don’t want to spend months developing something only to have Apple turn around and reject it!

This neatly segues into my next point: why isn’t there an independent store selling applications for the iPhone? A place where all the misfit applications rejected by Apple can be sold on – think of it as a council estate for mobile phone apps. Maybe there is one; I have no idea. I can see why developers would want to sell their apps through the App Store, as it’s baked right into iTunes and makes it easy to pay for an app and put it on your iPhone, but surely there must be some other way to manage applications? I guess I’ll find out soon enough, but if you have any useful advice in the meantime, please leave a comment.

* When I say “about to embark”, that actually depends on dragging myself away from my keyboard and down to the Apple store to buy a new MacBook Pro; mine is getting a bit long in the tooth. As they have just released the latest versions and lowered the prices, I guess I don’t have any excuse not to get one.

The Trouble With URL Shorteners

I have just finished reading this post about URL shortening services and it got me thinking.

I use URL shorteners on the odd occasion but I have a problem with them. Answer the following simple question: what are the destinations of the following links (and no peeking by clicking on them first)?

  1. http://bit.ly/guNtb
  2. http://www.simonbuckle.com

Hopefully this highlights the problem: you don’t know where you are going when you click on a link provided by one of these URL shortening services! This seems to me to be an area ripe for Internet scams (especially if you use Internet Explorer); I am thinking of links to porn sites, links that download the latest malware onto your PC, etc. – there are endless possibilities!

What I would like to see is some kludge so that when you hover over one of these shortened URLs you can see the destination of the link. Sure, not all URLs indicate exactly what awaits you at the other end, and, in the case of Twitter, if the link comes from someone you are following then you can be fairly confident that they aren’t going to send you somewhere you really don’t want to go (or maybe you do). Still, there is definitely room for improvement.

New Host

By now you should be reading this on my new host, assuming the DNS changes have propagated. I finally got around to moving this blog from Joyent – the crap and (relatively) expensive hosting company, formerly known as TextDrive – to GoDaddy. It’s even running the latest version of bloatware for blogging!

I had been meaning to migrate for a while as I had been having problems with uptime, among other things. Ultimately, I had no choice but to migrate: a few weeks ago I tried to log in using the password I always use but for some reason it kept complaining about an incorrect password, blah blah (and no, I wasn’t using the wrong password). Anyway, I reset my password and used the one WordPress had generated for me, but that too, for some still-unknown reason, wouldn’t work. I was locked out of my own blog! What to do? Well, after some careful exporting of some database tables but not others, I now have a working blog again, hosted elsewhere for half the cost! The migration was fairly painless, apart from the database munging. I was impressed with Disqus; migrating the comments just worked!

When I first started this site I decided that it was imperative that I have access to all of my data. I could have chosen some free-to-use blogging service but many of these services don’t give you access to your data; well, they certainly didn’t at the time anyway. It’s just as well I did otherwise I could well have ended up permanently locked out of my own site!