Monthly Archives: June 2007

Tagged PDF

Alfresco uses Open Office to convert documents to PDF but by default it doesn’t generate tagged PDF. This note describes how to configure Alfresco so that it does produce tagged PDF.

So what is a tagged PDF? Well, it’s a PDF that contains structural information about the content, e.g. reading order, the presence of tables etc. This allows screen readers to read the PDF document – it makes the PDF accessible. To get the most out of the conversion process, the original document needs to contain as much structural information as possible. I came across tagged PDF recently when doing some work for a local authority that is using Alfresco.

So how do you configure Alfresco to produce tagged PDF? Open the file ‘openoffice-document-formats.xml’, which is located in <tomcat_home>/webapps/alfresco/WEB-INF/classes/alfresco/mimetype/, find the <document-format> entry for Portable Document Format (it should be at the top of the file) and modify it so that it looks like this:


<document-format>
  <name>Portable Document Format</name>
  <mime-type>application/pdf</mime-type>
  <file-extension>pdf</file-extension>
  <export-filters>
    <entry><family>Presentation</family><string>impress_pdf_Export</string></entry>
    <entry><family>Spreadsheet</family><string>calc_pdf_Export</string></entry>
    <entry><family>Text</family><string>writer_pdf_Export</string></entry>
  </export-filters>
  <export-options>
    <entry><string>EnableTextAccessForAccessibilityTools</string><boolean>true</boolean></entry>
    <entry><string>UseTaggedPDF</string><boolean>true</boolean></entry>
  </export-options>
</document-format>
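A typo in this file is easy to make, so it can be worth checking that the edited XML still parses and that both options are present. The following is only a rough sketch: the function name pdf_export_options is made up for illustration, and it assumes the file uses the element layout shown above.

```python
import xml.etree.ElementTree as ET


def pdf_export_options(xml_text):
    """Return the PDF export options found in the given XML as a dict.

    Hypothetical helper: assumes the <document-format> layout shown above.
    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed.
    """
    root = ET.fromstring(xml_text)
    # iter() also visits the root, so this works whether the PDF entry
    # is the root element or nested inside a wrapper element.
    for fmt in root.iter("document-format"):
        if fmt.findtext("mime-type") != "application/pdf":
            continue
        options = {}
        for entry in fmt.findall("export-options/entry"):
            name = entry.findtext("string")
            value = entry.findtext("boolean")
            options[name] = (value == "true")
        return options
    # No PDF document format found.
    return {}
```

To use it, read the contents of openoffice-document-formats.xml into a string, pass it to pdf_export_options and check that the result contains UseTaggedPDF set to True.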

Restart Alfresco. That’s it! The next time you convert a document to PDF it should be tagged. You can test that the conversion worked (on the Mac) by using Adobe Reader 8.0. Open the PDF file and go to Document -> Security -> Show Security Properties. Click on the ‘Description’ tab. The ‘Tagged PDF’ entry should be set to ‘Yes’ if the conversion worked correctly. You can also check the document for accessibility by clicking on Document -> Accessibility Quick Check.
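If you don’t have Adobe Reader to hand, a crude check can be scripted. A tagged PDF declares a MarkInfo dictionary with Marked set to true in its document catalog, so the sketch below simply searches the raw bytes of the file for it. This is a heuristic only: the function name looks_tagged is made up for illustration, and the check will miss files where the catalog is stored in a compressed object stream or where MarkInfo is an indirect reference.

```python
import re


def looks_tagged(pdf_bytes):
    """Heuristic: True if the raw PDF bytes contain /MarkInfo << ... /Marked true ... >>."""
    # Find an inline MarkInfo dictionary.
    match = re.search(rb"/MarkInfo\s*<<(.*?)>>", pdf_bytes, re.DOTALL)
    if match is None:
        return False
    # Inside it, look for the Marked flag set to true.
    return re.search(rb"/Marked\s+true\b", match.group(1)) is not None
```

Read the converted file in binary mode, e.g. open(path, "rb").read(), and pass the bytes to looks_tagged. A result of False does not prove the file is untagged, so treat Adobe Reader as the authority.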

You can download the modified configuration file here.

Thursday 7th June 2007

BBC News: “But they all maintain their innocence and lawyers observing the trial have suggested acquittals may be more likely than convictions because while turning Swissair’s $3bn cash balance into multi-million dollar debts in just a few years may be very stupid, it isn’t necessarily a crime.”

Converting to PDF with Alfresco

A brief note that explains how to set up Alfresco to transform various document formats to PDF. I am writing this because I didn’t find it particularly easy to track down how to do it; it involves a lot of digging around in the wiki. Hopefully, this note will make it easier to set up.

By default Alfresco will convert PDF, Word documents etc to plain text, but if you want to go the other way around, e.g. transform plain text to PDF, you need to start Open Office from the command line – a version of Open Office comes bundled with Alfresco. It’s easy really: just run the start_oo.sh script, which you will find in the Alfresco home directory. Once you have done this you should be able to transform documents to PDF.

You can check that Open Office is running by executing the command lsof -i | grep 8100. If you start Open Office manually, note that Alfresco expects Open Office to be running on port 8100.
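If lsof isn’t available, the same check can be done by trying to connect to the port. This is a minimal sketch, assuming Open Office is listening on the local machine; the function name port_is_open is made up for illustration. Note that it only tells you that something is accepting connections on port 8100, not that it is actually Open Office.

```python
import socket


def port_is_open(host="127.0.0.1", port=8100, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError if nothing is listening
        # or the attempt times out.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("Port 8100 open:", port_is_open())
```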

There’s a whole page on starting Open Office from the command line on the Alfresco wiki. Not sure why the page needs to be so long. I just ran the start_oo.sh script and Open Office started up without any problems. Maybe I got lucky!

Tuesday 5th June 2007

This just has to be seen: Darth Vader Rap.

Streaming vs Downloads

I have been thinking a lot recently about methods of delivering large media files, e.g. video, mp3 etc, and playing them online. Think YouTube. You can either stream the file or use HTTP (progressive) download. My question is this: why would you want to stream a file? It seems to me that, with a few exceptions, streaming files is unnecessary. I am trying to come up with a list of pros and cons for streaming (large) media files.

Pros

  • Can be used to broadcast time-sensitive events, e.g. a live concert.
  • Useful for devices with limited storage capacity, e.g. mobile phones.
  • As the content is not stored locally, it makes “recording” content difficult, e.g. for DRM. I have seen this one cited in several places.

Cons

  • Requires “special” server-side software to implement the streaming protocol, e.g. RTMP. Can be expensive.
  • Some streaming protocols, such as RTMP (used for Flash video), are proprietary (related to the above point).
  • More (server) hardware may be required to support the open connections etc. I am not sure about this one.

Even though I have only come up with a few negatives I am not convinced that streaming files is necessary in the majority of cases. Podcasts are a good example where just downloading the file is sufficient. Thoughts? Feel free to add any comments.

Update: it seems there are stream recorders out there that do allow you to save content that is streamed.