Pandoc: Convert Markdown to HTML, or anything to anything else

Looking round for something to convert Markdown to HTML automatically, I came across a utility called Pandoc, written by John MacFarlane, from Berkeley. As well as converting Markdown to HTML, you can use it to generate Word documents, PDFs and ebooks.

YamlDom: a Document Object Model for Yaml

A couple of months ago I started playing around with using Yaml instead of XML to write configuration files. I like the way you can define both simple name-value property files and complex hierarchies of data, all with the same data format.

So, I’m convinced by the format. But when I actually started trying to parse a couple of different config files, I found it a bit awkward, especially with Yaml files more than a couple of levels deep.

I’m using the yamlbeans parser by Nathan Sweet, which is fine. It does everything it’s supposed to do: you can bind Yaml elements to Java classes, and if you just want to deal with the raw data, you can read through it as nested ArrayLists and HashMaps.

However, because I’m using lots of configuration files with arbitrary structure, I didn’t want to use the Yaml-Java binding support, and I didn’t enjoy trying to read through the nested ArrayLists and HashMaps to get my data.

I had a think about it and decided that what I needed was a Document Object Model (DOM) for Yaml. With a DOM I could recurse through the document and treat everything I came across as the same kind of thing; I wouldn’t have to keep switching from handling an ArrayList at one level, then a HashMap at the next, then a HashMap, then another ArrayList.

So I went away and designed a model based on the Yaml spec (see below), then wrote the Java classes as a reusable library which I’ve just made available on Bitbucket, here. Help yourself to the code if you think you might find it useful :-).
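To give a feel for the idea, here is a minimal sketch of what such a model might look like. This is not the YamlDom code itself: the class names (YamlNode, ScalarNode, SequenceNode, MappingNode) and the helper methods are made up for illustration. The only assumption is that the parser hands back nested Lists and Maps, as described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical node model (not the real YamlDom classes): every Yaml value
// is wrapped as a YamlNode, so callers never switch on List vs Map themselves.
abstract class YamlNode {
    // Children of this node; empty for scalars.
    abstract List<YamlNode> children();

    // Wrap the raw object graph a parser returns (nested Lists and Maps).
    static YamlNode wrap(Object raw) {
        if (raw instanceof Map) {
            MappingNode mapping = new MappingNode();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) raw).entrySet()) {
                mapping.entries.put(String.valueOf(e.getKey()), wrap(e.getValue()));
            }
            return mapping;
        }
        if (raw instanceof List) {
            SequenceNode sequence = new SequenceNode();
            for (Object item : (List<?>) raw) {
                sequence.items.add(wrap(item));
            }
            return sequence;
        }
        return new ScalarNode(String.valueOf(raw));
    }

    // Recurse through the whole document, counting the scalar leaves.
    static int countScalars(YamlNode node) {
        if (node instanceof ScalarNode) {
            return 1;
        }
        int total = 0;
        for (YamlNode child : node.children()) {
            total += countScalars(child);
        }
        return total;
    }
}

class ScalarNode extends YamlNode {
    final String value;
    ScalarNode(String value) { this.value = value; }
    @Override List<YamlNode> children() { return new ArrayList<>(); }
}

class SequenceNode extends YamlNode {
    final List<YamlNode> items = new ArrayList<>();
    @Override List<YamlNode> children() { return items; }
}

class MappingNode extends YamlNode {
    final Map<String, YamlNode> entries = new LinkedHashMap<>();
    @Override List<YamlNode> children() { return new ArrayList<>(entries.values()); }
}

class YamlDomSketch {
    public static void main(String[] args) {
        // Stand-in for parser output: a map containing a scalar and a nested map.
        Map<String, Object> database = new LinkedHashMap<>();
        database.put("host", "localhost");
        database.put("ports", Arrays.asList(5432, 5433));
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("name", "demo");
        root.put("database", database);

        YamlNode doc = YamlNode.wrap(root);
        System.out.println(YamlNode.countScalars(doc)); // prints 4
    }
}
```

The point of the sketch is the recursion in countScalars: it walks the tree through children() alone, without caring whether a given level started life as an ArrayList or a HashMap.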

Ant or Maven? Don’t bother with either – use Gradle instead.

I’ve been making some useful discoveries the last few weeks.

I’m trying to decide on the best technology stack for my own work, and I’ve now decided on the build tool (Gradle) and the version control system (Mercurial). I’ll explain my reasons for choosing Mercurial in a future blog. In this one I’ll explain why I’ve chosen Gradle over Ant or Maven.

Some people swear by Maven. Its approach of convention-over-configuration makes setting up a build and deploy really straightforward. But if you want to do something slightly out-of-the-ordinary it’s like having your fingernails removed.

Ant, on the other hand, is highly versatile, but every build target needs to be defined explicitly, just as it would be in an old-fashioned makefile.

Gradle gives you the best of both worlds. It takes the convention-over-configuration approach of Maven, but instead of trying to force every project into the same mould, it allows you to reconfigure the mould according to your own requirements.

Here, for example, is the complete Gradle build file you need to build a Java project and package all the classes and resources into a Jar file (by default Gradle expects the same folder structure as a Maven build):

apply plugin: 'java'

Yes, that’s it. You put that in a file called build.gradle and run gradle build at the command line, and your build is done.

What if you want to write some tests, using JUnit, and you want to use the JUnit jar stored in a Maven repository? Well, just add a reference to the Maven repository and define the dependency on the JUnit jar:

 repositories {
    mavenCentral()
 }

 dependencies {
    testCompile group: 'junit', name: 'junit', version: '4.+'
 }

Maybe you’ve also got some local jar files your build is dependent on. Do you have to pull your hair out to get it to work, as you do with Maven? No, you just need to add another dependency like the following:

 dependencies {
    compile fileTree(dir: 'lib', include: '*.jar')
 }

I’ve got to the point now where I’m generating an Eclipse project, building a jar, creating Javadoc, and creating a binary distribution (complete with Gradle-generated Windows batch file and Unix shell script), all with a Gradle build file just 26 lines long!

Next up is a multi-project build :-).

Flying to the moon on 70K of memory

Fancy playing astronaut and flying to the moon on an Apollo mission? Well, the people at the Virtual Apollo Guidance Computer (AGC) site have recovered the original operating systems and software from the Apollo missions and built a virtual operating device that you can download and play with!

Even if you don’t want to play, reading the introductory page is a real education. Apparently they flew to the moon using a computer with 70K of ROM, 4K of RAM, and a CPU speed of 85K instructions per second (a standard laptop these days has a CPU speed of around 1 billion instructions per second).

Were the founders of Google influenced by Enid Blyton?

Reading The Magic Faraway Tree to my daughters last night, I came across the following bit of dialogue:

‘Well, come back and have tea with us,’ said Moon-Face. ‘Silky’s got some Pop Cakes – and I’ve made some Google Buns.’

The book was published in 1943, the same year that Colossus was demonstrated at Bletchley Park in the UK, and work started on ENIAC in the US. (A Google Bun, by the way, as readers of the book will find out, is a bun with a large raisin in the centre filled with sherbet :-)).

A Turing Machine in sed

For those self-taught programmers like myself who nod sagely when the term ‘Turing Machine’ comes up, without any idea what on earth it means, here’s a lovely practical explanation based on an implementation in sed.

The sed script referenced in the paper can be found here.
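For a rough idea of what is going on before you read the paper, here is a minimal sketch of a Turing machine in Java. It is not a translation of the sed script: the machine (a binary incrementer), the rule table and all the names are my own illustration. The ingredients are the usual ones: a tape of symbols, a read/write head, a current state, and a table of rules saying what to write, which way to move and which state to enter next.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: a tiny Turing machine that adds 1 to a binary number.
class TuringMachineSketch {
    static final char BLANK = '_';

    // One transition rule: symbol to write, head movement (-1 left, 0 stay), next state.
    static final class Rule {
        final char write;
        final int move;
        final String next;
        Rule(char write, int move, String next) {
            this.write = write;
            this.move = move;
            this.next = next;
        }
    }

    // Assumes the input contains only the characters '0' and '1'.
    static String increment(String bits) {
        // Rule table, keyed by "state:symbol under the head".
        Map<String, Rule> rules = new HashMap<>();
        rules.put("carry:1", new Rule('0', -1, "carry")); // 1 + 1 = 0, carry moves left
        rules.put("carry:0", new Rule('1', 0, "halt"));   // 0 + 1 = 1, done
        rules.put("carry:_", new Rule('1', 0, "halt"));   // ran off the left end, add a digit

        // The tape: one blank cell on the left in case the number grows by a digit.
        StringBuilder tape = new StringBuilder().append(BLANK).append(bits);
        int head = tape.length() - 1; // start at the rightmost cell
        String state = "carry";

        while (!state.equals("halt")) {
            Rule rule = rules.get(state + ":" + tape.charAt(head));
            tape.setCharAt(head, rule.write);
            head += rule.move;
            state = rule.next;
        }

        // Drop the spare blank cell if it was never written to.
        String result = tape.toString();
        return result.charAt(0) == BLANK ? result.substring(1) : result;
    }

    public static void main(String[] args) {
        System.out.println(increment("1011")); // prints 1100
        System.out.println(increment("111"));  // prints 1000
    }
}
```

Everything the machine "knows" is in the three rules; the loop itself just looks up a rule, writes, moves and changes state until it reaches the halt state.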

Continuous Integration vs Feature Branching

Recently I’ve been working with large financial organisations, using unfamiliar version control systems (Perforce and Git) and equally unfamiliar build processes.

In both organisations the Main code line (in Perforce), or master repository (in Git) was sacrosanct, and developers committed changes to a feature branch. A continuous integration server ran builds off these feature branches, and only when this branch build got the green light were you allowed to add yourself to the merge queue for the master or main branch.

In both cases, the process of getting changes from your working directory into the main repository was very lengthy. In the Git case it could be several weeks before your changes got merged into the master repository, and, as you can imagine, the merges into master were fraught with pain.

I got to thinking this morning how different this is from the process I’m used to, where the version control system consists of just a main code line and release branches. Developers commit changes directly to main or to a release branch, and continuous integration builds run against these releasable branches or against the main (usually development) branch.

Well, it appears that I’m not alone in disliking the feature branch approach. Here’s a link to an article by Martin Fowler discussing the dangers of feature branching, and here’s a riposte from James McKay.

As you can imagine, I’m with Martin Fowler on this one :-).

What’s curious for me, though, is that feature branching is common both in enterprise development, presumably because of the large number of developers involved in projects, and in open source development, where developers are using distributed version control systems like Git. I guess in the latter case it’s because working in isolation is encouraged by the distributed nature of the system.

The death of Synchronised

The death of Synchronised in the Grand National on Saturday was given an added poignancy by his attempt to escape before the race.

Those of us watching live saw the race delayed by 10 minutes after he unseated his rider, Tony McCoy, on the way to the starting line. He headed off down the course in a bid for freedom, only to be recaptured by a photographer and brought back to the start to be ridden to his death.

Pentaho Kettle

I don’t know what it says about me (well I do, but I don’t want to think too hard about it :-)), but my favourite piece of software is a data integration tool called Pentaho Kettle, designed and built by a Belgian developer called Matt Casters.

I first came across it when I was doing some evaluation of data integration tools for a previous employer. It has such a simple, intuitive interface that I was defining data transformations within fifteen minutes of downloading it. Ease of use gave it the edge over the other tools I looked at, even without any deeper investigation of its functionality.

I ended up using it for about six months, and grew to really appreciate the care that had been put into the design. As well as the simple, intuitive user interface provided for defining the data transformations, I really liked the following features:

1. The model used by the architecture is of small self-contained steps, like Unix tasks, that take input data, do one small thing, then pass the output data to another step;

2. Because of this architecture it’s easy to plug in new steps, and Kettle has a lovely mechanism for writing your own plugins and including them in the main tool;

3. As well as the main UI (which, following the culinary theme of the tool, is called Spoon :-)), there are command line tools (called Pan and Kitchen :-)), that allow you to run jobs and transformations according to a schedule, and a web server (called Carte) that allows you to cluster the transformations across multiple machines;

4. And, if all that weren’t enough, there’s also a Java API that allows you to control the transformations from your own Java program, if you’re so inclined.

5. Oh, and there’s an active forum for users. Writing a plugin to import data into our own PIM, I found myself struggling to work out how to do something that was slightly against the grain of the way Kettle works, so I posted a question on the forum. Within a day Matt Casters had responded, giving me exactly the line of code that let me do what I wanted!
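The step model in point 1 can be sketched in a few lines of plain Java. This does not use the Kettle API at all; StepPipeline and the three example steps are hypothetical names of my own, and rows are simplified to strings. It is only meant to show the shape of the idea: each step takes rows in, does one small thing, and hands its output to the next step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical illustration only: this is not the Kettle API.
class StepPipeline {
    // Each step is a function from a list of rows to a list of rows.
    private final List<Function<List<String>, List<String>>> steps = new ArrayList<>();

    // Plug another step onto the end of the pipeline.
    StepPipeline add(Function<List<String>, List<String>> step) {
        steps.add(step);
        return this;
    }

    // Feed the input through every step in order.
    List<String> run(List<String> input) {
        List<String> rows = input;
        for (Function<List<String>, List<String>> step : steps) {
            rows = step.apply(rows);
        }
        return rows;
    }

    // Step: strip leading and trailing whitespace from each row.
    static List<String> trim(List<String> rows) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            out.add(row.trim());
        }
        return out;
    }

    // Step: discard rows that are empty.
    static List<String> dropEmpty(List<String> rows) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            if (!row.isEmpty()) {
                out.add(row);
            }
        }
        return out;
    }

    // Step: convert each row to upper case.
    static List<String> upperCase(List<String> rows) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            out.add(row.toUpperCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> result = new StepPipeline()
                .add(StepPipeline::trim)
                .add(StepPipeline::dropEmpty)
                .add(StepPipeline::upperCase)
                .run(Arrays.asList(" widget ", "", "gadget"));
        System.out.println(result); // prints [WIDGET, GADGET]
    }
}
```

Because every step has the same signature, adding a new one is just another call to add, which is roughly why an architecture like this lends itself to plugins.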

simplicity & versatility > 0

The key to aesthetics in technology. Simple enough for a 2 year old to use, versatile enough for an artist to draw with.

Pencil or iPad?