Adding Pandoc to Gollum

The nice thing about Markdown is that it’s a very lightweight markup language. You can write files in any text editor, and they look pretty normal when read as plain text, because they mostly use idioms people already use in plain-text documents anyway. Yet you can run them through a simple converter to generate HTML or PDF output with all kinds of fancy formatting.

The not-so-nice thing about Markdown is that approximately seven seconds after its original creation, it split into a kazillion subtly different flavours, variants and dialects. There’s Original Markdown, PHP-Markdown (a.k.a. Markdown Extra), GitHub Flavored Markdown, Pandoc Markdown, and a whole bunch of others. So whenever you move from one tool to another, you have to learn and unlearn a few tricks, and fix some subtle breakage in any content files you try to take with you.

So, if you have free choice to pick any of the many command-line tools for converting Markdown to, say, HTML, which one should you choose? That one’s easy: Pandoc. It’s amazing. It can convert to and from pretty much every markup format ever invented. Its own default flavour of the Markdown syntax is incredibly complete (you can literally write books in it), and pretty much a superset of all of the others, but you can selectively disable features, or even put it into ‘strict’ mode where it will behave almost exactly like John Gruber’s original version.

The main disadvantage of Pandoc is that it’s addictive: after writing some documents in Pandoc Markdown, going back to one of the more limited variants feels like going back to a Punto after zipping around in a Ferrari for a while. You feel this most when you’re using a blog or wiki that lets users write pages in Markdown.

The obvious solution, if you happen to be the maintainer of that blog or wiki site, is to configure it to use Pandoc instead of some lesser Markdown-to-HTML converter. (If you’re not, you will just need to suck it up, I’m afraid.)

Two popular systems that use Markdown are Gollum (a wiki using Git repositories as its back-end) and Jekyll/Octopress (Jekyll is a static website generator; Octopress is a blogging engine built on top of it).

For Jekyll/Octopress, there already exists a plugin to let you use Pandoc as the converter tool when regenerating your site. (See here for how to use the plugin with Octopress.)

For Gollum, however, I couldn’t immediately find a ready-made solution. Fortunately, it turns out to be pretty straightforward, although to be honest I’m not completely sure to what extent the method I’m using here should be considered a public interface that will keep working in future versions of github-markup (I verified it with versions 1.1.2 and 1.2.1).

Here’s how it works: if you look at the code for GitHub::Markup, you see that it registers a bunch of handlers for the different markup types it supports, each identified via a regular expression on the filename. So we just need to add our own handler to that list. However, since we’re aiming to override the default handler for Markdown files, rather than add a new handler for a not-yet-supported extension, and the first handler whose pattern matches is the one that gets used, we need to make sure to add ours to the start of the markups array.
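To make the ordering argument concrete, here is a toy first-match registry in Ruby. It is a simplified model, not GitHub::Markup’s actual code: Handler, ToyRegistry and renderer_for are hypothetical names, and anchoring each pattern to the file extension is this sketch’s own assumption. What it shows is that an appended handler for an already-covered extension is never reached, while an unshifted one wins.

```ruby
# Toy model of a first-match handler registry (hypothetical names,
# not github-markup's real classes).
Handler = Struct.new(:name, :regexp) do
  # Assumption of this sketch: the pattern must match the whole extension,
  # i.e. a dot followed by the pattern at the very end of the filename.
  def match?(filename)
    /\.(#{regexp})\z/.match?(filename)
  end
end

class ToyRegistry
  attr_reader :handlers

  def initialize
    @handlers = []
  end

  # The first handler whose pattern matches wins; later ones are ignored.
  def renderer_for(filename)
    @handlers.find { |h| h.match?(filename) }
  end
end

registry = ToyRegistry.new
registry.handlers << Handler.new(:stock_markdown, /md|mkdn?|mdwn|mdown|markdown/)
registry.handlers << Handler.new(:creole, /creole/)

# Appending a second Markdown handler has no effect: the stock one matches first...
registry.handlers << Handler.new(:ignored_pandoc, /md|markdown/)
# ...but prepending one makes it win for every Markdown file.
registry.handlers.unshift(Handler.new(:pandoc, /md|mkdn?|mdwn|mdown|markdown/))

puts registry.renderer_for("Home.md").name       # prints pandoc
puts registry.renderer_for("Notes.creole").name  # prints creole
```

Appending is what you’d want for a genuinely new extension; unshifting is only needed when you want to shadow an existing handler.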

To cut a long story short, just add the following to the config.rb file you’re using to configure your Gollum instance:

```ruby
require "github/markup"

# Send Markdown files through Pandoc, with Pandoc's own
# tex_math_dollars and raw_tex extensions switched off.
pandoc = GitHub::Markup::CommandImplementation.new(
  /md|mkdn?|mdwn|mdown|markdown|litcoffee/,
  "pandoc -f markdown-tex_math_dollars-raw_tex")

# Our command needs to go to the front of the queue, so that it takes
# precedence over the stock GitHub::Markup::Markdown implementation.
GitHub::Markup.markups.unshift(pandoc)
```
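Since the Pandoc command above takes no input file, the page source has to reach it on standard input, and its standard output becomes the rendered HTML. The sketch below shows that idea using Ruby’s standard Open3 library. render_with_command is a hypothetical helper for illustration; github-markup’s own command handler may differ in details such as error handling and output sanitizing.

```ruby
require "open3"

# Hypothetical helper: pipe page source through an external converter
# and return whatever it writes to stdout.
def render_with_command(command, content)
  # Split the command string into argv so no shell is involved.
  argv = command.split
  output, status = Open3.capture2(*argv, stdin_data: content)
  # Treat a non-zero exit status as a rendering failure.
  raise "'#{command}' exited with status #{status.exitstatus}" unless status.success?
  output
end

# With Pandoc installed, this would return an HTML fragment:
# puts render_with_command("pandoc -f markdown-tex_math_dollars-raw_tex", "# Hello")
```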

Note that we disable Pandoc’s tex_math_dollars and raw_tex extensions (its $…$ TeX math and raw TeX support), to avoid conflicts with Gollum’s own MathJax support. You could also do it the other way around: disable Gollum’s MathJax and let Pandoc handle the math. I haven’t tried that yet.
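The -f value uses Pandoc’s extension syntax: a reader name followed by +extension to turn an extension on, or -extension to turn it off. If you end up tweaking that list, a tiny helper like the hypothetical pandoc_format below keeps the command string readable. It only builds the string; it does not check the extension names against what your Pandoc version actually supports.

```ruby
# Hypothetical helper: build a Pandoc input-format string from a base
# reader name plus extensions to enable (+ext) or disable (-ext).
def pandoc_format(base, enable: [], disable: [])
  base + enable.map { |e| "+#{e}" }.join + disable.map { |e| "-#{e}" }.join
end

fmt = pandoc_format("markdown", disable: %w[tex_math_dollars raw_tex])
command = "pandoc -f #{fmt}"
# command == "pandoc -f markdown-tex_math_dollars-raw_tex"
```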

That’s all! (Of course, it also helps if Pandoc is installed on your system in the first place.) Now, if you run gollum --config config.rb (add additional command-line arguments to taste) your pages should be gloriously rendered by Pandoc, giving you full access to lots of cool extra features not supported by the standard GitHub-flavored Markdown which Gollum uses by default.
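If you’d like config.rb to complain early when Pandoc is missing, a small PATH lookup is enough. This is a sketch assuming a Unix-like system where the executable is simply called pandoc (on Windows you would also need to look for pandoc.exe); executable_on_path? is a hypothetical helper, not part of Gollum.

```ruby
# Hypothetical pre-flight check: is there an executable file with this
# name in any directory listed in PATH? (Unix-style naming assumed.)
def executable_on_path?(name)
  ENV.fetch("PATH", "").split(File::PATH_SEPARATOR).any? do |dir|
    path = File.join(dir, name)
    File.file?(path) && File.executable?(path)
  end
end

unless executable_on_path?("pandoc")
  warn "pandoc not found on PATH; Markdown pages will probably fail to render"
end
```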

Note that this assumes you want to replace the default Markdown handler with Pandoc. If you want to introduce support for a new markup handler, you’ll want to do it differently: register your handler next to the existing ones instead of unshifting the markups array, obviously use a different file extension, and make a few small changes in different places in gollum and gollum-lib to add your language to the list of supported options. I’d recommend searching for ‘creole’ and doing a bit of copy-n-modify.

One last thing: while doing this, I noticed that every time you fetch a page, Gollum actually calls the handler three times, each time with a different call stack! That’s not very efficient. If necessary, this could be optimized with a small cache, but that would be better handled inside gollum-lib itself than in each separate handler. That could be a nice little hobby project for anyone with an afternoon off, but I’m done for now.
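As a rough idea of what such a cache could look like, here is a memoizing wrapper keyed on a SHA-256 digest of the page source, so an edited page naturally misses the cache. CachingRenderer and CountingRenderer are hypothetical classes, the wrapped object is assumed to expose a render(content) method, and the eviction policy (drop the oldest entry) and lack of thread safety are deliberate simplifications. Where exactly to hook this into gollum-lib is left open.

```ruby
require "digest"

# Hypothetical memoizing wrapper around any object with #render(content).
# Identical content is rendered once; repeats are served from the cache.
class CachingRenderer
  def initialize(renderer, max_entries: 100)
    @renderer = renderer
    @max_entries = max_entries
    @cache = {}
  end

  def render(content)
    key = Digest::SHA256.hexdigest(content)
    return @cache[key] if @cache.key?(key)

    result = @renderer.render(content)
    # Crude memory bound: Hash keeps insertion order, so shift drops the oldest.
    @cache.shift if @cache.size >= @max_entries
    @cache[key] = result
  end
end

# Stand-in renderer that counts how often it is really invoked.
class CountingRenderer
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def render(content)
    @calls += 1
    content.upcase
  end
end

inner = CountingRenderer.new
cached = CachingRenderer.new(inner)
3.times { cached.render("# Home") } # e.g. three calls per page fetch
puts inner.calls                    # prints 1
```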