The Agony And Ecstasy Of syntaxhighlighter; Or, Why You Shouldn’t Fix What Ain’t Broke

WordPress just keeps getting better.

Release 2.6 includes several new features, such as wiki-style version control for posts, a toolbar-based “blog this” control, support for Google Gears (which promises to be a major breakthrough in Web development), native word counts, native image captioning, SSL support, and bubble notifications when plug-in upgrades are available. Plus a bunch of other features and bug fixes.

While I was reading the WordPress blog entry about Version 2.6, I noticed that it uses a really, really cool syntax highlighting script: Not only is it clear, and not only does it do an excellent job of segmenting the various parts of the code, it also includes plain-text view and copy-to-clipboard links right in the header.

So I decided to find out what it was and use it. And, simply enough, the plug-in is called “Google Syntax Highlighter For WordPress” and it leverages the JavaScript-based syntaxhighlighter hosted at Google Code.

Now, I had previously been using WP-Syntax, which is based on GeSHi and runs entirely server-side, and I had been happy with it. But the syntaxhighlighter output looked cleaner and offered more amenities, so I decided to make the change.

And in all honesty, I wish I had stuck with WP-Syntax. But once I made the change, it was way too late to go back.

WP-Syntax and syntaxhighlighter mark up code the same way: with the <pre> tag. With syntaxhighlighter, you can also use the <textarea> tag.

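While WP-Syntax supports more languages, thanks to being GeSHi-based, syntaxhighlighter supports enough modern languages that you can get by, especially since it doesn’t display a language label on code blocks, so highlighting an unsupported language with a similar language’s rules won’t look out of place.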

There are two notable differences between WP-Syntax and syntaxhighlighter. The minor one: WP-Syntax lets you change the starting line number of a code block via the line attribute, while all syntaxhighlighter blocks begin at Line 1.

The more important difference is how they handle HTML entities.

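Basically, syntaxhighlighter does not handle opening and closing HTML brackets (< and >) properly within a <pre> tag unless you convert those brackets to their HTML entities (&lt; and &gt;). Because the script runs in the browser, any raw brackets have already been parsed as HTML tags by the time it sees the code.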

This isn’t an issue if you use <textarea> instead of <pre> with syntaxhighlighter. And it isn’t an issue for WP-Syntax because, again, it runs on the server and converts those characters to HTML entities for you as it processes your entry.

However, I had already delimited all of the code blocks in this blog with <pre>, and a few of those code blocks contained <textarea> tags of their own (a literal </textarea> inside a <textarea> wrapper would close it early), which made me reluctant to switch from <pre> to <textarea>.

So I ran a couple of MySQL queries to update my existing code blocks. Two changes were needed:

  • Each <pre> block must have the name="code" attribute for syntaxhighlighter to process it; in WP-Syntax, any <pre> tag with a lang attribute is processed as a code block.
  • syntaxhighlighter reads the language from the class attribute; WP-Syntax reads it from the lang attribute and styles the block accordingly.

So basically, each lang attribute needed to be changed to a class attribute, and every <pre> tag needed name="code" added to it.

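-- Add name="code" to every <pre> tag so that syntaxhighlighter will process it.
UPDATE wp_posts SET post_content = REPLACE(post_content, '<pre', '<pre name="code"');
-- Turn WP-Syntax's lang="..." into syntaxhighlighter's class="...".
-- Note: this matches lang=" anywhere in a post, not only inside <pre> tags.
UPDATE wp_posts SET post_content = REPLACE(post_content, 'lang="', 'class="');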
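The two REPLACE queries above are blunt: the first adds name="code" to every <pre> tag, and the second rewrites lang=" wherever it appears in a post. For anyone making the same migration, here is a minimal Python sketch of a more targeted version. It assumes each post body is already loaded as a string (fetching and saving the wp_posts rows is left out), that tags are written in lowercase, and that WP-Syntax blocks look like <pre lang="...">. The function names are hypothetical, not part of either plug-in, and language names are copied unchanged, just as the queries do.

```python
import re

# An opening <pre ...> tag; group 1 holds its attributes (everything after "<pre").
PRE_OPEN_TAG = re.compile(r'<pre\b([^>]*)>')
# WP-Syntax's language attribute, e.g. lang="php" (must follow whitespace).
LANG_ATTR = re.compile(r'(?<=\s)lang="([^"]*)"')


def _convert_pre_tag(match):
    attrs = match.group(1)
    if LANG_ATTR.search(attrs) is None:
        # No lang attribute: not a WP-Syntax code block, so leave it alone.
        return match.group(0)
    # lang="php" becomes class="php", which syntaxhighlighter reads as the language.
    attrs = LANG_ATTR.sub(r'class="\1"', attrs, count=1)
    # syntaxhighlighter only processes <pre> blocks that carry name="code".
    if 'name="code"' not in attrs:
        attrs = ' name="code"' + attrs
    return '<pre' + attrs + '>'


def wp_syntax_to_syntaxhighlighter(post_content):
    """Rewrite WP-Syntax <pre lang="..."> tags into syntaxhighlighter markup."""
    return PRE_OPEN_TAG.sub(_convert_pre_tag, post_content)


if __name__ == '__main__':
    print(wp_syntax_to_syntaxhighlighter('<pre lang="php">echo 1;</pre>'))
    # <pre name="code" class="php">echo 1;</pre>
```

Unlike the queries, this leaves plain <pre> tags without a lang attribute, and lang attributes outside <pre> tags, untouched, and running it twice on the same post changes nothing the second time.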

That was the easy part. The hard part: I had scores of opening and closing HTML brackets that were not converted to HTML entities. I didn’t need them to be converted under WP-Syntax, so I was lazy and didn’t convert them.

(This was made worse by the fact that, prior to Version 2.5, WordPress’s WYSIWYG editor couldn’t properly handle HTML entities, so you had to turn off the visual editor and use the HTML editor if you were going to post code. TinyMCE fixed that in Version 2.5, and even then it took me a while to trust the WYSIWYG editor enough to use it. And even now, if you are going to use a syntax highlighter of any sort, you need to switch between the “visual” and HTML editors.)

I spent about four hours over the last two days going through all 87 (at the time) Programming entries, copying every code block, pasting it into Notepad++, and replacing every opening and closing bracket and ampersand with its HTML entity.
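That copy, convert and paste routine is mechanical enough to script. A blanket search-and-replace over a whole post would also escape the post’s own HTML tags, so the minimal Python sketch below only touches the text between an opening <pre> tag and its closing </pre>, using the standard library’s html.escape. It assumes each post body is available as a string and that no code sample contains a literal </pre>; the function names are hypothetical. As a rough heuristic (also an assumption), a block that already contains &lt;, &gt; or &amp; is treated as converted and skipped, so a partly fixed blog isn’t double-escaped.

```python
import html
import re

# A whole <pre ...>...</pre> block: opening tag, body, closing tag.
# DOTALL lets the body span multiple lines; the non-greedy body stops at the
# first </pre>, so a literal "</pre>" inside a code sample would cut it short.
PRE_BLOCK = re.compile(r'(<pre\b[^>]*>)(.*?)(</pre>)', re.DOTALL)
# Entities that suggest a block has already been converted by hand.
ALREADY_ESCAPED = ('&lt;', '&gt;', '&amp;')


def _escape_body(match):
    open_tag, body, close_tag = match.groups()
    if any(entity in body for entity in ALREADY_ESCAPED):
        # Heuristic: skip blocks that look converted, to avoid "&amp;lt;" and the like.
        return match.group(0)
    # quote=False converts only &, < and >, leaving quotation marks as typed.
    return open_tag + html.escape(body, quote=False) + close_tag


def escape_pre_blocks(post_content):
    """Convert raw &, < and > to HTML entities inside <pre> blocks only."""
    return PRE_BLOCK.sub(_escape_body, post_content)


if __name__ == '__main__':
    print(escape_pre_blocks('<p>Markup:</p><pre name="code" class="html"><b>x & y</b></pre>'))
    # <p>Markup:</p><pre name="code" class="html">&lt;b&gt;x &amp; y&lt;/b&gt;</pre>
```

The skip heuristic trades one failure for another: a code sample that genuinely discusses entities (and so contains a literal &lt;) would be left alone and still need a manual look.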

I am sure I missed some. I am sure there are entries in this blog that have code blocks but that, for whatever reason, I have not added to the Programming category, and thus have not properly formatted for syntaxhighlighter. If you do come across a messed-up entry, please do me a favor and comment on it so I can fix it. Thanks.

In any event, while WP-Syntax wasn’t perfect, it did work; and now, I’ve invested a lot of work for what I would consider a minimal gain.

I wish I had stuck with WP-Syntax, but again, given all the work that went into the conversion, I’m in for a pound now.

