Markdown syntax highlighting in Scala

Scala Language

Even though WordPress and other content management systems are around, you may want or need to implement your own Markdown parsing and syntax highlighting. Markdown is a well-defined format for writing (technical) documentation, and highlight.js, a JavaScript library for syntax highlighting, has been available for more than 10 years. Nevertheless, it took me a couple of hours to figure out how to implement full support in Scala.

I first tried Lift's built-in MarkdownParser, which works pretty well for simple Markdown:

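def renderContent(xhtml: NodeSeq): NodeSeq = {
  // Load the raw Markdown (SomeService stands in for your own data source)
  val markdownContentAsString = SomeService.load()
  // parse returns a Box[NodeSeq]; fall back to an empty NodeSeq if parsing fails
  MarkdownParser.parse(markdownContentAsString).getOrElse(NodeSeq.Empty)
}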

This small snippet does two things: it loads the Markdown as a String and passes it directly to the parser. The MarkdownParser returns a Box (Lift's Option-like type with built-in error handling). If parsing succeeds, we return the generated NodeSeq; otherwise we return an empty NodeSeq. (Side note: we could instead pattern match on the Box and implement proper error handling for a better user experience.)
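Here is a minimal sketch of that side note. To keep it runnable without Lift, it uses a hypothetical stand-in type `Outcome` whose three cases mirror Lift's `Full`, `Empty` and `Failure`, and it works on HTML Strings rather than a `NodeSeq`. The function name `renderOrExplain` and the fallback markup are illustrative assumptions; in a real Lift snippet you would match on the `Box` returned by `MarkdownParser.parse` directly.

```scala
// Hypothetical stand-in for Lift's Box, with the same three basic cases.
sealed trait Outcome[+A]
final case class Full[+A](value: A) extends Outcome[A]
case object Empty extends Outcome[Nothing]
final case class Failure(msg: String) extends Outcome[Nothing]

object RenderExample {
  // Turn a parse result into HTML, showing a message instead of silently
  // rendering nothing when parsing did not produce content.
  def renderOrExplain(result: Outcome[String]): String = result match {
    case Full(html)   => html
    case Empty        => "<p>No content available.</p>"
    case Failure(msg) => s"""<p class="error">Could not render Markdown: $msg</p>"""
  }

  def main(args: Array[String]): Unit = {
    println(renderOrExplain(Full("<p>Hello</p>")))
    println(renderOrExplain(Empty))
    println(renderOrExplain(Failure("unexpected end of input")))
  }
}
```

Matching on each case lets the page tell the reader that something went wrong, instead of quietly showing an empty area.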

This solution works well as long as you stick to simple code samples without syntax highlighting. Each code block in your Markdown is rendered as plain HTML:

<code>
  Your code here
</code>

Of course, you can use highlight.js and let the library auto-detect the language of each code block. In my case, however, I wanted to specify the language myself. As far as I can tell, Lift's MarkdownParser offers no option for this, so I had to look for another library.

txtmark

I quickly found that txtmark is a good fit for this kind of problem. It lets you pass in a custom CodeBlockEmitter that renders the language name into the code tag. Excellent! Unfortunately, the documentation on this point is sparse.
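To make the emitter idea concrete, here is a minimal, library-free sketch. As far as I know, txtmark's `BlockEmitter` callback receives a `StringBuilder`, the lines of the code block and a `meta` String holding the text after the opening fence; the hypothetical `emitCodeBlock` below takes the same three inputs but is a plain Scala function, not an implementation of that interface. Writing the language as `class="language-scala"` is my own choice (it is the class-name convention highlight.js recognizes), not necessarily what txtmark or markdown4j produce.

```scala
object CodeBlockSketch {
  // Escape the characters that would otherwise be read as markup.
  // '&' must be replaced first so the other entities are not double-escaped.
  private def escapeHtml(s: String): String =
    s.replace("&", "&amp;")
      .replace("<", "&lt;")
      .replace(">", "&gt;")
      .replace("\"", "&quot;")

  // Append one fenced code block to `out` as HTML. `meta` is the text after
  // the opening fence (for example "scala"); null or blank means no language.
  def emitCodeBlock(out: StringBuilder, lines: Seq[String], meta: String): Unit = {
    val lang = Option(meta).map(_.trim).filter(_.nonEmpty)
    out.append("<pre><code")
    lang.foreach(l => out.append(" class=\"language-").append(escapeHtml(l)).append('"'))
    out.append('>')
    lines.foreach(line => out.append(escapeHtml(line)).append('\n'))
    out.append("</code></pre>\n")
  }
}
```

Escaping every line matters: without it, code such as `1 < 2` would be interpreted as the start of an HTML tag.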

To replace the MarkdownParser with txtmark, add the following dependency:

libraryDependencies ++= Seq(
  "es.nitaur.markdown" % "txtmark" % "0.16"
)

Next, we enable code block rendering by building a new Configuration with the extended profile. I include the import so you know which Configuration class to use:

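def renderContent(xhtml: NodeSeq): NodeSeq = {

  import com.github.rjeschke.txtmark.Configuration

  // forceExtentedProfile (sic, the typo is part of txtmark's API) switches on
  // the extended Markdown profile, which includes fenced code blocks
  val config = Configuration.builder()
    .forceExtentedProfile()
    .build()

  // Todo: Add parser
}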
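With the extended profile, a fenced code block can carry a language name right after the opening fence, for example a fence line reading "```scala". The hypothetical helper `fenceLanguage` below sketches how that name can be read from the fence line. It is an illustration under simple assumptions (backtick or tilde fences, the first word after the fence wins), not txtmark's actual parser.

```scala
object FenceSketch {
  private val FenceMarkers = Seq("```", "~~~")

  // Return the language named on an opening fence line, if any.
  // A bare fence or an ordinary line yields None.
  def fenceLanguage(line: String): Option[String] = {
    val trimmed = line.trim
    FenceMarkers.find(m => trimmed.startsWith(m)).flatMap { marker =>
      // Drop the whole run of fence characters, then take the first word.
      val rest = trimmed.dropWhile(_ == marker.head).trim
      rest.split("\\s+").headOption.filter(_.nonEmpty)
    }
  }
}
```

This word is the `meta` value a code block emitter can later write into the HTML.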

Now we can use this Configuration to parse our Markdown content. The result is HTML as a String, which we need to transform into a NodeSeq (a sequence of XML nodes) using XhtmlParser:

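def renderContent(xhtml: NodeSeq): NodeSeq = {

  import com.github.rjeschke.txtmark.{Configuration, Processor}
  import scala.io.{Source => IOSource}
  import scala.xml.parsing.XhtmlParser

  val config = Configuration.builder()
    .forceExtentedProfile()
    .build()

  val markdownAsString = SomeService.load()
  // txtmark renders the Markdown into an HTML String
  val htmlAsString = Processor.process(markdownAsString, config)

  // Parse the HTML String into a NodeSeq that Lift can render
  val parser = new XhtmlParser(IOSource.fromString(htmlAsString))

  parser.initialize.document()
}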

At this point we will get the same result as our very first Lift MarkdownParser example. It's time to add syntax highlighting support.

To do so, we include the markdown4j library (which contains a suitable CodeBlockEmitter implementation) and add the emitter to our configuration (side note: you can use the Configuration to add plug-ins as well). Important: markdown4j depends on txtmark, so to avoid conflicts, remove the txtmark dependency we added before:

libraryDependencies ++= Seq(
  "org.commonjava.googlecode.markdown4j" % "markdown4j" % "2.2-cj-1.1"
)

To complete our code:

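def renderContent(xhtml: NodeSeq): NodeSeq = {

  import com.github.rjeschke.txtmark.{Configuration, Processor}
  import org.markdown4j.CodeBlockEmitter
  import scala.io.{Source => IOSource}
  import scala.xml.parsing.XhtmlParser

  // markdown4j's CodeBlockEmitter writes the fence language into the output
  val config = Configuration.builder()
    .forceExtentedProfile()
    .setCodeBlockEmitter(new CodeBlockEmitter())
    .build()

  val markdownAsString = SomeService.load()
  val htmlAsString = Processor.process(markdownAsString, config)

  val parser = new XhtmlParser(IOSource.fromString(htmlAsString))

  parser.initialize.document()
}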

This will do the trick! Your HTML output should now look like this (scala is just an example):

<pre>
    <code lang="scala">
      Your code here
    </code>
</pre>

Of course, the code is not highlighted by magic. To get highlighting, you can now easily add highlight.js to your pages and let this fantastic library do the work.

Thank you for reading this far! Let’s connect. You can @ me on X (@debilofant) with comments, or feel free to follow. Please like/share this article so that it reaches others as well.

© Copyright 2024 - ersocon.net - All rights reserved