Streaming Dynamic Sitemaps

Chapter 1 - HTML and SEO

Problem

You would like to stream an XML sitemap that is generated on the fly for each request.

Solution

We can use the yesod-sitemap package to build sitemap entries from type-safe URLs. To do so, add https://hackage.haskell.org/package/yesod-sitemap as a dependency of the project. Next, create a new handler that uses the library to serve an empty sitemapList (we will add entries afterwards):

module Handler.Sitemap where

import Import
import Yesod.Sitemap

-- | Deliver a sitemap.xml to the client (usually a GET request to /sitemap.xml)
getSitemapR :: Handler TypedContent
getSitemapR = sitemapList []

So far, the sitemap handler cannot be reached, because no route points to it. Hence, we add a new entry to our routes (config/routes in a scaffolded site, where Handler.Sitemap must also be imported in Application.hs):

/sitemap.xml SitemapR GET

To fill the sitemap, we use the SitemapUrl type. Constructing a value of this type requires only a valid (type-safe) URL; the last-modification date, change frequency, and priority are optional Maybe values. In the following example we convert a blog post entry into a SitemapUrl:

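-- | Convert a BlogPost into a SitemapUrl (lastmod must be a UTCTime)
blogPostToSitemapUrl :: BlogPost -> SitemapUrl (Route App)
blogPostToSitemapUrl blogPost =
  SitemapUrl
    (blogPostResource blogPost)           -- location: a type-safe route
    (Just $ blogPostPublishedAt blogPost) -- last modification date
    (Just Monthly)                        -- change frequency
    (Just 1.0)                            -- priority (0.0 to 1.0)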

The SitemapUrls can now be passed to sitemapList. It renders them as valid sitemap XML and sends that as the response, so the handler needs no further work.
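To see the pieces working together, here is a minimal sketch as a standalone app rather than a scaffolded site, assuming only the yesod, yesod-sitemap, text, and time packages. The BlogPost record, the BlogPostR and HomeR routes, blogPostResource, and loadBlogPosts are hypothetical stand-ins for your own entities and queries. It sets a fixed approot so the rendered URLs are absolute, as the sitemap protocol requires, and it mixes a static page with dynamic blog post entries.

```haskell
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE ViewPatterns #-}
module Main (main) where

import Data.Text (Text)
import Data.Time (UTCTime (..), fromGregorian)
import Yesod
import Yesod.Sitemap

-- Minimal foundation type standing in for the scaffold's App.
data App = App

mkYesod "App" [parseRoutes|
/ HomeR GET
/blog/#Text BlogPostR GET
/sitemap.xml SitemapR GET
|]

instance Yesod App where
  -- Sitemap entries should be absolute URLs, so render them against a fixed approot.
  approot = ApprootStatic "http://localhost:3000"

-- Hypothetical stand-in for the recipe's BlogPost entity.
data BlogPost = BlogPost
  { blogPostSlug        :: Text
  , blogPostPublishedAt :: UTCTime
  }

-- Hypothetical helper: the type-safe route a post is published under.
blogPostResource :: BlogPost -> Route App
blogPostResource = BlogPostR . blogPostSlug

-- | Convert a BlogPost into a SitemapUrl (as in the recipe).
blogPostToSitemapUrl :: BlogPost -> SitemapUrl (Route App)
blogPostToSitemapUrl blogPost =
  SitemapUrl (blogPostResource blogPost) (Just $ blogPostPublishedAt blogPost) (Just Monthly) (Just 1.0)

-- | A static page: any route you could link to in Hamlet works here too.
homeSitemapUrl :: SitemapUrl (Route App)
homeSitemapUrl = SitemapUrl HomeR Nothing (Just Daily) (Just 0.8)

-- Hypothetical data source with fixed posts; a scaffolded site would
-- load them from the database instead.
loadBlogPosts :: Handler [BlogPost]
loadBlogPosts = pure
  [ BlogPost "hello-world" (UTCTime (fromGregorian 2015 1 1) 0)
  , BlogPost "second-post" (UTCTime (fromGregorian 2015 2 1) 0)
  ]

getHomeR :: Handler Html
getHomeR = defaultLayout [whamlet|<p>Home|]

getBlogPostR :: Text -> Handler Html
getBlogPostR slug = defaultLayout [whamlet|<h1>#{slug}|]

-- | Serve /sitemap.xml: the static home page plus one entry per blog post.
getSitemapR :: Handler TypedContent
getSitemapR = do
  posts <- loadBlogPosts
  sitemapList (homeSitemapUrl : map blogPostToSitemapUrl posts)

main :: IO ()
main = warp 3000 App
```

After starting the app, requesting http://localhost:3000/sitemap.xml should return one url element per entry. In a scaffolded site, loadBlogPosts would typically be replaced by a database query such as map entityVal <$> runDB (selectList [] []).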

Discussion

  • Unfortunately, SitemapUrl does not support adding query parameters to its URLs. These would sometimes be useful, for instance to list the individual pages of a paginated listing.
  • Any type-safe route that you could link to in a Hamlet template can also be passed to the sitemap, so static pages are just as easy to add as dynamic ones.
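  • This technique becomes fragile when serving large amounts of data (millions of entries), since the whole sitemap is rebuilt on every request; a single sitemap file may also list at most 50,000 URLs, so such sites need a sitemap index anyway. In this case it is better to write the generated sitemap to a file and serve that file on each request instead of regenerating the data for every request.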
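The file-based alternative can be sketched as a standalone app like this. It assumes a separate job (not shown) has already written the sitemap to a hypothetical path, static/sitemap.xml; the handler only serves that file with Yesod's sendFile and the typeXml content type, so no XML is built per request.

```haskell
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeFamilies #-}
module Main (main) where

import Yesod

-- Minimal foundation type standing in for the scaffold's App.
data App = App

mkYesod "App" [parseRoutes|
/sitemap.xml SitemapR GET
|]

instance Yesod App

-- Hypothetical location of the pre-generated sitemap. It is written by a
-- separate job (for example a periodic task), not by this handler.
sitemapFile :: FilePath
sitemapFile = "static/sitemap.xml"

-- | Serve the pre-generated sitemap straight from disk.
-- sendFile short-circuits the handler with a file response.
getSitemapR :: Handler TypedContent
getSitemapR = sendFile typeXml sitemapFile

main :: IO ()
main = warp 3000 App
```

The design choice here is to move the expensive work out of the request path: the generating job can run as often as the content changes, while every request costs no more than reading a file.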