Yesod sitemap.xml generation (small size)
Posted on Oct 21, 2016 by Alexej Bondarenko
Yesod provides a simple sitemap.xml mechanism through the yesod-sitemap package. It works well for smaller projects (say, a blog with 10 to 500 posts). This post shows how to use it.
First of all, let's set up our scenario. We have a (simplified) blog post model which contains all blog contents, and a main blog page. We would like to collect all blog posts and add the corresponding links to our sitemap.xml, with the publication date as the last modified tag. Furthermore, we would like to add a static URL for the main blog page to the sitemap.xml, with the last modified tag set to the date of the latest blog entry. As the last step, we need to add the sitemap location to the robots.txt so that any bot can discover it easily. Since we are in the world of type safety (Haskell), we would like to handle all links in a type-safe manner so that nothing breaks if we change the links later.
Step 1: Building the skeleton
Let's create a skeleton Handler to see if our sitemap.xml and robots.txt routes work properly. For that we need to add yesod-sitemap as a new dependency to our project. Afterwards we can create a new Handler like this:
    module Handler.Sitemap where

    import Import
    import Yesod.Sitemap
    import qualified Data.Text as T

    -- | Deliver a sitemap.xml to the client (usually a GET request to /sitemap.xml)
    getSitemapR :: Handler TypedContent
    getSitemapR = sitemapList []

    -- | Deliver a robots.txt to the client (usually a GET request to /robots.txt)
    getRobotsR :: Handler Text
    getRobotsR = return $ T.unlines
        [ "User-agent: *"
        ]

Sweet! Nothing special going on here. We render a sitemap with no entries (the empty list []) and a very minimal robots.txt. Now we need to add the routes to our config/routes file like this:

    /sitemap.xml SitemapR GET
    /robots.txt RobotsR GET
Hint: You'll need to import the new Handler module in Application.hs so that our functions getSitemapR and getRobotsR can be found.
To test our new routes we start our yesod application, open up a browser window and try to call /sitemap.xml and /robots.txt. You should see very simple responses popping up.
Step 2: Generating links
Let's fetch some data and create links. I will use Esqueleto, but you can use any SQL fetching function you like. The result is a list of Persistent Entity values, each holding a key and the record itself. For readability we will create a function which turns a blog post into a route (you can shortcut this if you like):

    -- | Create a blog post URL
    blogPostResource :: BlogPost -> Route App
    blogPostResource blogPost = BlogPostR (blogPostSlug blogPost)
Here the slug of the post is a part of the blog post URL. With this simple function (sidenote: it can get much more complicated if you add date and category to the blog post URL) we can now create an entry for our sitemap.xml:
    -- | Convert a BlogPost into a SitemapUrl
    blogPostToSitemapUrl :: BlogPost -> SitemapUrl (Route App)
    blogPostToSitemapUrl blogPost = SitemapUrl
        (blogPostResource blogPost)
        (Just $ blogPostPublishedAt blogPost)
        (Just Monthly)
        (Just 1.0)
The SitemapUrl constructor takes four parameters: the location (here a type-safe route), an optional last modified date as UTCTime, an optional change frequency and an optional priority. I have set the change frequency and priority to arbitrary values which you can adjust to your needs. We are now able to fetch a list of blog posts and map them to a list of SitemapUrls, which we can pass to the sitemapList function inside the Handler:
    import qualified Database.Esqueleto as E
    import Database.Esqueleto ((^.))

    -- | Deliver a sitemap.xml to the client (usually a GET request to /sitemap.xml)
    getSitemapR :: Handler TypedContent
    getSitemapR = do
        now <- liftIO getCurrentTime
        -- Fetch published posts, newest first
        blogEntries <- runDB
            $ E.select
            $ E.from $ \entry -> do
                E.orderBy [E.desc (entry ^. BlogPostPublishedAt)]
                E.where_ (entry ^. BlogPostPublishedAt E.<=. E.val now)
                E.limit 1000
                E.offset 0
                return entry
        -- Drop the keys and convert each post into a sitemap entry
        let sitemapUrls = map (blogPostToSitemapUrl . entityVal) blogEntries
        sitemapList sitemapUrls
I have added two things to the query:
1. A filter on the publication date (we don't want unpublished blog posts to appear in the sitemap.xml).
2. For demonstration, a limit of 1000 blog post entries. If you have more than that, I suggest switching to a background job which creates a GZIP-compressed sitemap file on disk, and letting Yesod stream this file to the client.
If you open /sitemap.xml in the browser, you should now see a nicely rendered sitemap with the blog post links inside it.
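For sites that outgrow the 1000-entry limit, the background job suggested above would first have to split the URLs across several files, since the sitemaps.org protocol allows at most 50,000 URLs per sitemap file. The following is only a rough sketch of that splitting step using nothing but base. The names `chunksOf'` and `sitemapFiles` and the file naming scheme are my own assumptions for illustration, and compression and writing to disk are left out entirely.

```haskell
-- | Maximum number of URLs per sitemap file (sitemaps.org protocol limit).
maxUrlsPerFile :: Int
maxUrlsPerFile = 50000

-- | Split a list into consecutive chunks of at most n elements.
-- Hypothetical helper; the split package offers an equivalent chunksOf.
chunksOf' :: Int -> [a] -> [[a]]
chunksOf' n xs
  | n <= 0    = error "chunksOf': chunk size must be positive"
  | null xs   = []
  | otherwise =
      let (chunk, rest) = splitAt n xs
      in chunk : chunksOf' n rest

-- | Pair every chunk with a file name such as "sitemap-1.xml".
-- The naming scheme is an assumption, not something yesod-sitemap prescribes.
sitemapFiles :: Int -> [a] -> [(FilePath, [a])]
sitemapFiles n urls =
  [ ("sitemap-" ++ show i ++ ".xml", chunk)
  | (i, chunk) <- zip [1 :: Int ..] (chunksOf' n urls)
  ]
```

A background job could call `sitemapFiles maxUrlsPerFile urls` and then render, compress and write each pair in turn.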
Step 3: Work on the details
To complete our tutorial we would like to add a "static" link to our blog entry page, with the last modified date set to the latest blog post publication date. And, of course, we would like to add the sitemap.xml link to our robots.txt:
    -- | Last modification date: the publication date of the first post in
    -- the list, or Nothing if the list is empty
    blogModificationDate :: [BlogPost] -> Maybe UTCTime
    blogModificationDate []         = Nothing
    blogModificationDate (post : _) = Just (blogPostPublishedAt post)
This helper takes the first element of the list of blog posts, if there is one, and returns its publication date. Because the query sorts the posts newest first, this is the latest publication date. In our Handler function getSitemapR we can now add our blog entry page:
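        -- blogModificationDate expects plain BlogPost values, so strip the keys first
        let blogPosts = map entityVal blogEntries
            staticRoutes =
                [ SitemapUrl BlogR (blogModificationDate blogPosts) (Just Weekly) (Just 0.9)
                ]
        sitemapList (sitemapUrls ++ staticRoutes)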
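The `blogModificationDate` helper relies on the posts arriving newest first, which the `E.orderBy` clause in the query takes care of. If the list could ever arrive in a different order, taking the maximum instead of the first element is a small change. Here is a minimal sketch of that idea, kept generic so that it runs with base alone. `latestBy` is a hypothetical name, and in the handler the projection function would be `blogPostPublishedAt`.

```haskell
import Data.List.NonEmpty (nonEmpty)

-- | Largest projected value in a list, or Nothing for an empty list.
-- Unlike taking the head, this does not depend on the list order.
latestBy :: Ord t => (a -> t) -> [a] -> Maybe t
latestBy f xs = maximum . fmap f <$> nonEmpty xs
```

With this in place, `latestBy blogPostPublishedAt blogPosts` would give the same result as `blogModificationDate blogPosts` for a list sorted newest first.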
To add our sitemap URL to the robots.txt we need to render the SitemapR route as text. We will use Yesod's getUrlRender to do so:
    -- | Render robots.txt
    getRobotsR :: Handler Text
    getRobotsR = do
        ur <- getUrlRender
        return $ T.unlines
            [ "Sitemap: " `T.append` ur SitemapR
            , "User-agent: *"
            ]
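Since the handler only glues a rendered URL into a fixed piece of text, the text-building part can be pulled out into a pure function, which is easy to try in GHCi or cover with a unit test. This is just a sketch of one possible way to do it, assuming the text package is available. `robotsTxt` is a hypothetical helper name, not part of yesod-sitemap.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

-- | Build the robots.txt body for an already rendered sitemap URL.
robotsTxt :: Text -> Text
robotsTxt sitemapUrl =
  T.unlines
    [ "Sitemap: " `T.append` sitemapUrl
    , "User-agent: *"
    ]
```

Inside the handler, the last line would then become `return (robotsTxt (ur SitemapR))`.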
This completes our sitemap.xml tutorial. I hope you could follow along and that it helps you implement your own sitemap.xml to improve the search engine visibility of your project. If you have any comments, thoughts or questions, feel free to use the section below.