---
title: Generating incremental static site generators in Haskell using cartesian categories
date: 2022-12-06
draft: true
toc: true
---

A few days ago, I released the new version of [achille], a Haskell library
providing an EDSL for writing static site generators. This embedded language produces
efficient, *incremental* and *parallel* static site generators, *for free*.

[achille]: /projects/achille

In this post, I will explain how [achille] is able to transform this intuitive, "readable"
syntax into an incremental static site generator:

```haskell
import Achille as A

main :: IO ()
main = achille $ task A.do
  -- render every article in `posts/`
  -- and gather all metadata
  posts <-
    match "posts/*.md" \src -> A.do
      (meta, content) <- processPandocMeta src
      writeFile (src -<.> ".html") (renderPost meta content)
      meta

  -- render index page with the 10 most recent articles
  renderIndex (take 10 (sort posts))
```

Importantly, I want to emphasize that *you* --- the library user --- need neither
care about nor understand the internals of [achille] in order to use it.
You are free to ignore this post and go directly through the [user
manual][manual] to get started!

[manual]: /projects/achille/

This post is just there to document how the right theoretical framework was key
in providing a good user interface that preserves all the desired properties.

---

## Foreword

The original postulate is that *static sites are good*. Of course not for every
use case, but for single-user, small-scale websites, it is a very practical way
of managing content. Very easy to edit offline, very easy to deploy. All in all
very nice.

There are lots of static site generators readily available. However, each and
every one of them has a very specific idea of how you *should* manage your
content. For simple websites --- i.e. weblogs --- they are great, but as soon as
you want to heavily customize the building process of your site, require
fancier transformations, and thus step outside of the supported feature set of
your site generator of choice, you're in for a lot of trouble.

For this reason, many people end up not using existing static site generators,
and instead prefer to write their own. Depending on the language you use, it is
fairly straightforward to write a little static site generator doing everything
you want. Sadly, making it *incremental* or *parallel* is another issue, and way
trickier.

That's precisely the niche that [Hakyll] and
[achille] try to fill: use an embedded DSL in Haskell to specify your *custom* build
rules, and compile them all into a full-fledged **incremental** static site
generator executable. Some kind of static site generator *generator*.

[Hakyll]: https://jaspervdj.be/hakyll/

## Reasoning about static site generators

Let's look at what a typical site generator does. A good way to visualize it
is with a flow diagram, where *boxes* are "build rules". Boxes have
distinguished inputs and outputs, and dependencies between the build rules are
represented by wires going from outputs of boxes to inputs of other boxes.

The static site generator corresponding to the Haskell code above could be
represented as the following diagram:

...

Build rules are clearly identified, and we see that in order to render the `index.html`
page, we need to wait for the `renderPosts` rule to finish rendering each
article to HTML and return the metadata of every one of them.

Notice how some wires are **continuous black** lines, and some other wires are
faded **dotted** lines. The **dotted lines** represent **side effects** of the
generator:

- files that are read from the file system, like all the markdown files in
  `posts/`.
- files that are written to the file system, like the HTML output of every
  article, or the `index.html` file.

The first insight is to realize that the build system *shouldn't care about side
effects*. Its *only* role is to know whether build rules *should be executed*,
and how intermediate values get passed around.

### The `Recipe m` abstraction

I had my gripes with Hakyll, and was looking for a simpler, more general way to
express build rules. I came up with the `Recipe` abstraction:

```haskell
newtype Recipe m a b = Recipe
  { runRecipe :: Context -> Cache -> a -> m (b, Cache) }
```

It's just a glorified Kleisli arrow: a `Recipe m a b` will produce an output of
type `b` by running a computation in `m`, given some input of type `a`.

The purpose is to *abstract over side effects* of build rules (such as producing
HTML files on disk) and shift the attention to *intermediate values* that flow
between build rules.

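To make the "glorified Kleisli arrow" remark concrete, here is a minimal, self-contained sketch. The `Context` and `Cache` types are placeholders, and `fromKleisli`, `andThen`, `wordCount` and `describe` are hypothetical names chosen for illustration; none of them are taken from achille's API. The cache is threaded through naively here, whereas the caching section further down splits it instead.

```haskell
import Data.Functor.Identity (Identity (..))

-- Placeholder types (assumptions): the real ones in achille are richer.
type Context = FilePath
type Cache   = String

newtype Recipe m a b = Recipe
  { runRecipe :: Context -> Cache -> a -> m (b, Cache) }

-- Any Kleisli arrow `a -> m b` is a recipe that ignores the context
-- and leaves the cache untouched.
fromKleisli :: Functor m => (a -> m b) -> Recipe m a b
fromKleisli f = Recipe $ \_ cache x -> fmap (\y -> (y, cache)) (f x)

-- Sequential composition: the output of the first recipe becomes the
-- input of the second, and the cache is passed along.
andThen :: Monad m => Recipe m a b -> Recipe m b c -> Recipe m a c
andThen (Recipe f) (Recipe g) = Recipe $ \ctx cache x -> do
  (y, cache') <- f ctx cache x
  g ctx cache' y

-- Two toy build rules, run in `Identity` so the example stays pure.
wordCount :: Recipe Identity String Int
wordCount = fromKleisli (pure . length . words)

describe :: Recipe Identity Int String
describe = fromKleisli (\n -> pure (show n ++ " words"))

example :: (String, Cache)
example =
  runIdentity (runRecipe (wordCount `andThen` describe) "." "" "static sites are good")
-- example == ("4 words", "")
```

Swapping `Identity` for `IO` is what would let such rules read and write files, without changing how the values flow from one rule to the next.
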
As one could expect, if `m` is a monad, so is `Recipe m a`. This means composing
recipes is very easy and dependencies *between* those are stated **explicitly**
in the code.

```haskell
main :: IO ()
main = achille do
  posts <- match "posts/*.md" compilePost
  compileIndex posts
```

``` {=html}
<details>
<summary>Type signatures</summary>
```

Simplifying a bit, these would be the type signatures of the building blocks in
the code above.

```haskell
compilePost  :: Recipe IO FilePath PostMeta
match        :: GlobPattern -> Recipe IO FilePath b -> Recipe IO () [b]
compileIndex :: [PostMeta] -> Recipe IO () ()
achille      :: Recipe IO () () -> IO ()
```

``` {=html}
</details>
```

There are no ambiguities about the ordering of build rules and the evaluation model
is in turn *very* simple --- in contrast to Hakyll, with its global store and
implicit ordering.

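The claim that `Recipe m a` is a monad whenever `m` is one can be checked on a simplified model. The sketch below is an assumption-laden illustration, not achille's actual instances: `Context` is reduced to `()`, `Cache` is a list of strings used as a log, the cache is threaded sequentially rather than split, and `step` and `pipeline` are hypothetical names.

```haskell
import Data.Functor.Identity (Identity (..))

-- Placeholder types (assumptions), chosen to make the threading visible.
type Context = ()
type Cache   = [String]

newtype Recipe m a b = Recipe
  { runRecipe :: Context -> Cache -> a -> m (b, Cache) }

instance Functor m => Functor (Recipe m a) where
  fmap h (Recipe f) = Recipe $ \ctx cache x ->
    fmap (\(y, cache') -> (h y, cache')) (f ctx cache x)

instance Monad m => Applicative (Recipe m a) where
  pure y = Recipe $ \_ cache _ -> pure (y, cache)
  Recipe f <*> Recipe g = Recipe $ \ctx cache x -> do
    (h, cache')  <- f ctx cache x
    (y, cache'') <- g ctx cache' x
    pure (h y, cache'')

instance Monad m => Monad (Recipe m a) where
  -- Run the first recipe, then the recipe computed from its result,
  -- both on the same input `x`.
  Recipe f >>= k = Recipe $ \ctx cache x -> do
    (y, cache') <- f ctx cache x
    runRecipe (k y) ctx cache' x

-- A toy build rule that records its name and returns a fixed value.
step :: Applicative m => String -> b -> Recipe m a b
step name y = Recipe $ \_ cache _ -> pure (y, cache ++ [name])

pipeline :: Recipe Identity () Int
pipeline = do
  posts <- step "renderPosts" [3, 1, 2 :: Int]
  step "renderIndex" (length posts)

result :: (Int, Cache)
result = runIdentity (runRecipe pipeline () [] ())
-- result == (3, ["renderPosts", "renderIndex"])
```

The order of the entries in the log mirrors the order of the statements in the `do` block, which is the sense in which dependencies are explicit.
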
### Caching

In the definition of `Recipe`, a recipe takes some `Cache` as input, and
returns another one after the computation is done. This cache is simply a *lazy
bytestring*, and enables recipes to have some *persistent storage* between
runs, that they can use in any way they desire.

The key insight is how composition of recipes is handled:

```haskell
(*>) :: Monad m => Recipe m a b -> Recipe m a c -> Recipe m a c
Recipe f *> Recipe g = Recipe \ctx cache x -> do
  -- give each recipe its own half of the cache
  let (cf, cg) = splitCache cache
  (_, cf') <- f ctx cf x
  (y, cg') <- g ctx cg x
  -- reassemble the *updated* halves
  pure (y, joinCache cf' cg')
```

The cache is split in two, and both pieces are forwarded to their respective
recipe. Once the computation is done, the resulting caches are put together
into one again.

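The post does not show `splitCache` and `joinCache`, so the following is only one possible way to write them, under the assumption that a composite cache is the serialization of a pair of caches. It relies on the `binary` and `bytestring` packages that ship with GHC; the encoding itself is a guess made for illustration, not achille's on-disk format.

```haskell
import Data.Binary (decodeOrFail, encode)
import qualified Data.ByteString.Lazy.Char8 as LBS

type Cache = LBS.ByteString

-- Assumed encoding: a composite cache is a serialized pair of caches.
joinCache :: Cache -> Cache -> Cache
joinCache l r = encode (l, r)

-- An empty or unreadable cache (e.g. on the very first run) splits
-- into two empty caches, so both recipes start from scratch.
splitCache :: Cache -> (Cache, Cache)
splitCache cache = case decodeOrFail cache of
  Right (_, _, halves) -> halves
  Left _               -> (LBS.empty, LBS.empty)

-- Splitting what was joined gives back the two original pieces.
roundTrip :: Bool
roundTrip =
  splitCache (joinCache (LBS.pack "posts") (LBS.pack "index"))
    == (LBS.pack "posts", LBS.pack "index")
```

Because joined caches are themselves caches, they nest: the left half of a composite cache can in turn be split by a composite recipe, which is what gives every sub-recipe its own slot.
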
This ensures that every recipe will be attributed the same local cache
--- assuming the description of the generator does not change between runs. Of
course this is only true when `Recipe m` is merely used as a *selective*
applicative functor, though I doubt you need more than that for writing a
static site generator. It's not perfect, but I can say that this very simple model
for caching has proven to be surprisingly powerful.

I have improved upon it since then, in order to make sure that
composition is associative and to enable some computationally intensive recipes to
become insensitive to code refactorings, but the core idea is left unchanged.

### Incremental evaluation and dependency tracking

### But there is a but

## Arrows

I really like the `do` notation, but sadly losing this information about
variable use is bad, so no luck. If only there were a way to *overload* the
lambda abstraction syntax of Haskell to transform it into a representation free
of variable bindings...

That's when I discovered Haskell's arrows. They are a generalization of monads,
and are often presented as a way to compose things that behave like functions.
And indeed, we can define our very own `instance Arrow (Recipe m)`. There is a special
syntax, the *arrow notation*, that kinda looks like the `do` notation, so is this
the way out?

There is something fishy in the definition of `Arrow`:

```haskell
class Category k => Arrow k where
  -- ...
  arr :: (a -> b) -> a `k` b
```

We must be able to lift any function into `k a b` in order to make it an
`Arrow`. In our case we can do it, that's not the issue. No, the real issue is
how Haskell desugars the arrow notation.

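That `arr` can indeed be defined is easy to check on a stripped-down model. In the sketch below, `Context` and `Cache` are reduced to `()` and the cache is simply passed along, so this is an assumption-based illustration rather than achille's actual instances; `titleAndLength` is a hypothetical example rule.

```haskell
import Prelude hiding (id, (.))
import Control.Arrow (Arrow (..))
import Control.Category (Category (..))
import Data.Functor.Identity (Identity (..))

-- Placeholder types (assumptions): trivial context and cache.
type Context = ()
type Cache   = ()

newtype Recipe m a b = Recipe
  { runRecipe :: Context -> Cache -> a -> m (b, Cache) }

instance Monad m => Category (Recipe m) where
  id = Recipe $ \_ cache x -> pure (x, cache)
  Recipe g . Recipe f = Recipe $ \ctx cache x -> do
    (y, cache') <- f ctx cache x
    g ctx cache' y

instance Monad m => Arrow (Recipe m) where
  -- Lifting a pure function: apply it, leave the cache alone.
  arr f = Recipe $ \_ cache x -> pure (f x, cache)
  -- Run a recipe on the first component, carry the second along.
  first (Recipe f) = Recipe $ \ctx cache (x, z) -> do
    (y, cache') <- f ctx cache x
    pure ((y, z), cache')

-- `&&&` comes for free from `arr` and `first`: it duplicates the input.
titleAndLength :: Recipe Identity String (String, Int)
titleAndLength = arr (takeWhile (/= '\n')) &&& arr length

example :: (String, Int)
example = fst (runIdentity (runRecipe titleAndLength () () "Title\nbody"))
-- example == ("Title", 10)
```

Note that `arr` accepts an opaque Haskell function: once a function has been lifted this way, nothing about how it uses its argument can be recovered from the resulting recipe.
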
...

There is a macro that is a bit smarter than GHC's current desugarer, but not
by much. I've seen some discussions about actually fixing this upstream, but I
don't think anyone actually has the time to do this. Too few people use arrows to
justify the cost.

## Conal Elliott's `concat`

Conal Elliott wrote a fascinating paper called *Compiling to Categories*.
The gist of it is that any cartesian-closed category is a model of the simply-typed
lambda calculus. Therefore, he made a GHC plugin giving access to a magical
function:

```haskell
ccc :: Closed k => (a -> b) -> a `k` b
```

You can see that the signature is *very* similar to the one of `arr`.

A first issue is that `Recipe m` very much isn't *closed*. Another more
substantial issue is that the GHC plugin is *very* experimental: I had a hard
time running it on simple examples, and it is barely documented.

Does this mean all hope is lost? **NO**.

## Compiling to monoidal cartesian categories

Two days ago, I stumbled upon this paper by chance:

What they explain is that many interesting categories to compile to are in fact
not closed.

No GHC plugin required, just a tiny library with a few `class`es.

There is one drawback: `Recipe m` *is* cartesian. That is, you can freely
duplicate values. In their framework, they have you explicitly insert `dup` to
duplicate a value. This is a bit annoying, but they have a good reason to do so: