title: Building my site with monoidal categories
date: 2022-12-06
draft: true
Or how the right theoretical framework solved the last problem standing in the way of incremental generation for "free": reasoning about dependencies optimally.
A while back I made achille, a library for building incremental static site generators in Haskell. I'm not gonna delve into why for long, if you want the full motivation you can read the details in the (outdated) documentation.
The point was:
- static sites are good, therefore one wants to use static site generators.
- the way one builds their site eventually becomes quite intricate, and difficult to express with existing static site generators.
- thus one ends up making their own custom generator suited for the task.
Making your own static site generator is not very hard, but making it incremental is tedious and requires some thinking.
That's the niche that Hakyll tries to fill: an embedded DSL in Haskell to specify your build rules, and compile them into a full-fledged incremental static site generator. Some kind of static site generator generator.
achille, as it used to be
I had my gripes with Hakyll, and was looking for a simpler, more general way to express build rules. I came up with the Recipe abstraction:
newtype Recipe m a b = Recipe
  { runRecipe :: Context -> Cache -> a -> m (b, Cache) }
It's just a glorified Kleisli arrow: a Recipe m a b will produce an output of type b by running a computation in m, given some input of type a.
The purpose is to abstract over side effects of build rules (such as producing HTML files on disk) and shift the attention to intermediate values that flow between build rules.
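To make this concrete, here is what a primitive recipe might look like when written by hand. This is a minimal sketch: readText is a hypothetical helper, not part of achille's API, and it simply threads the cache through unchanged.

import qualified Data.Text    as Text
import qualified Data.Text.IO as Text

-- A primitive recipe reading a text file from disk.
-- It ignores the context and passes the cache along untouched.
readText :: Recipe IO FilePath Text.Text
readText = Recipe \_ctx cache path -> do
  contents <- Text.readFile path
  pure (contents, cache)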
As one could expect, if m is a monad, so is Recipe m a. This means composing recipes is very easy, and dependencies between them are stated explicitly in the code.
main :: IO ()
main = achille do
  posts <- match "posts/*.md" compilePost
  compileIndex posts
<details>
<summary>Type signatures</summary>
Simplifying a bit, these would be the type signatures of the building blocks in the code above.
compilePost  :: Recipe IO FilePath PostMeta
match        :: GlobPattern -> Recipe IO FilePath b -> Recipe IO () [b]
compileIndex :: [PostMeta] -> Recipe IO () ()
achille      :: Recipe IO () () -> IO ()
</details>
There are no ambiguities about the ordering of build rules and the evaluation model is in turn very simple --- in contrast to Hakyll, its global store and implicit ordering.
Caching
In the definition of Recipe, a recipe takes some Cache as input, and returns another one after the computation is done. This cache is simply a lazy bytestring, and enables recipes to have some persistent storage between runs, which they can use in any way they desire.
The key insight is how composition of recipes is handled:
(*>) :: Recipe m a b -> Recipe m a c -> Recipe m a c
Recipe f *> Recipe g = Recipe \ctx cache x -> do
  let (cf, cg) = splitCache cache
  (_, cf') <- f ctx cf x
  (y, cg') <- g ctx cg x
  pure (y, joinCache cf' cg')
The cache is split in two, and both pieces are forwarded to their respective recipe. Once the computation is done, the resulting caches are put together into one again.
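achille's actual encoding isn't shown here, but to give an idea, splitCache and joinCache could be as simple as a length-prefixed pair of bytestrings. The following is a sketch using the binary package, an assumption rather than the real implementation:

import qualified Data.ByteString.Lazy as LBS
import Data.Binary (decodeOrFail, encode)

type Cache = LBS.ByteString

-- Split a cache into the two sub-caches it contains.
-- On a first run (or a corrupted cache), both halves start out empty.
splitCache :: Cache -> (Cache, Cache)
splitCache cache = case decodeOrFail cache of
  Right (_, _, (l, r)) -> (l, r)
  Left _               -> (LBS.empty, LBS.empty)

-- Pack two sub-caches back into a single bytestring.
joinCache :: Cache -> Cache -> Cache
joinCache l r = encode (l, r)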
This ensures that every recipe will be attributed the same local cache --- assuming the description of the generator does not change between runs. Of course this is only true when Recipe m is merely used as a selective applicative functor, though I doubt you need more than that for writing a static site generator. It's not perfect, but I can say that this very simple model for caching has proven to be surprisingly powerful.
I have improved upon it since then, in order to make sure that composition is associative and to enable some computationally intensive recipes to become insensitive to code refactorings, but the core idea is left unchanged.
Incremental evaluation and dependency tracking
But there is a but
Arrows
I really like the do notation, but sadly monadic bind loses this information about variable use, so no luck. If only there were a way to overload the lambda abstraction syntax of Haskell to transform it into a representation free of variable bindings...
That's when I discovered Haskell's arrows. They are a generalization of monads, and are often presented as a way to compose things that behave like functions. And indeed, we can define our very own instance Arrow (Recipe m). There is a special syntax, the arrow notation, that kinda looks like the do notation, so is this the way out?
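To give an idea, the earlier example would look something like this in arrow notation. This is a hypothetical sketch, with the building blocks re-typed as plain recipes so that they fit the notation:

{-# LANGUAGE Arrows #-}
-- Hypothetical building blocks, re-typed to fit the notation:
--   matchPosts    :: Recipe IO () [PostMeta]
--   compileIndex' :: Recipe IO [PostMeta] ()
main :: IO ()
main = achille $ proc () -> do
  posts <- matchPosts -< ()
  compileIndex' -< posts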
There is something fishy in the definition of Arrow:
class Category k => Arrow k where
  -- ...
  arr :: (a -> b) -> a `k` b
We must be able to lift any function into k a b in order to make it an Arrow. In our case we can do it, that's not the issue. No, the real issue is how Haskell desugars the arrow notation.
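Indeed, defining the instances poses no problem. A minimal sketch, reusing splitCache and joinCache from above, could look like this:

import Control.Arrow (Arrow (..))
import Control.Category (Category (..))
import Prelude hiding (id, (.))

instance Monad m => Category (Recipe m) where
  id = Recipe \_ cache x -> pure (x, cache)
  Recipe g . Recipe f = Recipe \ctx cache x -> do
    let (cf, cg) = splitCache cache
    (y, cf') <- f ctx cf x
    (z, cg') <- g ctx cg y
    pure (z, joinCache cf' cg')

instance Monad m => Arrow (Recipe m) where
  -- Lift a pure function; it needs no cache of its own.
  arr f = Recipe \_ cache x -> pure (f x, cache)
  -- Run a recipe on the first half of a pair, leaving the second untouched.
  first (Recipe r) = Recipe \ctx cache (x, y) -> do
    (x', cache') <- r ctx cache x
    pure ((x', y), cache')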
...
There is a macro that is a bit smarter than current Haskell's desugarer, but not by much. I've seen some discussions about actually fixing this upstream, but I don't think anyone has the time to do this: too few people use arrows to justify the cost.
Conal Elliott's concat
Conal Elliott wrote a fascinating paper called Compiling to Categories. The gist of it is that any cartesian-closed category is a model of simply-typed lambda-calculus. Therefore, he made a GHC plugin giving access to a magical function:
ccc :: Closed k => (a -> b) -> a `k` b
You can see that the signature is very similar to that of arr.
A first issue is that Recipe m very much isn't closed. Another, more substantial issue is that the GHC plugin is very experimental: I had a hard time running it on simple examples, and it is barely documented.
Does this mean all hope is lost? NO.
Compiling to monoidal cartesian categories
Two days ago, I stumbled upon this paper by chance.
What they explain is that many interesting categories to compile to are in fact not closed.
No GHC plugin required, just a tiny library with a few classes.
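The class hierarchy involved would look roughly like this (illustrative names only, not necessarily the paper's exact API):

class Category k => Monoidal k where
  -- Run two computations side by side on the two halves of a pair.
  (×) :: (a `k` b) -> (c `k` d) -> (a, c) `k` (b, d)

class Monoidal k => Cartesian k where
  -- Duplicating and discarding values is what makes the category cartesian.
  dup :: a `k` (a, a)
  exl :: (a, b) `k` a
  exr :: (a, b) `k` b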
There is one drawback: Recipe m is cartesian. That is, you can freely duplicate values. In their framework, they have you explicitly insert dup to duplicate a value. This is a bit annoying, but they have a good reason to do so: