---
title: Building my site with monoidal categories
date: 2022-12-06
draft: true
---

Or how the right theoretical framework solved, for "free", the last problem standing between me and incremental generation: reasoning about dependencies optimally.

---

A while back I made [achille](/projects/achille), a library for building incremental static site generators in Haskell. I'm not gonna delve into *why* for long; if you want the full motivation you can read the details in the (outdated) [documentation](/projects/achille/1-motivation.html). The point was:

- static sites are good, therefore one wants to use static site generators.
- the way one wants to build their site quickly becomes intricate and difficult to express with existing static site generators.
- thus one ends up making their own custom generator, suited to the task.

Making your own static site generator is not very hard, but making it *incremental* is tedious and requires some thinking. That's the niche that [Hakyll](https://jaspervdj.be/hakyll/) tries to fill: an embedded DSL in Haskell for specifying your build rules, which it compiles into a full-fledged **incremental** static site generator. Some kind of static site generator *generator*.

## achille, as it used to be

I had my gripes with Hakyll, and was looking for a simpler, more general way to express build rules. I came up with the `Recipe` abstraction:

```haskell
newtype Recipe m a b = Recipe
  { runRecipe :: Context -> Cache -> a -> m (b, Cache) }
```

It's just a glorified Kleisli arrow: a `Recipe m a b` will produce an output of type `b` by running a computation in `m`, given some input of type `a`. The purpose is to *abstract over the side effects* of build rules (such as producing HTML files on disk) and to shift the attention to the *intermediate values* that flow between build rules.

As one could expect, if `m` is a monad, so is `Recipe m a`. This means composing recipes is very easy, and dependencies *between* recipes are stated **explicitly** in the code.

```haskell
main :: IO ()
main = achille do
  posts <- match "posts/*.md" compilePost
  compileIndex posts
```
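To make that last claim a little more concrete, here is one way the instances could be written. This is a sketch, not achille's actual code: `Context` is reduced to a stub, and the cache is threaded through naively here, whereas the caching section below describes what really happens to it.

```haskell
{-# LANGUAGE BlockArguments #-}
module RecipeSketch where

import Control.Monad (ap, liftM)
import qualified Data.ByteString.Lazy as LBS

type Cache   = LBS.ByteString  -- the cache really is a lazy bytestring
data Context = Context         -- stub standing in for achille's actual context

newtype Recipe m a b = Recipe
  { runRecipe :: Context -> Cache -> a -> m (b, Cache) }

instance Monad m => Functor (Recipe m a) where
  fmap = liftM

instance Monad m => Applicative (Recipe m a) where
  pure y = Recipe \_ cache _ -> pure (y, cache)
  (<*>)  = ap

instance Monad m => Monad (Recipe m a) where
  -- run the first recipe, then feed its output to the continuation;
  -- both recipes receive the same outer input x
  Recipe f >>= k = Recipe \ctx cache x -> do
    (y, cache') <- f ctx cache x
    runRecipe (k y) ctx cache' x
```

Up to shuffling arguments around, a `Recipe m a b` is just a Kleisli arrow `a -> ReaderT Context (StateT Cache m) b`, which is what "glorified Kleisli arrow" means above.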
Simplifying a bit, these would be the type signatures of the building blocks used in the `main` example above:

```haskell
compilePost  :: Recipe IO FilePath PostMeta
match        :: GlobPattern -> Recipe IO FilePath b -> Recipe IO () [b]
compileIndex :: [PostMeta] -> Recipe IO () ()
achille      :: Recipe IO () () -> IO ()
```
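For what it's worth, the `do` block in `main` is plain monadic sugar, using only the building blocks above; desugared, it is an explicit chain of binds, so both the order in which recipes run and which outputs feed into which recipes are spelled out by the code itself:

```haskell
main :: IO ()
main = achille $
  match "posts/*.md" compilePost >>= \posts ->
  compileIndex posts
```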
There are no ambiguities about the ordering of build rules, and the evaluation model is in turn *very* simple --- in contrast to Hakyll, with its global store and implicit ordering.

### Caching

In the definition of `Recipe`, a recipe takes some `Cache` as input, and returns another one after the computation is done. This cache is simply a *lazy bytestring*, and gives recipes some *persistent storage* between runs, which they can use in any way they desire. The key insight is how the composition of recipes is handled:

```haskell
(*>) :: Recipe m a b -> Recipe m a c -> Recipe m a c
Recipe f *> Recipe g = Recipe \ctx cache x -> do
  let (cf, cg) = splitCache cache
  (_, cf') <- f ctx cf x
  (y, cg') <- g ctx cg x
  pure (y, joinCache cf' cg')
```

The cache is split in two, and each piece is forwarded to its respective recipe. Once both computations are done, the resulting caches are joined back into one. This ensures that every recipe will be handed the same local cache from one run to the next --- assuming the description of the generator does not change between runs. Of course this only holds as long as `Recipe m` is merely used as a *selective* applicative functor, though I doubt you need more than that for writing a static site generator.

It's not perfect, but this very simple model of caching has proven to be surprisingly powerful. I have improved upon it since then, to make sure that composition is associative and to let some computationally intensive recipes become insensitive to code refactorings, but the core idea is unchanged.

### Incremental evaluation and dependency tracking

### But there is a but

## Arrows

I really like the `do` notation, but sadly it loses this information about variable use, so no luck. If only there was a way to *overload* the lambda abstraction syntax of Haskell, to transform it into a representation free of variable bindings...

That's when I discovered Haskell's arrows. They are a generalization of monads, and are often presented as a way to compose things that behave like functions. And indeed, we can define our very own `instance Arrow (Recipe m)` (a sketch of such an instance is shown a bit further down). There is a special syntax, the *arrow notation*, that kinda looks like `do` notation, so is this the way out?

There is something fishy in the definition of `Arrow`:

```haskell
class Category k => Arrow k where
  -- ...
  arr :: (a -> b) -> a `k` b
```

We must be able to lift *any* function into `k a b` in order to make it an `Arrow`. In our case we can, so that's not the issue. No, the real issue is how Haskell desugars the arrow notation.

...

There is a macro that is a bit smarter than Haskell's current desugarer, but not by much. I've seen some discussions about actually fixing this upstream, but I don't think anyone actually has the time to do it: too few people use arrows to justify the cost.

## Conal Elliott's `concat`

Conal Elliott wrote a fascinating paper called *Compiling to Categories*. The gist of it is that any cartesian-closed category is a model of the simply-typed lambda calculus. Building on this observation, he made a GHC plugin giving access to a magical function:

```haskell
ccc :: Closed k => (a -> b) -> a `k` b
```

You can see that this signature is *very* similar to that of `arr`. A first issue is that `Recipe m` very much isn't *closed*. Another, more substantial issue is that the GHC plugin is *very* experimental: I had a hard time running it on simple examples, and it is barely documented.

Does this mean all hope is lost? **NO**.
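As an aside, for completeness: defining the `Category` and `Arrow` instances mentioned earlier really is the easy part. Here is a rough sketch, reusing `Recipe`, `Cache` and the qualified `LBS` import from the first sketch above; the `splitCache`/`joinCache` helpers are stand-ins I made up (one possible way to pack two lazy bytestrings into one), not achille's actual implementation.

```haskell
import Control.Arrow (Arrow (..))
import Control.Category (Category (..))
import Data.Binary (decodeOrFail, encode)
import Prelude hiding (id, (.))

-- Pack two caches into one by serializing the pair; an empty or
-- stale cache falls back to two empty caches.
joinCache :: Cache -> Cache -> Cache
joinCache c1 c2 = encode (c1, c2)

splitCache :: Cache -> (Cache, Cache)
splitCache c = case decodeOrFail c of
  Right (_, _, (c1, c2)) -> (c1, c2)
  Left  _                -> (LBS.empty, LBS.empty)

instance Monad m => Category (Recipe m) where
  id = Recipe \_ cache x -> pure (x, cache)
  Recipe g . Recipe f = Recipe \ctx cache x -> do
    let (cf, cg) = splitCache cache   -- one piece of cache per recipe
    (y, cf') <- f ctx cf x
    (z, cg') <- g ctx cg y
    pure (z, joinCache cf' cg')

instance Monad m => Arrow (Recipe m) where
  -- lifting a pure function is easy: it never touches the cache
  arr f = Recipe \_ cache x -> pure (f x, cache)
  first (Recipe f) = Recipe \ctx cache (x, y) -> do
    (x', cache') <- f ctx cache x
    pure ((x', y), cache')
```

As argued above, the problem is not writing these instances; it is what the desugaring of the arrow notation does with `arr` once you start using it.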
## Compiling to monoidal cartesian categories

Two days ago, I stumbled upon this paper by chance:

What they explain is that many of the interesting categories one would like to compile to are in fact *not* closed. No GHC plugin required, just a tiny library with a few `class`es.

There is one drawback: `Recipe m` *is* cartesian. That is, you can freely duplicate values. In their framework, they have you explicitly insert `dup` whenever you want to duplicate a value. This is a bit annoying, but they have a good reason to do so: