From a2046102e730eebd6aa987bf6cd05677ebd84762 Mon Sep 17 00:00:00 2001
From: flupe
Date: Tue, 6 Dec 2022 20:58:05 +0100
Subject: [PATCH] first draft on achille with smc

---
 content/posts/achille-smc.md | 181 +++++++++++++++++++++++++++++++++++
 1 file changed, 181 insertions(+)
 create mode 100644 content/posts/achille-smc.md

diff --git a/content/posts/achille-smc.md b/content/posts/achille-smc.md
new file mode 100644
index 0000000..fe11195
--- /dev/null
+++ b/content/posts/achille-smc.md
@@ -0,0 +1,181 @@

---
title: Building my site with monoidal categories
date: 2022-12-06
draft: true
---

Or how the right theoretical framework solved the last problem standing in the
way of incremental generation for "free": reasoning about dependencies
optimally.

---

A while back I made [achille](/projects/achille), a library for building
incremental static site generators in Haskell. I'm not gonna dwell on *why* for
long; if you want the full motivation, you can read the details in the
(outdated) [documentation](/projects/achille/1-motivation.html).

The point was:

- static sites are good, therefore one wants to use static site generators.
- the way one builds their site can become quite intricate, and difficult to
  express with existing static site generators.
- thus one ends up writing their own custom generator suited to the task.

Making your own static site generator is not very hard, but making it
*incremental* is tedious and requires some thinking.

That's the niche that [Hakyll](https://jaspervdj.be/hakyll/) tries to fill: an
embedded DSL in Haskell to specify your build rules and compile them into a
full-fledged **incremental** static site generator. Some kind of static site
generator *generator*.

## achille, as it used to be

I had my gripes with Hakyll, and was looking for a simpler, more general way to
express build rules.
I came up with the `Recipe` abstraction:

```haskell
newtype Recipe m a b = Recipe
  { runRecipe :: Context -> Cache -> a -> m (b, Cache) }
```

It's just a glorified Kleisli arrow: a `Recipe m a b` will produce an output of
type `b` by running a computation in `m`, given some input of type `a`.

The purpose is to *abstract over the side effects* of build rules (such as
writing HTML files to disk) and shift the attention to the *intermediate values*
that flow between build rules.

As one would expect, if `m` is a monad, so is `Recipe m a`. This means composing
recipes is very easy, and dependencies *between* them are stated **explicitly**
in the code.

```haskell
main :: IO ()
main = achille do
  posts <- match "posts/*.md" compilePost
  compileIndex posts
```

``` {=html}
<details>
<summary>Type signatures</summary>
```
Simplifying a bit, these would be the type signatures of the building blocks in
the code above.
```haskell
compilePost  :: Recipe IO FilePath PostMeta
match        :: GlobPattern -> Recipe IO FilePath b -> Recipe IO () [b]
compileIndex :: [PostMeta] -> Recipe IO () ()
achille      :: Recipe IO () () -> IO ()
```
``` {=html}
</details>
```

There is no ambiguity about the ordering of build rules, and the evaluation
model is in turn *very* simple, in contrast to Hakyll with its global store and
implicit ordering.

### Caching

In the definition of `Recipe`, a recipe takes some `Cache` as input and returns
another one after the computation is done. This cache is simply a *lazy
bytestring*, and it gives recipes some *persistent storage* between runs, which
they can use however they like.

The key insight is how composition of recipes is handled:

```haskell
(*>) :: Monad m => Recipe m a b -> Recipe m a c -> Recipe m a c
Recipe f *> Recipe g = Recipe \ctx cache x -> do
  let (cf, cg) = splitCache cache
  (_, cf') <- f ctx cf x
  (y, cg') <- g ctx cg x
  pure (y, joinCache cf' cg')
```

The cache is split in two, and each piece is forwarded to its respective
recipe. Once both computations are done, the updated caches are joined back
into one.

This ensures that every recipe is handed the same local slice of the cache from
one run to the next, assuming the description of the generator does not change
between runs. Of course, this only holds when `Recipe m` is used merely as a
*selective* applicative functor, though I doubt you need more than that to write
a static site generator. It's not perfect, but this very simple caching model
has proven surprisingly powerful.

I have improved upon it since then, to make sure that composition is
associative and to make some computationally intensive recipes insensitive to
code refactorings, but the core idea remains unchanged.

### Incremental evaluation and dependency tracking

### But there is a but

## Arrows

I really like the `do` notation, but sadly it loses this information about
variable use, so no luck. If only there were a way to *overload* the lambda
abstraction syntax of Haskell and turn it into a representation free of
variable bindings...
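To make the variable-binding problem concrete, here is a minimal sketch. It is not achille's API: `Build`, `Rule`, `:>>>`, `rulesUsed` and `siteM` are all hypothetical names used only for illustration. With `>>=`, everything after the bind is an ordinary Haskell function, so the only way to find out what it does with `posts` is to call it. A first-order term built from named rules and an explicit composition constructor, on the other hand, is plain data that can be inspected before anything runs:

```haskell
-- A minimal sketch, NOT achille's API: every name here is hypothetical.

-- A first-order description of a build: named rules wired together by an
-- explicit composition constructor. No Haskell lambdas appear inside.
data Build
  = Rule String          -- a named build rule, e.g. "compileIndex"
  | Build :>>> Build     -- feed the output of the left rule into the right one
  deriving Show

infixr 1 :>>>

-- The example site, written point-free: the variable `posts` has become
-- an explicit edge between the two rules.
site :: Build
site = Rule "match posts/*.md" :>>> Rule "compileIndex"

-- Since a Build is plain data, we can list the rules it uses (and, in
-- principle, how they depend on each other) without running any of them.
rulesUsed :: Build -> [String]
rulesUsed (Rule name) = [name]
rulesUsed (f :>>> g)  = rulesUsed f ++ rulesUsed g

-- The monadic version: everything after `>>=` is an opaque function.
-- The only way to learn what it does with `posts` is to apply it.
siteM :: Monad m => m p -> (p -> m ()) -> m ()
siteM matchPosts compileIndex = matchPosts >>= \posts -> compileIndex posts

main :: IO ()
main = print (rulesUsed site)
```

Running `main` prints the names of both rules without executing either of them.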
That's when I discovered Haskell's arrows. They are a generalization of monads,
often presented as a way to compose things that behave like functions. And
indeed, we can define our very own `instance Arrow (Recipe m)`. There is even a
special syntax, the *arrow notation*, that kinda looks like the `do` notation.
So is this the way out?

There is something fishy in the definition of `Arrow`:

```haskell
class Category k => Arrow k where
  -- ...
  arr :: (a -> b) -> a `k` b
```

We must be able to lift any function into `k a b` in order to make `k` an
`Arrow`. In our case we can, so that's not the issue. No, the real issue is how
Haskell desugars the arrow notation.

...

There is a macro that is a bit smarter than GHC's current desugarer, but not by
much. I've seen some discussions about actually fixing this upstream, but I
don't think anyone has the time to do it. Too few people use arrows to justify
the cost.

## Conal Elliott's `concat`

Conal Elliott wrote a fascinating paper called *Compiling to Categories*. The
gist of it is that any cartesian closed category is a model of the simply typed
lambda calculus. Building on this, he made a GHC plugin giving access to a
magical function:

```haskell
ccc :: Closed k => (a -> b) -> a `k` b
```

You can see that the signature is *very* similar to the one of `arr`.

A first issue is that `Recipe m` very much isn't *closed*. Another, more
substantial issue is that the GHC plugin is *very* experimental: I had a hard
time running it on simple examples, and it is barely documented.

Does this mean all hope is lost? **NO**.

## Compiling to monoidal cartesian categories

Two days ago, I stumbled upon this paper by chance:.

What they explain is that many interesting categories to compile to are in fact
not closed.

No GHC plugin required, just a tiny library with a few `class`es.

There is one drawback: `Recipe m` *is* cartesian. That is, you can freely
duplicate values.
In their framework, they have you explicitly insert `dup` to duplicate a value.
This is a bit annoying, but they have a good reason to do so: