first draft on achille with smc

This commit is contained in:
flupe 2022-12-06 20:58:05 +01:00
parent a97b4cf584
commit a2046102e7
---
title: Building my site with monoidal categories
date: 2022-12-06
draft: true
---
Or how the right theoretical framework solved, for "free", the last problem
standing in the way of incremental generation: reasoning optimally about dependencies.
---
A while back I made [achille](/projects/achille), a library for building
incremental static site generators in Haskell. I'm not gonna delve into the *why*
for long; if you want the full motivation you can read the details in the
(outdated) [documentation](/projects/achille/1-motivation.html).
The point was:

- static sites are good, therefore one wants to use static site generators.
- the way one builds their site quickly becomes intricate and difficult to
  express with existing static site generators.
- thus one ends up making their own custom generator suited for the task.
Making your own static site generator is not very hard, but making it
*incremental* is tedious and requires some thinking.
That's the niche that [Hakyll](https://jaspervdj.be/hakyll/) tries to fill: an
embedded DSL in Haskell to specify your build rules, compiled into a
fully-fledged **incremental** static site generator. Some kind of static site
generator *generator*.
## achille, as it used to be
I had my gripes with Hakyll, and was looking for a simpler, more general way to
express build rules. I came up with the `Recipe` abstraction:
```haskell
newtype Recipe m a b = Recipe
  { runRecipe :: Context -> Cache -> a -> m (b, Cache) }
```
It's just a glorified Kleisli arrow: a `Recipe m a b` will produce an output of
type `b` by running a computation in `m`, given some input of type `a`.
The purpose is to *abstract over side effects* of build rules (such as producing
HTML files on disk) and shift the attention to *intermediate values* that flow
between build rules.
As one would expect, if `m` is a monad, so is `Recipe m a`. This means composing
recipes is very easy, and dependencies *between* them are stated **explicitly**
in the code.
```haskell
main :: IO ()
main = achille do
posts <- match "posts/*.md" compilePost
compileIndex posts
```
``` {=html}
<details>
<summary>Type signatures</summary>
```
Simplifying a bit, these would be the type signatures of the building blocks in
the code above.
```haskell
compilePost :: Recipe IO FilePath PostMeta
match :: GlobPattern -> (Recipe IO FilePath b) -> Recipe IO () [b]
compileIndex :: [PostMeta] -> Recipe IO () ()
achille :: Recipe IO () () -> IO ()
```
``` {=html}
</details>
```
There is no ambiguity about the ordering of build rules, and the evaluation model
is in turn *very* simple --- in contrast to Hakyll, with its global store and
implicit ordering.
### Caching
In the definition of `Recipe`, a recipe takes some `Cache` as input and
returns another one once the computation is done. This cache is simply a *lazy
bytestring*: it gives recipes some *persistent storage* between
runs, which they can use in any way they desire.
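To make this concrete, here is a minimal sketch of the idea. `Context` is dropped and the cache is a lazy bytestring, as in the library, but the `cached` combinator and this stripped-down `Recipe` are my own illustration, not achille's actual API: a recipe that stores its last `(input, output)` pair in its local cache and only re-runs the underlying action when the input changed.

```haskell
import Data.Binary (Binary, encode, decodeOrFail)
import qualified Data.ByteString.Lazy as LBS

type Cache = LBS.ByteString

-- Simplified Recipe: no Context, just cache-in / cache-out.
newtype Recipe m a b = Recipe { runRecipe :: Cache -> a -> m (b, Cache) }

-- Illustrative combinator: remember the (input, output) pair in the
-- local cache, and only re-run the action when the input changed.
cached :: (Monad m, Eq a, Binary a, Binary b) => (a -> m b) -> Recipe m a b
cached f = Recipe $ \cache x ->
  case decodeOrFail cache of
    Right (_, _, (x', y)) | x' == x -> pure (y, cache)  -- cache hit: skip the work
    _ -> do                                             -- miss, or first run
      y <- f x
      pure (y, encode (x, y))
```

Running such a recipe a second time with the same input and the cache returned by the first run serves the old output straight from the cache.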
The key insight is how composition of recipes is handled:
```haskell
(*>) :: Recipe m a b -> Recipe m a c -> Recipe m a c
Recipe f *> Recipe g = Recipe \ctx cache x -> do
let (cf, cg) = splitCache cache
(_, cf') <- f ctx cf x
(y, cg') <- g ctx cg x
  pure (y, joinCache cf' cg')
```
The cache is split in two, and both pieces are forwarded to their respective
recipe. Once the computation is done, the resulting caches are put together
into one again.
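The helpers `splitCache` and `joinCache` are easy to imagine. Here is one plausible implementation (a sketch, not achille's actual code), packing the two local caches into a single bytestring:

```haskell
import Data.Binary (encode, decodeOrFail)
import qualified Data.ByteString.Lazy as LBS

type Cache = LBS.ByteString

-- Pack two local caches into one.
joinCache :: Cache -> Cache -> Cache
joinCache cf cg = encode (cf, cg)

-- Recover both halves. On the very first run the cache is empty,
-- so both recipes simply start out with an empty local cache.
splitCache :: Cache -> (Cache, Cache)
splitCache c = case decodeOrFail c of
  Right (_, _, (cf, cg)) -> (cf, cg)
  Left _                 -> (LBS.empty, LBS.empty)
```

By construction `splitCache (joinCache cf cg) == (cf, cg)`, which is all the composition above relies on.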
This ensures that every recipe will be handed the same local cache on every run
--- assuming the description of the generator does not change between runs. Of
course this only holds when `Recipe m` is merely used as a *selective*
applicative functor, though I doubt you need more than that for writing a
static site generator. It's not perfect, but this very simple model
for caching has proven surprisingly powerful.
I have improved upon it since then, in order to make sure that
composition is associative and to enable some computationally intensive recipes to
become insensitive to code refactorings, but the core idea is left unchanged.
### Incremental evaluation and dependency tracking
### But there is a but
## Arrows
I really like `do` notation, but sadly losing this information about
variable use is a deal-breaker, so no luck. If only there were a way to *overload*
Haskell's lambda abstraction syntax, to transform it into a representation free
of variable bindings...
That's when I discovered Haskell's arrows. They are a generalization of monads,
often presented as a way to compose things that behave like functions.
And indeed, we can define our very own `instance Arrow (Recipe m)`. There is a special
syntax, the *arrow notation*, that looks a bit like `do` notation --- so is this
the way out?
There is something fishy in the definition of `Arrow`:
```haskell
class Category k => Arrow k where
-- ...
arr :: (a -> b) -> a `k` b
```
We must be able to lift any function into `k a b` in order to make it an
`Arrow`. In our case we can, so that's not the issue. No, the real issue is
how Haskell desugars the arrow notation.
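For reference, here is roughly what those instances could look like on a simplified `Recipe` (a sketch: `Context` is omitted and the cache is threaded linearly, whereas the real library splits it between the two sides as shown earlier):

```haskell
import Prelude hiding (id, (.))
import Control.Category (Category (..))
import Control.Arrow (Arrow (..))
import qualified Data.ByteString.Lazy as LBS

type Cache = LBS.ByteString

-- Simplified Recipe: no Context, cache threaded straight through.
newtype Recipe m a b = Recipe { runRecipe :: Cache -> a -> m (b, Cache) }

instance Monad m => Category (Recipe m) where
  id = Recipe $ \cache x -> pure (x, cache)
  Recipe g . Recipe f = Recipe $ \cache x -> do
    (y, cache') <- f cache x   -- real achille splits the cache in two here
    g cache' y

instance Monad m => Arrow (Recipe m) where
  -- Lifting an arbitrary function really is easy:
  arr f = Recipe $ \cache x -> pure (f x, cache)
  first (Recipe f) = Recipe $ \cache (x, y) -> do
    (z, cache') <- f cache x
    pure ((z, y), cache')
```

So `arr` itself is not the obstacle; the problem is how much the desugarer leans on it.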
...
There is a macro that is a bit smarter than the current Haskell desugarer, but not
by much. I've seen some discussions about actually fixing this upstream, but I
don't think anyone has the time to do it. Too few people use arrows to
justify the cost.
## Conal Elliott's `concat`
Conal Elliott wrote a fascinating paper called *Compiling to Categories*.
The gist of it is that any cartesian-closed category is a model of simply-typed
lambda-calculus. Therefore, he made a GHC plugin giving access to a magical
function:
```haskell
ccc :: Closed k => (a -> b) -> a `k` b
```
You can see that the signature is *very* similar to the one of `arr`.
A first issue is that `Recipe m` very much isn't *closed*. Another, more
substantial issue is that the GHC plugin is *very* experimental: I had a hard
time running it on simple examples, and it is barely documented.
Does this mean all hope is lost? **NO**.
## Compiling to monoidal cartesian categories
Two days ago, I stumbled upon this paper by chance.
What the authors explain is that many of the categories we would like to compile
to are in fact not closed.
No GHC plugin required, just a tiny library with a few `class`es.
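The interface boils down to a handful of classes, roughly like the following (the names here are mine, for illustration, not necessarily the paper's exact vocabulary):

```haskell
import Prelude hiding (id, (.))
import Control.Category (Category (..))

-- Monoidal: run two morphisms side by side on a pair.
class Category k => Monoidal k where
  par :: (a `k` b) -> (c `k` d) -> ((a, c) `k` (b, d))

-- Cartesian: explicit duplication and projections. Copying a value
-- is a visible operation (dup), not something inserted silently.
class Monoidal k => Cartesian k where
  dup :: a `k` (a, a)
  exl :: (a, b) `k` a
  exr :: (a, b) `k` b

-- Plain functions are of course an instance.
instance Monoidal (->) where
  par f g (x, y) = (f x, g y)

instance Cartesian (->) where
  dup x = (x, x)
  exl = fst
  exr = snd
```

A lambda term then compiles down to compositions of `par`, `dup`, `exl`, `exr` and friends, with no `arr`-style escape hatch needed.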
There is one drawback: `Recipe m` *is* cartesian. That is, you can freely
duplicate values. In their framework, you have to explicitly insert `dup` to
duplicate a value. This is a bit annoying, but they have a good reason to do so: