---
title: Generating incremental static site generators in Haskell using cartesian categories
date: 2022-12-06
draft: true
toc: true
---

A few days ago, I released the new version of [achille], a Haskell library
providing an EDSL for writing static site generators. This embedded language
produces efficient, *incremental* and *parallel* static site generators, *for
free*.

[achille]: /projects/achille

In this post, I will explain how [achille] is able to transform this intuitive,
"readable" syntax into an incremental static site generator:

```haskell
import Achille as A

main :: IO ()
main = achille $ task A.do
  -- copy every static asset as is
  match_ "assets/*" copyFile

  -- load site template
  template <- matchFile "template.html" loadTemplate

  -- render every article in `posts/`
  -- and gather all metadata
  posts <- match "posts/*.md" \src -> A.do
    (meta, content) <- processPandocMeta src
    writeFile (src -<.> ".html") (renderPost template meta content)
    meta

  -- render index page with the 10 most recent articles
  renderIndex template (take 10 (sort posts))
```
Importantly, I want to emphasize that *you* --- the library user --- need
neither care about nor understand the internals of [achille] in order to use
it. *Most* of the machinery below is purposefully kept hidden from plain
sight. You are free to ignore this post and directly go through the
[user manual][manual] to get started!

[manual]: /projects/achille/

This article is just here to document how the right theoretical framework was
instrumental in providing a good user interface *while* preserving all the
desired properties. It also gives pointers on how to reliably overload
Haskell's *lambda abstraction* syntax, because I'm sure many applications
could make good use of that but are unaware that there are now ways to do it
properly, *without any kind of metaprogramming*.

---

## Foreword

My postulate is that *static sites are good*. Not for every use case, of
course, but for single-user, small-scale websites they are a very convenient
way to manage content. Very easy to edit offline, very easy to deploy. All in
all very nice.

There are lots of static site generators readily available. However, each and
every one of them has a very specific idea of how you should *structure* your
content. For simple websites --- i.e. weblogs --- they are wonderful, but as
soon as you want to heavily customize the generation process of your site or
require fancier transformations --- and thus step outside the supported
feature set of your generator of choice --- you're out of luck.

For this reason, many people end up not using existing static site
generators, and instead prefer to write their own. Depending on the language
you use, it is fairly straightforward to write a little static site generator
that does precisely what you want. Sadly, making it *incremental* or
*parallel* is another issue entirely, and way trickier.

That's precisely the niche that [Hakyll] and [achille] try to fill: provide
an embedded DSL in Haskell to specify your *custom* build rules, and compile
them all into a full-fledged **incremental** static site generator
executable. Some kind of static site generator *generator*.

[Hakyll]: https://jaspervdj.be/hakyll/

## Reasoning about static site generators

Let's look at what a typical site generator does. A good way to visualize it
is with a flow diagram, where *boxes* are "build rules". Boxes have
distinguished inputs and outputs, and dependencies between the build rules
are represented by wires going from outputs of boxes to inputs of other
boxes.
The static site generator written in the Haskell code above corresponds to
the following diagram:

...

Build rules are clearly identified, and we see that in order to render the
`index.html` page, *we need to wait* for the `renderPosts` rule to finish
rendering each article to HTML and return the metadata of every one of them.

Notice how some wires are continuous **black** lines, while others are faded
**dotted** lines. The **dotted lines** represent **side effects** of the
generator:

- files that are read from the file system, like all the markdown files in
  `posts/`.
- files that are written to the file system, like the HTML output of every
  article, or the `index.html` file.

The first important insight is to realize that the build system *shouldn't
care about side effects*. Its *only* role is to know whether build rules
*should be executed*, how intermediate values get passed around, and how they
change between consecutive runs.

### The `Recipe m` abstraction

```haskell
newtype Recipe m a b = Recipe
  { runRecipe :: Context -> Cache -> a -> m (b, Cache) }
```

It's just a glorified Kleisli arrow: a `Recipe m a b` will produce an output
of type `b` by running a computation in `m`, given some input of type `a`.

The purpose is to *abstract over the side effects* of build rules (such as
producing HTML files on disk) and shift the attention to the *intermediate
values* that flow between build rules.
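To make the shape of this type concrete, here is a minimal sketch of a
primitive recipe. The `Context` and `Cache` definitions are stand-ins assumed
purely for illustration, not achille's actual types:

```haskell
{-# LANGUAGE BlockArguments #-}

import Data.Char (toUpper)

-- Stand-ins for achille's real types, assumed here for illustration only.
type Context = FilePath   -- say, the root directory of the site
type Cache   = [String]   -- an opaque blob of persisted state

newtype Recipe m a b = Recipe
  { runRecipe :: Context -> Cache -> a -> m (b, Cache) }

-- A toy recipe that uppercases its input and leaves the cache untouched.
upperText :: Recipe IO String String
upperText = Recipe \_ctx cache input ->
  pure (map toUpper input, cache)
```

Running it with `runRecipe upperText "." [] "hello"` yields `"HELLO"` along
with the (unchanged) cache.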
...

Visual noise

...
### Caching

In the definition of `Recipe m a b`, a recipe takes some `Cache` as input,
and returns another one after the computation is done.

This cache --- for which I'm not going to give a definition here --- enables
recipes to have some *persistent storage* between runs, which they can use in
any way they desire.

The key insight is how composition of recipes is handled:

```haskell
(*>) :: Recipe m a b -> Recipe m a c -> Recipe m a c
Recipe f *> Recipe g = Recipe \ctx cache x -> do
  let (cf, cg) = splitCache cache
  (_, cf') <- f ctx cf x
  (y, cg') <- g ctx cg x
  pure (y, joinCache cf' cg')
```

The cache is split in two, and both pieces are forwarded to their respective
recipe. Once the computations are done, the resulting caches are put back
together into one.
This ensures that every recipe will be handed the same local cache between
runs --- assuming the description of the generator does not change. It's not
perfect, but I can say that this very simple model for caching has proven to
be surprisingly powerful.
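The post doesn't define `splitCache` and `joinCache`; one plausible sketch
--- an assumption for illustration, not achille's real cache --- models the
cache as a binary tree of opaque chunks, so that splitting and rejoining
round-trip:

```haskell
-- Assumed model: a cache is a tree of opaque chunks.
data Cache = Empty | Chunk String | Fork Cache Cache
  deriving (Eq, Show)

-- Hand each of two composed recipes its own sub-cache; a cache left over
-- from a differently-shaped generator degrades to empty sub-caches.
splitCache :: Cache -> (Cache, Cache)
splitCache (Fork l r) = (l, r)
splitCache _          = (Empty, Empty)

-- Rejoin the updated sub-caches once both recipes have run.
joinCache :: Cache -> Cache -> Cache
joinCache = Fork
```

With this model, `splitCache (joinCache l r)` gives back `(l, r)`, which is
exactly the property that composition relies on.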
...

### Incremental evaluation and dependency tracking

...

### But there is a but

We've now defined all the operations we could wish for in order to build,
compose and combine recipes. We've even found the theoretical framework our
concrete application fits into. How cool!

**But there is a catch**, and I hope you've already been thinking about it:
**what an awful, awful way to write recipes**.

Sure, it's nice to know that we have all the primitive operations required to
express all the flow diagrams we could ever be interested in. We *can*
definitely define the site generator that has been serving as an example
throughout:

```haskell
rules :: Task ()
rules = renderIndex ∘ (...)
```

But I hope we can all agree that this code is **complete gibberish**. It's
likely *some* Haskellers would be perfectly happy with this interface, but
alas my library isn't *only* targeted at this crowd. No, what I really want
is a way to assign intermediate results --- outputs of rules --- to
*variables* that then get used as inputs. Plain old Haskell variables. That
is, I want to write my recipes as plain old *functions*.

---

And here is where my --- intermittent --- search for a readable syntax
started, roughly two years ago.
## The quest for a friendly syntax

### Monads

If you've done a bit of Haskell, you *may* know that as soon as you're
working with things that compose and sequence, chances are high that what
you're working with are *monads*. Perhaps the most well-known example is the
`IO` monad. A value of type `IO a` represents a computation that, after
performing some side effects (reading a file, writing a file, ...), will
produce a value of type `a`.

Crucially, being a monad means you have a way to *sequence* computations. In
the case of the `IO` monad, the bind operation has the following type:

```haskell
(>>=) :: IO a -> (a -> IO b) -> IO b
```

And because monads are so prevalent in Haskell, there is *custom syntax* ---
the `do` notation --- that allows you to bind the results of computations to
*variables* that can be used in the computations that follow. This syntax
gets desugared into the primitive operations `(>>=)` and `pure`.

```haskell
main :: IO ()
main = do
  content <- readFile "input.txt"
  writeFile "output.txt" content
```

The above gets transformed into:

```haskell
main :: IO ()
main = readFile "input.txt" >>= writeFile "output.txt"
```
Looks promising, right? I can define a `Monad` instance for `Recipe m a`
fairly easily.

```haskell
instance Monad (Recipe m a) where
  (>>=) :: Recipe m a b -> (b -> Recipe m a c) -> Recipe m a c
```

And now, problem solved?

```haskell
rules :: Task IO ()
rules = do
  posts <- match "posts/*.md" renderPosts
  renderIndex posts
```

The answer is a resolute **no**. The problem becomes apparent when we try to
actually define this `(>>=)` operation.

1. The second argument is a Haskell function of type `b -> Recipe m a c`.
   And precisely because it is a Haskell function, it can do anything it
   wants depending on the value of its argument. In particular, it could very
   well return *different recipes* for *different inputs*. That is, the
   *structure* of the graph is no longer *static*, and could change between
   runs if the output of type `b` from the first rule happens to change. This
   is **very bad**, because we rely on the static structure of recipes to
   make the claim that the cache stays consistent between runs.

Ok, sure, but what if we assume that users don't do bad things (we never
should)? No, even then, there is an even bigger problem:

2. Because the second argument is *just a Haskell function*.

...
## Arrows

That's when I discovered Haskell's arrows. They are a generalization of
monads, and are often presented as a way to compose things that behave like
functions. And indeed, we can define our very own `instance Arrow (Recipe m)`.
There is special syntax --- the *arrow notation* --- that looks a bit like
`do` notation, so is this the way out?

There is something fishy in the definition of `Arrow`:

```haskell
class Category k => Arrow k where
  -- ...
  arr :: (a -> b) -> a `k` b
```

We must be able to lift any function into `k a b` in order to make it an
`Arrow`. In our case we can do that; it's not the issue. No, the real issue
is how Haskell desugars the arrow notation.

...

So, Haskell's `Arrow` isn't it either. Well, in principle it *should* be the
solution, but the desugarer is broken, the syntax is still unreadable to my
taste, and nobody has the will to fix it.

This syntax investigation must carry on.
## Compiling to cartesian closed categories

About a year after this project started, and well after I had given up on
this whole endeavour, I happened to pass by Conal Elliott's fascinating paper
["Compiling to Categories"][ccc]. In this paper, Conal recalls:

[ccc]: http://conal.net/papers/compiling-to-categories/

> It is well-known that the simply typed lambda-calculus is modeled by any
> cartesian closed category (CCC)

I had heard of this, it is true. What it means is that, given any cartesian
closed category, any *term* of type `a -> b` (a function) in the simply-typed
lambda calculus corresponds to (can be interpreted as) a *morphism* (arrow)
`a -> b` in the category. But a cartesian closed category crucially has no
notion of *variables* --- just *arrows* and operations to compose and
rearrange them (among other things). Yet in the lambda calculus you *have* to
construct functions using *lambda abstraction*. In other words, there is a
consistent way to convert things defined with variable bindings into a
representation (CCC morphisms) where variables are *gone*.
How interesting. Then Conal goes on to explain that, because Haskell is
"just" lambda calculus on steroids, any monomorphic function of type `a -> b`
really ought to be convertible into an arrow in the CCC of your choice. And
so he *did* just that: he is behind the [concat] GHC plugin and library. The
library exports a bunch of typeclasses that allow anyone to define instances
for their very own target CCC. Additionally, the plugin gives access to the
following, truly magical function:

[concat]: https://github.com/compiling-to-categories/concat

```haskell
ccc :: CartesianClosed k => (a -> b) -> a `k` b
```

When the plugin runs during compilation, every time it encounters this
specific function it converts the Haskell term (in GHC Core form) for the
first argument (a function) into the corresponding Haskell term for the
morphism in the target CCC.

How neat: a reliable way to overload the lambda notation in Haskell. The
paper is really, really worth a read, and contains many practical
applications, such as compiling functions into circuits or automatic
differentiation.

...

---

Another year went by without any solution in sight. And yet.
## "Compiling" to (symmetric) monoidal categories

A month ago, while browsing a Reddit thread on the sad state of `Arrow`, I
stumbled upon an innocent link buried deep in the replies, to a paper by
Jean-Philippe Bernardy and Arnaud Spiwack:
["Evaluating Linear Functions to Symmetric Monoidal Categories"][smc].

[smc]: https://arxiv.org/abs/2103.06195v2

And boy oh boy, *what a paper*. I haven't been able to stop thinking about it
since.

It starts with the following:

> A number of domain specific languages, such as circuits or
> data-science workflows, are best expressed as diagrams of
> boxes connected by wires.

Well yes, indeed: what I want to express in my syntax are just plain old
diagrams, made out of boxes and wires.

> A faithful abstraction is Symmetric Monoidal Categories
> (smcs), but, so far, it hasn’t been convenient to use.

Again yes, I cannot agree more. This is the right abstraction, but a terrible
way to design these diagrams. But then, the kicker, a bit later in the paper:

> Indeed, every linear function can be interpreted in terms of an smc.

What?! This I had never heard. It does make sense though: since in
(non-cartesian) monoidal categories you cannot duplicate objects (that is,
have morphisms from `a` to `(a, a)`), they model exactly the functions that
can only use their argument *once*, and that *have* to use it (or pass it
along by returning it). Note that here we talk about *linear* functions in
the Linear Haskell, type-theoretic sense of "linear", not the linear-algebra
sense.
So far so good. But then they explain how to *evaluate* any such *linear*
Haskell function into the right SMC, **without metaprogramming**. And the
techniques they employ to do so are some of the smartest, most beautiful
things I've seen. I cannot recommend enough that you go read the paper for
the full details. It's *amazing*, and perhaps more approachable than Conal's
paper. It is accompanied by the [linear-smc] library, which exposes a very
simple interface:

[linear-smc]: https://hackage.haskell.org/package/linear-smc

- The module `Control.Category.Constrained` exports typeclasses to declare
  your type family of choice `k :: * -> * -> *` (in the type-theory sense of
  type family, not the Haskell sense) as the right kind of category, using
  `Category`, `Monoidal` and `Cartesian`.
```haskell
class Category k where
  id  :: a `k` a
  (∘) :: (b `k` c) -> (a `k` b) -> a `k` c

class Category k => Monoidal k where
  (×)     :: (a `k` b) -> (c `k` d) -> (a ⊗ c) `k` (b ⊗ d)
  swap    :: (a ⊗ b) `k` (b ⊗ a)
  assoc   :: ((a ⊗ b) ⊗ c) `k` (a ⊗ (b ⊗ c))
  assoc'  :: (a ⊗ (b ⊗ c)) `k` ((a ⊗ b) ⊗ c)
  unitor  :: a `k` (a ⊗ ())
  unitor' :: Obj k a => (a ⊗ ()) `k` a

class Monoidal k => Cartesian k where
  exl :: (a ⊗ b) `k` a
  exr :: (a ⊗ b) `k` b
  dup :: a `k` (a ⊗ a)
```

So far so good; nothing surprising. We can confirm that indeed we've already
defined (or can define) these operations for `Recipe m`, thus forming a
cartesian category.
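As a sanity check, plain Haskell functions themselves form such a cartesian
category, with `(,)` playing the role of the tensor. The sketch below uses
standalone ASCII-named classes --- an assumption for illustration, not
linear-smc's actual class hierarchy:

```haskell
-- Illustrative stand-ins for the Category/Cartesian classes above,
-- with (,) as the tensor.
class Cat k where
  idC   :: a `k` a
  compC :: (b `k` c) -> (a `k` b) -> (a `k` c)

class Cat k => CartesianC k where
  exl :: (a, b) `k` a
  exr :: (a, b) `k` b
  dup :: a `k` (a, a)

-- Ordinary functions are the prototypical cartesian category.
instance Cat (->) where
  idC   = id
  compC = (.)

instance CartesianC (->) where
  exl = fst
  exr = snd
  dup x = (x, x)
```

For instance, `compC exl dup` is the identity on any `a`, reflecting the
cartesian law `exl ∘ dup = id`.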
- But the truly incredible bit comes from `Control.Category.Linear`, which
  provides the primitives to construct morphisms in a monoidal category using
  linear functions.

- It exports an abstract type `P k r a`, meant to represent the output of an
  arrow/box in the SMC `k`, of type `a`.

- A function to convert an SMC arrow into a linear function on *ports*:

  ```haskell
  encode :: (a `k` b) -> P k r a %1 -> P k r b
  ```

- A function to convert a linear function on *ports* into an arrow in your
  SMC:

  ```haskell
  decode :: Monoidal k => (forall r. P k r a %1 -> P k r b) -> a `k` b
  ```

There are other primitives that we're going to ignore here.

Now, there are at least two things that are remarkable about this interface:

- By keeping the type of ports `P k r a` *abstract*, and making sure that the
  exported functions that *produce* ports also take ports *as arguments*,
  they are able to enforce that any linear function on ports written by the
  user **has to use the operations of the library**.

  There is virtually no way to produce a port out of thin air other than the
  export `unit :: P k r ()`, and because the definition of `P k r a` is *not*
  exported, users have *no way* to retrieve a value of type `a` from it.
  Therefore, ports can only be *carried around*, and ultimately *given as
  input* to arrows in the SMC that have been converted into linear functions
  with `encode`.

  I have since been told this is a fairly typical technique among DSL
  writers, to ensure that end users only ever use the allowed operations and
  nothing more. But it was a first for me, and truly a galaxy-brain
  technique.
- The second thing is the `r` parameter in `P k r a`. This type variable
  isn't relevant to the information carried by the port. No, its true purpose
  is *ensuring that linear functions given to `decode` are **closed***.

  Indeed, the previous point demonstrated that linear functions
  `P k r a %1 -> P k r b` can only ever be defined in terms of *variables*
  carrying ports, or linear functions on *ports*.

  By quantifying over `r` in the first argument of `decode`, they prevent the
  function from ever mentioning variables coming from *outside* its
  definition. Indeed, all operations of the library use the same `r` for
  inputs and outputs. So if a port of type `P k r a` that was defined
  *outside* of a linear function were used inside its body, the function
  would *have* to use that same `r` for every port it manipulates. Crucially,
  such a function can no longer be quantified over `r`, precisely because
  this `r` was bound outside of its definition.

  I have seen this technique once before, in `Control.Monad.ST.Safe`, and
  it's so neat.
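The trick can be reproduced in miniature. In this sketch --- an illustration,
not linear-smc's actual code: linearity is dropped, and the `Port`
constructor would be hidden in a real library --- the `forall r` in `decode`
is what rejects ports captured from an enclosing scope:

```haskell
{-# LANGUAGE RankNTypes #-}

-- A port tagged by a phantom region variable, like ST's `s`.
newtype Port r a = Port a

-- Lift a pure function to a function on ports (cf. `encode`).
encode :: (a -> b) -> Port r a -> Port r b
encode f (Port a) = Port (f a)

-- Only *closed* functions on ports can be decoded back: the argument must
-- accept *any* region `r`, so it cannot mention a port from outside.
decode :: (forall r. Port r a -> Port r b) -> (a -> b)
decode f a = case f (Port a) of Port b -> b
```

Here `decode (encode (+ 1)) 41` evaluates to `42`, while a function closing
over some outer `Port r0 c` fails to typecheck, because it is not polymorphic
in `r`.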
Because of these last two points, [linear-smc] ensures that the functions
written by the user and given to `decode` can always be translated back into
arrows, simply because they must be *closed* and *only use the allowed
operations*. Incorrect functions are simply rejected by the type checker with
"readable" error messages.

Even though the library does the translation at runtime, **it cannot fail**.

[linear-smc] is readily available as a tiny, self-contained library on
Hackage. Because it doesn't do any metaprogramming, neither through Template
Haskell nor GHC plugins, it is very robust, easy to maintain and safe to
depend on. The only experimental feature being used is Linear Haskell, plus
some constraint wizardry.

All in all, this seems like a wonderful foundation to stand on. The library
sadly doesn't have an associated GitHub page, and it seems like nobody has
heard about this paper and approach. At the time of writing, the library has
only been downloaded `125` times, and I'm responsible for a large part of
that. Please give it some love and look through the paper; you're missing
out.

---

But now, let's look into how to apply this set of tools and go beyond.
## Reaching the destination: compiling to cartesian categories

...

---

And here is the end destination. We've finally been able to fully overload
the Haskell lambda abstraction syntax, yet we are still able to track the use
of variables in order to keep generators incremental.
## Conclusion

If anyone has made it this far, I would like to thank you for reading this
post in its entirety. I quite frankly have no clue whether this will be of
use to anyone, but I've been thinking about it for so long, and was so happy
to reach a "simple" solution, that I couldn't just keep it to myself.

Once again, I am very thankful for Bernardy and Spiwack's paper and library.
It is, to my knowledge, the cleanest way to do this kind of painless
overloading. It truly opened my eyes and allowed me to go a bit further. I
hope the techniques presented here can at least make a few people aware that
these solutions exist and can be used *painlessly*.

Now, as for [achille], the pet project that motivated this entire endeavour:
it has now reached the level of usefulness and low friction that I was ---
only a few years ago --- merely dreaming of. Being the only user to date, I
am certainly biased, and would probably do a bad job convincing anyone that
they should use it, considering the amount of available tools.

However, if you've been using [Hakyll] and are a tad frustrated by some of
its limitations --- as I was --- I would be very happy if you would consider
taking [achille] for a spin. It's new, it's small, I don't know if it's as
efficient as it could be, but it is definitely made with love (and sweat).
### Future work

I now consider the syntax problem entirely solved. But there are always more
features that I wish for.

- I haven't implemented **parallelism** yet, because it wasn't in the first
  version of [achille] and was thus not a priority. But as shown in this
  article, it should *also* come for free. I first have to learn how Haskell
  does concurrency, then just go and implement it.

- Make `Recipe m a b` a GADT. Right now, the result of the translation from
  functions on ports to recipes is not inspectable, because I just get a
  Haskell function. I think it would be very useful to make `Recipe m a b` a
  GADT with, in addition to the current constructor, one constructor for each
  primitive operation (the operations of cartesian categories).

  This should make it possible to **produce an SVG image of the diagram
  behind every generator** made with [achille], which I find pretty fucking
  cool.
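A plausible shape for such a reified GADT --- constructor names are my own
assumptions, achille's eventual design may differ --- keeps one constructor
per cartesian primitive, so the diagram can be traversed and rendered:

```haskell
{-# LANGUAGE GADTs #-}

-- Hypothetical reified recipes: one constructor per cartesian primitive,
-- plus an opaque, named effectful computation.
data Recipe m a b where
  Prim :: String -> (a -> m b) -> Recipe m a b
  Id   :: Recipe m a a
  Comp :: Recipe m b c -> Recipe m a b -> Recipe m a c
  Pair :: Recipe m a b -> Recipe m c d -> Recipe m (a, c) (b, d)
  Exl  :: Recipe m (a, b) a
  Exr  :: Recipe m (a, b) b
  Dup  :: Recipe m a (a, a)

-- With the structure reified, rendering it (here as a plain string, but
-- just as well as an SVG) becomes a simple fold over the constructors.
draw :: Recipe m a b -> String
draw (Prim name _) = name
draw Id            = "id"
draw (Comp g f)    = draw g ++ " . " ++ draw f
draw (Pair f g)    = "(" ++ draw f ++ " x " ++ draw g ++ ")"
draw Exl           = "exl"
draw Exr           = "exr"
draw Dup           = "dup"
```

For example, `draw (Comp Exl Dup)` renders the identity-as-a-diagram
`"exl . dup"`.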
- In some rare cases, if the source code of the generator has been modified
  between two runs, a build rule may receive as input the old cache of a
  *different* recipe, which nevertheless contains *exactly* the right kind of
  information.

  I haven't witnessed this often, but for now the only way to restore proper
  incrementality is to wipe the cache and rebuild. A bit drastic if your site
  is big or you have computationally expensive recipes. Now that the diagram
  is completely static (compared to the previous version using monads), I
  think it *should* be possible to let users give *names* to specific
  recipes, so that:

  - If we want to force the execution of a *specific* recipe, ignoring its
    cache, we can do so by simply giving its name on the CLI.

  - The caches of named recipes are stored *separately* from the tree-like
    nesting of caches, so that these recipes become insensitive to
    refactorings of the generator source code.

  I would even go as far as saying that this would be easy to implement, but
  those are famous last words.

- Actually, we can go even further. Because the diagram is static, we can
  compute a hash at every node of the diagram. Yes, a *Merkle tree*. Every
  core recipe must be given a distinct hash (hand-picked, by me or by
  implementors of other recipes). Then, by convention, every recipe appends
  its own hash to its local cache. This should entirely solve the problem of
  rerunning the recipes that have changed --- *from scratch*, and *only
  those*: if any sub-recipe of an outer recipe *has changed*, the hashes
  won't match, and the outer recipe therefore *has* to run again.
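The Merkle idea can be sketched with a toy tree and a toy hash (both are
assumptions for illustration; a real implementation would use a proper
cryptographic hash): each node's hash combines its own tag with the hashes of
its children, so any change below a node changes every hash above it.

```haskell
import Data.List (foldl')

-- A toy diagram node: a tag plus sub-diagrams.
data Node = Node String [Node]

-- A toy (non-cryptographic) string hash, for illustration only.
hashStr :: String -> Int
hashStr = foldl' (\h c -> h * 31 + fromEnum c) 7

-- Merkle-style hashing: a node's hash depends on all hashes below it.
merkle :: Node -> Int
merkle (Node tag kids) = foldl' (\h k -> h * 31 + merkle k) (hashStr tag) kids
```

Changing any leaf's tag changes the root hash, which is exactly the signal
needed to rerun only the recipes whose subtree changed.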
At what point do we consider things over-engineered? I think I've been past
that point for a few years already.

'Til next time!
|