more work on article, wishful thinking

2022-12-09 18:53:40 +01:00 · 2022-12-09 18:53:40 +01:00 · 4dd1130758
parent f8b1c385f0
commit 4dd1130758
1 changed files with 180 additions and 81 deletions
--- a/content/posts/achille-smc.md
+++ b/content/posts/achille-smc.md
@@ -19,55 +19,66 @@ import Achille as A

 main :: IO ()
 main = achille $ task A.do
+  -- copy every static asset as is
+  match_ "assets/*" copyFile
+
+  -- load site template
+  template <- matchFile "template.html" loadTemplate
+
  -- render every article in `posts/`
  -- and gather all metadata
  posts <-
    match "posts/*.md" \src -> A.do
      (meta, content) <- processPandocMeta src
-      writeFile (src -<.> ".html") (renderPost meta content)
+      writeFile (src -<.> ".html") (renderPost template meta content)
      meta

  -- render index page with the 10 most recent articles
-  renderIndex (take 10 (sort posts))
+  renderIndex template (take 10 (sort posts))
 ```


 Importantly, I want to emphasize that *you* --- the library user --- need neither
 care about nor understand the internals of [achille] in order to use it.
-You are free to ignore this post and directly go through the [user
-manual][manual] to get started!
+*Most* of the machinery below is purposefully kept hidden from plain sight. You
+are free to ignore this post and directly go through the [user manual][manual]
+to get started!

 [manual]: /projects/achille/

-This post is just there to document how the right theoretical framework was key
-in providing a good user interface that preserves all the desired properties.
+This article is just there to document how the right theoretical framework was
+instrumental in providing a good user interface *while still* preserving all the
+desired properties. It also gives pointers on how to reliably overload Haskell's
+*lambda abstraction* syntax, because I'm sure many applications could make good
+use of that, yet few people know that there are now ways to do it properly,
+*without any kind of metaprogramming*.

 ---

 ## Foreword

-The original postulate is that *static sites are good*. Of course not for every
-use case, but for single-user, small-scale websites, it is a very practical way
-of managing content. Very easy to edit offline, very easy to deploy. All in all
+My postulate is that *static sites are good*. Of course not for every
+use case, but for single-user, small-scale websites, they are a very convenient way
+to manage content. Very easy to edit offline, very easy to deploy. All in all
 very nice.

 There are lots of static site generators readily available. However each and
-every one of them has a very specific idea of how you *should* manage your
-content. For simple websites --- i.e weblogs --- they are great, but as soon as
-you want to heavily customize the building process of your site, require more
-fancy transformations, and thus step outside of the supported feature set of
-your site generator of choice, you're in for a lot of trouble.
+every one of them has a very specific idea of how you should *structure* your
+content. For simple websites --- i.e. weblogs --- they are wonderful, but as soon
+as you want to heavily customize the generation process of your site or require
+fancier transformations, and thus step outside of the supported feature set
+of your generator of choice, you're out of luck.

 For this reason, many people end up not using existing static site generators,
 and instead prefer to write their own. Depending on the language you use, it is
-fairly straightforward to write a little static site generator doing everything
-you want. Sadly, making it *incremental* or *parallel* is another issue, and way
-trickier.
+fairly straightforward to write a little static site generator that does
+precisely what you want. Sadly, making it *incremental* or *parallel* is another
+issue, and way trickier.

-That's precisely the niche that [Hakyll] and
-[achille] try to fill: use an embedded DSL in Haskell to specify your *custom* build
-rules, and compile them all into a full-fletched **incremental** static site
-generator executable. Some kind of static site generator *generator*.
+That's precisely the niche that [Hakyll] and [achille] try to fill: provide an
+embedded DSL in Haskell to specify your *custom* build rules, and compile them
+all into a full-fledged **incremental** static site generator executable. Some
+kind of static site generator *generator*.

 [Hakyll]: https://jaspervdj.be/hakyll/

@@ -78,13 +89,13 @@ is with a flow diagram, where *boxes* are "build rules". Boxes have
 distinguished inputs and outputs, and dependencies between the build rules are
 represented by wires going from outputs of boxes to inputs of other boxes.

-The static site generator corresponding to the Haskell code above could be
-represented as the following diagram:
+The static site generator defined by the Haskell code above corresponds
+to the following diagram:

 ...

 Build rules are clearly identified, and we see that in order to render the `index.html`
-page, we need to wait for the `renderPosts` rule to finish rendering each
+page, *we need to wait* for the `renderPosts` rule to finish rendering each
 article to HTML and return the metadata of every one of them.

 Notice how some wires are **continuous** **black** lines, and some other wires are
@@ -96,9 +107,10 @@ generator.
 - files that are written to the filesystem, like the HTML output of every
    article, or the `index.html` file.

-The first insight is to realize that the build system *shouldn't care about side
-effects*. Its *only* role is to know whether build rules *should be executed*,
-and how intermediate values get passed around.
+The first important insight is to realize that the build system *shouldn't care
+about side effects*. Its *only* role is to know whether build rules *should be
+executed*, how intermediate values get passed around, and how they change
+between consecutive runs.

 ### The `Recipe m` abstraction

@@ -117,37 +129,6 @@ The purpose is to *abstract over side effects* of build rules (such as producing
 HTML files on disk) and shift the attention to *intermediate values* that flow
 between build rules.

-As one could expect, if `m` is a monad, so is `Recipe m a`. This means composing
-recipes is very easy and dependencies *between* those are stated **explicitely**
-in the code.
-
-```haskell
-main :: IO ()
-main = achille do
-  posts <- match "posts/*.md" compilePost
-  compileIndex posts
-```
-
-``` {=html}
-<details>
-  <summary>Type signatures</summary>
-```
-Simplifying a bit, these would be the type signatures of the building blocks in
-the code above.
-```haskell
-compilePost  :: Recipe IO FilePath PostMeta
-match        :: GlobPattern -> (Recipe IO FilePath b) -> Recipe IO () [b]
-compileIndex :: PostMeta -> Recipe IO () ()
-achille      :: Recipe IO () () -> IO ()
-```
-``` {=html}
-</details>
-```
-
-There are no ambiguities about the ordering of build rules and the evaluation model
-is in turn *very* simple --- in contrast to Hakyll, its global store and
-implicit ordering.
-
 ### Caching

 In the definition of `Recipe`, a recipe takes some `Cache` as input, and
@@ -185,12 +166,105 @@ become insensitive to code refactorings, but the core idea is left unchanged.

 ### But there is a but

-## Arrows
+We've now defined all the operations we could wish for in order to build,
+compose and combine recipes. We've even found the theoretical framework our
+concrete application fits into. How cool!

-I really like the `do` notation, but sadly losing this information about
-variable use is bad, so no luck. If only there was a way to *overload* the
-lambda abstraction syntax of Haskell to transform it into a representation free
-of variable bindings...
+**But there is a catch**, and I hope you've already been thinking about it:  
+**what an awful, awful way to write recipes**.
+
+Sure, it's nice to know that we have all the primitive operations required to
+express all the flow diagrams we could ever be interested in. We *can*
+definitely define the site generator that has been serving as example
+throughout:
+
+```
+rules :: Task ()
+rules = renderIndex ∘ (...)
+```
+
+But I hope we can all agree on the fact that this code is **complete
+gibberish**. It's likely *some* Haskellers would be perfectly happy with this
+interface, but alas my library isn't *only* targeted at this crowd. No, what I
+really want is a way to assign intermediate results --- outputs of rules --- to
+*variables*, that then get used as inputs. Plain old Haskell variables. That is,
+I want to write my recipes as plain old *functions*.
+
+And here is where my --- intermittent --- search for a readable syntax started,
+roughly two years ago.
+
+## The quest for a friendly syntax
+
+### Monads
+
+If you've done a bit of Haskell, you *may* know that as soon as you're working
+with things that compose and sequence, chances are high that you're working
+with *monads*. Perhaps the most well-known example is the `IO`
+monad. A value of type `IO a` represents a computation that, after doing
+side-effects (reading a file, writing a file, ...) will produce a value of type
+`a`.
+
+Crucially, being a monad means you have a way to *sequence* computations. In
+the case of the `IO` monad, the bind operation has the following type:
+
+```haskell
+(>>=) :: IO a -> (a -> IO b) -> IO b
+```
+
+And because monads are so prevalent in Haskell, there is a *custom syntax*, the
+`do` notation, that allows you to bind results of computations to *variables*
+that can be used for the following computations. This syntax gets desugared into
+the primitive operations `(>>=)` and `(>>)`.
+
+```haskell
+main :: IO ()
+main = do
+  content <- readFile "input.txt"
+  writeFile "output.txt" content
+```
+
+The above gets transformed into:
+
+```haskell
+main :: IO ()
+main = readFile "input.txt" >>= writeFile "output.txt"
+```
+
+Looks promising, right? I can define a `Monad` instance for `Recipe m a`,
+fairly easily.
+
+```haskell
+instance Monad (Recipe m a) where
+  (>>=) :: Recipe m a b -> (b -> Recipe m a c) -> Recipe m a c
+```
+
+And now problem solved?
+
+```haskell
+rules :: Task IO ()
+rules = do
+  posts <- match "posts/*.md" renderPosts
+  renderIndex posts
+```
+
+The answer is a resolute **no**. The problem becomes apparent when we try to
+actually define this `(>>=)` operation.
+
+1. The second argument is a Haskell function of type `b -> Recipe m a c`. And
+   precisely because it is a Haskell function, it can do anything it wants
+   depending on the value of its argument. In particular, it could very well
+   return *different recipes* for *different inputs*. That is, the *structure*
+   of the graph is no longer *static*, and could change between runs, if the
+   output of type `b` from the first rule happens to change. This is **very
+   bad**, because we rely on the static structure of recipes to make the claim
+   that the cache stays consistent between runs.
+
+Ok, sure, but what if we assume that users don't do bad things (we never should).
+No, even then, there is an even bigger problem:
+
+2. Because the second argument is *just a Haskell function*.
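
The first problem above can be made concrete with a small, self-contained sketch. The `Rule` type and the `step` helper below are hypothetical stand-ins invented for this illustration (they are not achille's `Recipe`): a `Rule` merely records the names of the steps it would run. Because the continuation passed to `(>>=)` is an ordinary Haskell function, it is free to pick a different step depending on the value it receives, so the list of steps is only known once the first rule has produced its output.

```haskell
-- Hypothetical toy type, NOT achille's Recipe: a rule that only
-- records the names of the steps it runs, alongside its result.
newtype Rule a = Rule { runRule :: ([String], a) }

instance Functor Rule where
  fmap f (Rule (s, a)) = Rule (s, f a)

instance Applicative Rule where
  pure a = Rule ([], a)
  Rule (s1, f) <*> Rule (s2, a) = Rule (s1 ++ s2, f a)

instance Monad Rule where
  -- The continuation k is an opaque function: we must run it on the
  -- actual value to find out which steps come next.
  Rule (s1, a) >>= k = let Rule (s2, b) = k a in Rule (s1 ++ s2, b)

-- Hypothetical helper: a named step that returns the given value.
step :: String -> a -> Rule a
step name x = Rule ([name], x)

-- The second step depends on the *value* produced by the first one.
site :: Int -> Rule ()
site nPosts = do
  n <- step "countPosts" nPosts
  if n > 10
    then step "renderPaginatedIndex" ()
    else step "renderIndex" ()

main :: IO ()
main = do
  print (fst (runRule (site 3)))   -- ["countPosts","renderIndex"]
  print (fst (runRule (site 42)))  -- ["countPosts","renderPaginatedIndex"]
```

The two runs produce different lists of steps for the same source code, which is exactly the kind of run-dependent structure a cache keyed on a static graph cannot cope with.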
+
+## Arrows

 That's when I discovered Haskell's arrows. It's a generalization of monads,
 and is often presented as a way to compose things that behave like functions.
@@ -212,31 +286,56 @@ how Haskell desugars the arrow notation.

 ...

-There is a macro that is a bit smarter than current Haskell's desugarer, but not
-by much. I've seen some discussions about actually fixing this upstream, but I
-don't think anyone actually has the time to do this. So few people use arrows to
-justify the cost.
+So. Haskell's `Arrow` isn't it either. Well, in principle it *should* be the
+solution. But the desugarer is broken, the syntax is still unreadable for my
+taste, and nobody has the will to fix it.

+This syntax investigation must carry on.

-## Conal Elliott's `concat`
+## Compiling to cartesian closed categories

-Conal Elliott wrote a fascinating paper called *Compiling to Categories*.
-The gist of it is that any cartesian-closed category is a model of simply-typed
-lambda-calculus. Therefore, he made a GHC plugin giving access to a magical
-function:
+About a year after this project started, and well after I had given up on this
+whole endeavour, I happened to come across Conal Elliott's fascinating paper
+["Compiling to Categories"][ccc]. In this paper, Conal recalls:

-```
-ccc :: Closed k => (a -> b) -> a `k` b
+[ccc]: http://conal.net/papers/compiling-to-categories/
+
+> It is well-known that the simply typed lambda-calculus is modeled by any
+> cartesian closed category (CCC)
+
+I had heard of it, that is true. What this means is that, given any cartesian
+closed category, any *term* of type `a -> b` (a function) in the simply-typed
+lambda calculus corresponds to (can be interpreted as) an *arrow* (morphism) 
+`a -> b` in the category. But a cartesian-closed category crucially has no notion
+of *variables*, just some *arrows* and operations to compose and rearrange them
+(among other things). Yet in the lambda calculus you *have* to construct functions
+using *lambda abstraction*. In other words, there is a consistent way to convert
+things defined with variable bindings into a representation (CCC morphisms)
+where variables are *gone*.
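
As a hand-written illustration of that correspondence, here is a minimal sketch in the most familiar cartesian closed category available in Haskell: plain functions, `(->)`. The function names are made up for this example. The first definition uses lambda abstraction and names its arguments; the second expresses the same function using only projections (`fst`, `snd`), pairing (`&&&` from `Control.Arrow`) and an uncurried primitive, with no variable in sight. The translation here is done manually, it is not the output of any tool.

```haskell
import Control.Arrow ((&&&))

-- With variables: lambda abstraction binds x and y.
swapAndSum :: (Int, Int) -> (Int, Int)
swapAndSum = \(x, y) -> (y, x + y)

-- Without variables: only projections, pairing and a primitive arrow.
-- (f &&& g) feeds the same input to f and g and pairs their results.
swapAndSum' :: (Int, Int) -> (Int, Int)
swapAndSum' = snd &&& uncurry (+)

main :: IO ()
main = print (swapAndSum (1, 2), swapAndSum' (1, 2))  -- ((2,3),(2,3))
```

Nothing in the second definition is specific to functions apart from the choice of `(->)` for the arrows, which is why the same variable-free shape can in principle be reinterpreted in another category.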
+
+How interesting. Then, Conal goes on to explain that because Haskell is
+"just" lambda calculus on steroids, any monomorphic function of type `a -> b`
+really ought to be convertible into an arrow in the CCC of your choice.
+And so he *did* just that. He is behind the [concat] GHC plugin and library.
+This library exports a bunch of typeclasses that allow anyone to define instances
+for their very own target CCC. Additionally, the plugin gives access to the
+following, truly magical function:
+
+[concat]: https://github.com/compiling-to-categories/concat
+
+```haskell
+ccc :: CartesianClosed k => (a -> b) -> a `k` b
 ```

-You can see that the signature is *very* similar to the one of `arr`.
-
-A first issue is that `Recipe m` very much isn't *closed*. Another more
-substantial issue is that the GHC plugin is *very* experimental. I had a hard
-time running it on simple examples, it is barely documented.
-
-Does this mean all hope is lost? **NO**.
+When the plugin is run during compilation, every time it encounters this specific
+function it will convert the Haskell term (in GHC Core form) for the first
+argument (a function) into the corresponding Haskell term for the morphism in
+the target CCC.

+How neat. A reliable way to overload the lambda notation in Haskell. 
+The paper is really, really worth a read, and contains many practical
+applications such as compiling functions into circuits or automatic
+differentiation.

 ## Compiling to monoidal cartesian categories