516 lines
20 KiB
Markdown
516 lines
20 KiB
Markdown
---
|
|
title: Overloading the lambda abstraction in Haskell
|
|
date: 2022-12-10
|
|
draft: false
|
|
---
|
|
|
|
About two years ago, I started working on a little embedded domain-specific
|
|
language (EDSL) called [achille], using Haskell. Because this EDSL has its own
|
|
notion of *morphisms* `Recipe m a b` from `a` to `b`, I was looking for a way to
|
|
let users write such morphisms using regular Haskell functions and the syntax
|
|
for lambda abstraction, `\x -> ...`.
|
|
|
|
As recently as 5 days ago, I was still convinced that there was no way to do
|
|
such a thing without requiring a lot of engineering effort, either through GHC
|
|
compiler plugins or with Template Haskell --- two things I'd rather stay far
|
|
away from.
|
|
|
|
Indeed,
|
|
|
|
[arrow]: https://hackage.haskell.org/package/base-4.17.0.0/docs/Control-Arrow.html
|
|
[proc]: https://www.haskell.org/arrows/syntax.html
|
|
|
|
- [Haskell's `Arrow` abstraction][arrow] and the related [`proc` notation][proc] are
|
|
supposed to enable just that, but they don't work as intended. It's a
|
|
difficult thing to [implement properly](https://github.com/tomjaguarpaw/Arrows2/issues/1)
|
|
and [if proposals to fix it](https://github.com/lexi-lambda/ghc-proposals/blob/constraint-based-arrow-notation/proposals/0000-constraint-based-arrow-notation.md) have been discussed at large, nothing made it
|
|
upstream.
|
|
|
|
But even if the notation worked properly (and was convenient to use),
|
|
it is quite common to work with morphisms `k a b` that cannot possibly embed
|
|
*all* Haskell functions `a -> b`, preventing us to use this notation as we
|
|
would like, simply because we cannot define a suitable instance of `Arrow k`.
|
|
|
|
There are *solutions* to this, like the great [overloaded] package, that
|
|
fixes the `proc` notation and crucially makes it available to many categories
|
|
that are not `Arrow`s. This may very well be exactly what you --- the reader --- need to
|
|
resolve this problem.
|
|
|
|
However, I want to avoid making my library rely on compiler plugins at
|
|
all cost, and think the `proc` notation, while way better than raw `Arrow`
|
|
combinators, is still too verbose and restrictive. *I want to write lambdas*,
|
|
and *I want to apply my morphisms just like any other Haskell function*.
|
|
So no luck.
|
|
|
|
[overloaded]: https://hackage.haskell.org/package/overloaded-0.3.1
|
|
|
|
- There is also this wonderful paper from Conal Elliott: ["Compiling to Categories"][ccc].
|
|
In it, he recalls that any function in the simply typed lambda calculus corresponds
|
|
to an arrow in any cartesian closed category (CCC). What he demonstrated with the
|
|
[concat] GHC plugin is that this well-known result can be used to lift
|
|
any monomorphic Haskell function into a Haskell arrow of any user-defined Haskell CCC.
|
|
|
|
This is very cool, but also an experimental research project. It's very unstable and
|
|
unreliable. It's not packaged on Hackage so fairly difficult to install as a library,
|
|
and looks hard to maintain precisely because it's a compiler plugin.
|
|
|
|
And again, not every category is cartesian-closed, especially if exponentials
|
|
*have to be plain Haskell functions* --- which [concat] enforces.
|
|
|
|
[ccc]: http://conal.net/papers/compiling-to-categories/
|
|
[concat]: https://github.com/compiling-to-categories/concat
|
|
|
|
**However**, 5 days ago I stumbled upon some paper by sheer luck.
|
|
And well, I'm happy to report that **"overloading" the lambda abstraction is
|
|
*completely doable***. Not only that: it is actually *very easy* and
|
|
doesn't require any advanced Haskell feature, nor any kind of metaprogramming.
|
|
No. library. needed.
|
|
|
|
I cannot emphasize enough how powerful the approach appears to be:
|
|
|
|
- You can define your very own `proc` notation for *any* category you desire.
|
|
|
|
- That is, as soon as you can provide an instance for `Category k`,
|
|
you can *already* overload the lambda abstraction to write your morphisms.
|
|
Even if your category isn't an instance of `Arrow` because of this `arr` function.
|
|
*You do not have to be able to embed pure Haskell functions in your category*.
|
|
|
|
- The resulting syntax is way more intuitive than the `proc` notation,
|
|
simply because you can manipulate morphims `k a b` of your category as *plain old
|
|
Haskell functions*, and therefore compose them and apply them to variables
|
|
just as any other Haskell function, using the same syntax.
|
|
|
|
- Implementing the primitives for overloading the lambda abstraction is a matter of about
|
|
10 lines of Haskell.
|
|
|
|
It's in fact so simple that I suspect it must already be documented *somewhere*,
|
|
but for the life of me I couldn't find anything. So here you go. I think this
|
|
will be very useful for EDSL designers out here.
|
|
|
|
## A toy DSL for flow diagrams
|
|
|
|
So let's say our EDSL is supposed to encode flow diagrams, with boxes and wires.
|
|
Boxes have distinguished inputs and outputs, and wires flow from outputs of
|
|
boxes to inputs of other boxes.
|
|
|
|
We can encode *any* flow diagram using the following operations:
|
|
|
|
```haskell
|
|
{-# LANGUAGE GADTs #-}
|
|
|
|
data Flow a b where
|
|
-- just a wire
|
|
Id :: Flow a a
|
|
-- putting two boxes one after the other
|
|
Seq :: Flow a b -> Flow b c -> Flow a c
|
|
-- putting two boxes one next to the other
|
|
Par :: Flow a b -> Flow c d -> Flow (a, c) (b, d)
|
|
-- box that duplicates its input
|
|
Dup :: Flow a (a, a)
|
|
-- box that gets rid of its input
|
|
Void :: Flow a ()
|
|
-- box that projects on first input
|
|
Fst :: Flow (a, b) a
|
|
-- box that projects on second input
|
|
Snd :: Flow (a, b) b
|
|
-- finally, we embed any pure function into a box
|
|
Embed :: (a -> b) -> Flow a b
|
|
```
|
|
|
|
There are probably enough constructors here to claim it is a cartesian category,
|
|
but I didn't double check and it doesn't really matter here.
|
|
|
|
I think we can agree that although this may very well be the right abstraction to
|
|
internally transform and reason about diagrams, what an **awful**, **awful** way
|
|
to write them. Even if we provided a bunch of weird infix operators for some of the
|
|
constructors, only few Haskellers would be able to make sense of this
|
|
gibberish.
|
|
|
|
What *I* want is a way to write these diagrams down using lambda abstractions
|
|
and variable bindings, that get translated to this internal representation.
|
|
|
|
We *can* give an instance for both `Category Flow` and `Arrow Flow`:
|
|
|
|
```haskell
|
|
import Control.Category
|
|
import Control.Arrow
|
|
|
|
instance Category Flow where id = Id ; g . f = Seq f g
|
|
|
|
instance Arrow Flow where
|
|
arr = Embed
|
|
first f = Par f id
|
|
second = Par id
|
|
(***) = Par
|
|
f &&& g = Seq Dup (Par f g)
|
|
```
|
|
|
|
We can even give a custom implementation for every `Arrow` operation.
|
|
And yet, we're left with nothing more than disappointment when we attempt
|
|
to use the `proc` notation:
|
|
Haskell's desugarer for the `proc` notation is *really dumb*.
|
|
|
|
```haskell
|
|
{-# LANGUAGE Arrows #-}
|
|
|
|
t :: Flow (a, b) (b, a)
|
|
t = proc (x, y) -> returnA -< (y, x)
|
|
```
|
|
|
|
If you print the term, you get something like this:
|
|
|
|
```haskell
|
|
Seq (Embed(_)) (Seq (Embed(_)) (Embed(_)))
|
|
```
|
|
|
|
We just wrote `swap`. But Haskell *doesn't even try* to use the operations we've
|
|
just defined in `Arrow Flow`, and just lifts pure Haskell functions using `arr`.
|
|
The terms you get are **not inspectable**, they are black boxes. Not. good.
|
|
Granted, morphisms `Fst` and `Snd` are not part of the interface of `Arrow`,
|
|
so this example is a bit unfair. But if you've used the `proc`
|
|
notation at all, you know this isn't an isolated incident, and `arr` eventually
|
|
always shows up.
|
|
|
|
What I'm gonna show you is how to enable the following, *straight-to-the-point*
|
|
syntax:
|
|
|
|
```haskell
|
|
t :: Flow (a, b) (b, a)
|
|
t = flow \(Tup x y) -> Tup y x
|
|
```
|
|
|
|
That reduces to the following term:
|
|
|
|
```haskell
|
|
t = Seq Dup (Par (Seq Id Snd) (Seq Id Fst))
|
|
```
|
|
|
|
How beautiful! *Sure*, it's not the most *optimal* representation, and a traversal
|
|
over the term could simplify it by removing `Seq Id`, but at the very
|
|
least it's fully *inspectable*.
|
|
|
|
## The solution
|
|
|
|
Now my solution stems from this truly *amazing* paper from Jean-Philippe
|
|
Bernardy and Arnaud Spiwack:
|
|
[Evaluating Linear Functions to Symmetric Monoidal Categories][smc] (SMCs).
|
|
|
|
In this paper, they explain that if CCCs are models of the simply typed lambda
|
|
calculus, it is "well-known" that SMCs are models of the *linear* simply-typed
|
|
lambda calculus. And thus, they show how they are able to *evaluate* linear
|
|
functions (as in, Linear Haskell linear functions) into arrows in any
|
|
target SMC of your choice.
|
|
|
|
They even released a library for that: [linear-smc]. I implore you to go and
|
|
take a look at both the paper and the library, it's very *very* smart. Sadly, it
|
|
seems to have gone mostly unnoticed.
|
|
|
|
[smc]: https://arxiv.org/abs/2103.06195v2
|
|
[linear-smc]: https://hackage.haskell.org/package/linear-smc-1.0.1
|
|
|
|
*This* paper was the tipping point. Because my target category is
|
|
cartesian (ergo, I can duplicate values), I suspected that I could
|
|
remove almost all the complex machinery they had to employ to
|
|
go from a free cartesian category over a SMC to the underlying SMC.
|
|
I was hopeful that I could ditch Linear Haskell, too.
|
|
|
|
And, relieved as I am, I can tell you that yes: not only can all of this be
|
|
simplified (if you don't care about SMCs or Linear Haskell), but everything can
|
|
be implemented in a handful lines of Haskell code.
|
|
|
|
### Interface
|
|
|
|
So, the first thing we're gonna look at is the *interface* exposed to the users
|
|
of your EDSL. These are the magic primitives that we will have to implement.
|
|
|
|
```haskell
|
|
type Port r a
|
|
```
|
|
|
|
First there is this (abstract) type `Port r a`. It is meant to represent the
|
|
**output** of a box (in our flow diagrams) carrying information of type `a`.
|
|
|
|
Because the definition of `Port r a` is *not* exported, there is crucially *no
|
|
way* for the user to retrieve a value of type `a` from it. Therefore, to use a
|
|
"port variable" in any meaningful way, they can **only** use the operations
|
|
on ports that *you* --- the library designer --- export.
|
|
|
|
And now, the two other necessary primitives:
|
|
|
|
```haskell
|
|
encode :: Flow a b -> Port r a -> Port r b
|
|
decode :: (forall r. Port r a -> Port r b) -> Flow a b
|
|
```
|
|
|
|
- `encode` transforms a morphism from your category --- here a diagram `Flow a b` ---
|
|
into a Haskell functions *on ports*. By exporting this function, you enable
|
|
users to apply any morphism *in your category* to the "port variables" they
|
|
have at their disposal. And precisely *only* those operations.
|
|
|
|
- `decode` does the *reverse* translation. It takes any Haskell function *on
|
|
ports*, and converts it back into a morphism in your category. Well, not *any*
|
|
function. Only functions that are *closed* with respect to port variables.
|
|
That's why there is this type variable `r` in `Port r a`. Because all the
|
|
operations on ports exported by this interface share the same `r` between all
|
|
inputs and outputs, there is *no way* to use a port variable defined outside
|
|
of a function over ports, and still have the function be quantified over this `r`.
|
|
|
|
The same kind of trick, I believe, is used in [Control.Monad.ST][st].
|
|
This is really really neat.
|
|
|
|
[st]: https://hackage.haskell.org/package/base-4.17.0.0/docs/Control-Monad-ST.html
|
|
|
|
---
|
|
|
|
`encode` and `decode` can be defined for any `Category k`.
|
|
But if `k` is *also* cartesian, we can provide the following additional primitives:
|
|
|
|
```haskell
|
|
pair :: Port r a -> Port r b -> Port r (a, b)
|
|
unit :: Port r ()
|
|
```
|
|
|
|
And... that's all you need!
|
|
|
|
---
|
|
|
|
Now of course you're free to include in your API your own morphisms converted
|
|
into functions over ports. This would allow you to fully hide from the world
|
|
your internal representation of diagrams. And to do that, you only need to use
|
|
the previous primitives:
|
|
|
|
```haskell
|
|
flow = decode
|
|
|
|
fst :: Port r (a, b) -> Port r a
|
|
fst = encode Fst
|
|
|
|
snd :: Port r (a, b) -> Port r b
|
|
snd = encode Snd
|
|
|
|
split :: Port r (a, b) -> (Port r a, Port r b)
|
|
split p = (fst p, snd p)
|
|
|
|
pattern Tup x y <- (split -> (x, y))
|
|
where Tup x y = pair x y
|
|
|
|
void :: Port r a -> Port r ()
|
|
void = encode Void
|
|
|
|
box :: (a -> b) -> Port r a -> Port r b
|
|
box f = encode (Embed f)
|
|
|
|
(>>) :: Port r a -> Port r b -> Port r b
|
|
x >> y = snd (pair x y)
|
|
```
|
|
|
|
You can see I use `PatternSynonyms` and `ViewPatterns` to define this `Tup`
|
|
syntax that appeared earlier. Because I defined this `(>>)` operator to discard
|
|
its first port, we can use `QualifiedDo` to have an even nicer syntax for
|
|
putting all ports in parallel and discard all but the last one:
|
|
|
|
```haskell
|
|
box1 :: Port r Text -> Port r (Int, Int)
|
|
box2 :: Port r Text -> Port r Bool
|
|
box3 :: Port r Int -> Port r ()
|
|
box4 :: Port r Int -> Port r ()
|
|
|
|
diag :: Flow (Text, Text) Bool
|
|
diag = flow \(Tup x y) -> Flow.do
|
|
let Tup a b = box1 x
|
|
box3 a
|
|
box4 b
|
|
box2 y
|
|
```
|
|
|
|
Because `(>>)` is defined in such a way, `box1`, `box3` and `box4` *will appear*
|
|
in the resulting diagram even though their output gets discarded.
|
|
Were we to define `x >> y = y`, they would simply disappear altogether.
|
|
This technique gives a lot of control over how the translation should
|
|
be specified.
|
|
|
|
Finally and most importantly, `box` is the **only** way to introduce pure
|
|
Haskell functions `a -> b` into a black box in our diagram. For this reason, we
|
|
can be sure that `Embed` will never show up if `box` isn't used explicitely by
|
|
the user. If your category has no embedding of Haskell functions,
|
|
or does *not* export it, it's impossible to insert them, ever.
|
|
|
|
### Implementation
|
|
|
|
And now, the big reveal... `Ports r a` are just morphisms from `r`
|
|
to `a` in your category!
|
|
|
|
**This** is the implementation of the primitives from earlier.
|
|
|
|
```haskell
|
|
newtype Port r a = P { unPort:: Flow r a }
|
|
|
|
encode :: Flow a b -> Port r a -> Port r b
|
|
encode f (P x) = P (f . x)
|
|
|
|
decode :: (forall r. Port r a -> Port r b) -> Flow a b
|
|
decode f = unPort (f (P id))
|
|
|
|
pair :: Port r a -> Port r b -> Port r (a, b)
|
|
pair (P x) (P y) = P (x &&& y)
|
|
|
|
unit :: Port r ()
|
|
unit = P Void
|
|
```
|
|
|
|
And indeed, if you consider `Port r a` to represent paths from some input `r`
|
|
to output `a` in a diagram, then by squinting your eyes a bit the *meaning* (in
|
|
your category) of this interface makes sense.
|
|
|
|
- For `encode`, surely if you know how to go from `a` to `b`, and from `r` to
|
|
`a`, then you know how to go from `r` to `b`: that's just composition, hence
|
|
`(.)`.
|
|
|
|
- For `decode`, you have as assumption that from any point `r` in a diagram,
|
|
you know how to transform a path from `r` to `a` into a path from `r` to `b`.
|
|
In particular, because this holds for any `r`, this holds for `a`. And you
|
|
know a path from `a` to `a`, it's the identity! This gets you a path from `a`
|
|
to `b`.
|
|
|
|
It would appear `encode` and `decode` being inverse of one another is
|
|
yet another instance of the Yoneda lemma, but I'm not a categorician so
|
|
I will not attempt to explain this any more than that, apologies.
|
|
|
|
Still, the sheer simplicity of this technique is mesmerizing to me.
|
|
It kinda looks like doing CPS, but it *isn't* CPS. Quoting the paper:
|
|
|
|
> the encoding from `k a b` to `P k r a ⊸ P k r b` can be thought of as a
|
|
> transformation to continuation-passing-style (cps), albeit reversed --- perhaps a
|
|
> “prefix-passing-style” transformation.
|
|
|
|
Apparently it is *the dual of CPS* or something.
|
|
|
|
And you can double-check that in order to implement `encode` and `decode` you just
|
|
need `id` and `(.)`. Now to allow more stuff than just composing
|
|
morphisms as functions, more primitives can be added, like `pair` using `(&&&)`,
|
|
and `unit` using `Void :: Flow a ()` --- which may or may not be available,
|
|
dependending on the kind of categories you're working with.
|
|
|
|
---
|
|
|
|
### Some things to be wary of
|
|
|
|
Ok, so now we've seen how to successfully "overload" the lambda abstraction.
|
|
But there is one quirk that I think you should be aware of.
|
|
|
|
#### Preventing (too much) duplication in the diagram
|
|
|
|
If you look at the definition of `split p`, you can see that the input `p` gets
|
|
duplicated in the two output ports. If both of them are used in a given diagram, `p`
|
|
will appear at least *twice* in this diagram. Now in the abstract setting of
|
|
category theory, duplication in your diagram doesn't matter because of the
|
|
equivalences and laws that equate diagrams.
|
|
|
|
However, we're trying to encode *useful* things. My original motivation for
|
|
going to this deep end was modeling flow diagrams of *effectful computations*, for
|
|
the new version of [achille]. You could imagine changing the `Embed` constructor
|
|
of `Flow` to the following:
|
|
|
|
[achille]: /projects/achille
|
|
|
|
```haskell
|
|
Embed :: (a -> IO b) -> Flow a b
|
|
```
|
|
|
|
From then on, encoded flow diagrams are meant to be *executed* --- and even parallelised for
|
|
free, how neat --- but I hope you see now why we should **never ever** duplicate any
|
|
such `Embed` box anymore. In the paper, that's why they argue it's
|
|
important to restrict yourself to monoidal categories and linear functions,
|
|
and make `copy` an explicit operation in your syntax.
|
|
|
|
```haskell
|
|
copy :: Port r a %1 -> (Port r a, Port r a)
|
|
```
|
|
|
|
Indeed, the type system will prevent you to use a variable more than once
|
|
without explicitly inserting `copy`. However, everything has to be made linear
|
|
and this has huge implications on the user-friendliness of the overall syntax.
|
|
I've also found error messages from Linear Haskell to be pretty uninformative
|
|
--- and `let` bindings are *not* linear, **what??**.
|
|
|
|
Instead, I came up with a solution I'm fairly happy with. There is a trick: we
|
|
introduce a new primitive.
|
|
|
|
```haskell
|
|
(>>=) :: Port r a -> (Port r a -> Port r b) -> Port r b
|
|
```
|
|
|
|
Now of course, you could just implement it as `x >>= f = f x`, but that's
|
|
precisely what we want to *avoid*. No, instead, I force the evaluation of `x`,
|
|
and *then* evaluate `f` applied to a box that simply returns the pre-computed
|
|
value, *side-effect free*. So a box gets duplicated, sure, but this box does
|
|
*nothing*. What I am now noticing while writing this down, is that introducing
|
|
this `Bind` operator makes the diagram non-inspectable again, oops. I can still
|
|
run it of course, but this is a problem for future me.
|
|
|
|
Anyway, the benefit of this `(>>=)` operator is that it's very convenient with
|
|
`QualifiedDo`:
|
|
|
|
```haskell
|
|
diag :: Flow FilePath Bool
|
|
diag = flow \src -> Flow.do
|
|
Tup a b <- box1 src
|
|
box3 a
|
|
box4 (box2 b)
|
|
```
|
|
|
|
You just have to stop using `let` and you're good to go. What's *remarkable*
|
|
in the syntax is that these *effectful* boxes are used as plain old **functions**.
|
|
Therefore there is *no need* to clutter the syntax with `<$>`, `>>=`, `pure`,
|
|
`-<`, `returnA` and a bind for every single arrow like in the `proc` notation:
|
|
you just compose and apply them like regular functions --- even when they are
|
|
effectful.
|
|
|
|
#### Compile vs. Evaluate
|
|
|
|
Now the paper is called "Evaluating [...]" and says there is a runtime cost to
|
|
pay for the translation. Considering how simple *this* implementation is, and
|
|
how the target is fully first-order (well, without `(>>=)`), I wouldn't be
|
|
surprised if GHC is actually able to fully unfold and translate to category
|
|
morphisms at compile-time. But I'm not an expert Haskell developer by any means
|
|
and have no clue how one would go about checking this, so if anyone does, [please tell me][me]!
|
|
|
|
[me]: mailto:lucas@escot.me
|
|
|
|
#### Expressive power
|
|
|
|
I'm very curious to see which kind of categories you can "compile" Haskell
|
|
functions to using this technique. I didn't expect converting to cartesian
|
|
category's morphisms to be so straightforward, and I'm looking forward to see
|
|
whether there is any fundamental limitation preventing us to do the same for
|
|
CCCs.
|
|
|
|
#### Recursive morphisms and infinite structures
|
|
|
|
[Someone asked on Reddit][reddit] whether recursive morphisms can be
|
|
written out using this syntax, *wihtout* making `decode` loop forever.
|
|
I've thought about it for a bit, and I think precisely because this is not
|
|
*CPS* and we construct morphisms `k a b` rather than Haskell functions,
|
|
`decode` should not be a problem. It *should* produce recursive morphisms where
|
|
recursive occurences are guarded by constructors. Because of laziness, looping
|
|
is not a concern.
|
|
|
|
But I haven't tried this out yet. Since in [achille] recursive
|
|
*recipes* can be quite useful, I want to support them, and so I will *definitely*
|
|
investigate further.
|
|
|
|
[reddit]: https://www.reddit.com/r/haskell/comments/zi9mxp/comment/izqh96d/?utm_source=share&utm_medium=web2x&context=3
|
|
|
|
---
|
|
|
|
Thank you for reading! I hope this was or will be useful to some of you. I am
|
|
also very grateful for Bernardy and Spiwack's fascinating paper and library, it
|
|
quite literally made my week.
|
|
|
|
Feel free to go on [the r/haskell thread][thread] related to this post.
|
|
|
|
[thread]: https://www.reddit.com/r/haskell/comments/zi9mxp/overloading_the_lambda_abstraction_in_haskell/
|
|
|
|
Till next time, perhaps when I manage to release the next version of [achille],
|
|
using this very trick that I had been desperately looking for in the past 2 years.
|