new post on overloading lambda
This commit is contained in:
parent
6c0d4ad54c
commit
8d14f78c2c
|
@ -0,0 +1,445 @@
|
|||
---
|
||||
title: Overloading the lambda abstraction in Haskell
|
||||
date: 2022-12-10
|
||||
draft: false
|
||||
---
|
||||
|
||||
I've been dreaming about overloading the lambda abstraction in Haskell for a
|
||||
few years already. I am working on an EDSL with its very own notion of
|
||||
morphisms `k a b` from `a` to `b`. Thus I'm looking for a syntax that enables
|
||||
users to write said morphisms using just regular Haskell functions and
|
||||
the `\x -> ...` notation.
|
||||
|
||||
Until 5 days ago I thought it was impossible.
|
||||
|
||||
- Haskell's `Arrow` abstraction and associated syntactic sugar are supposed to
|
||||
do just that, but it just doesn't work as intended. It's a
|
||||
difficult thing to [implement properly](https://github.com/tomjaguarpaw/Arrows2/issues/1)
|
||||
and [if proposals to fix it](https://github.com/lexi-lambda/ghc-proposals/blob/constraint-based-arrow-notation/proposals/0000-constraint-based-arrow-notation.md) have been come up, nothing concrete
|
||||
came out of it (yet).
|
||||
|
||||
But even if the notation worked properly (and was convenient to use),
|
||||
many times your internal representation of morphisms `k a b` cannot possibly embed
|
||||
*all* Haskell functions `a -> b`. Therefore it's impossible to declare an
|
||||
instance of `Arrow k`.
|
||||
|
||||
- Now there is indeed this wonderful paper from Conal Elliott: ["Compiling to Categories"][ccc].
|
||||
In it, he recalls that any function in the simply typed lambda calculus corresponds
|
||||
to an arrow in any cartesian closed category (CCC). What he demonstrated with the
|
||||
[concat] GHC plugin is that this well-known result can be used to lift
|
||||
any monomorphic Haskell function into a Haskell arrow of any user-defined Haskell CCC.
|
||||
|
||||
This is very cool, but also an experimental research project. It's very unstable and
|
||||
unreliable. It's not packaged on Hackage so fairly difficult to install as a library,
|
||||
and looks hard to maintain precisely because it's a compiler plugin.
|
||||
|
||||
So probably not a sound dependency to carry around when trying to develop a
|
||||
[little, self-contained, portable library yourself](https://github.com/flupe/achille).
|
||||
|
||||
And again, not every category is cartesian-closed, especially if exponentials
|
||||
*have to be plain Haskell functions* --- which [concat] enforces.
|
||||
|
||||
[ccc]: http://conal.net/papers/compiling-to-categories/
|
||||
[concat]: https://github.com/compiling-to-categories/concat
|
||||
|
||||
**However** --- I'm happy to report that **"overloading" the lambda abstraction is
|
||||
*completely doable***. Not only that: it is actually *very easy* and
|
||||
doesn't require any advanced Haskell feature, nor any kind of metaprogramming.
|
||||
No. library. needed. It really is the *perfect* solution.
|
||||
|
||||
I cannot emphasize enough how powerful the approach presented here looks to me:
|
||||
|
||||
- You can define your very own `proc` notation for the category of your choice.
|
||||
|
||||
- Your category `k a b` only has to define operations `(&&&)`, `(.)`, and `id`.
|
||||
In particular, you can define this notation even for things that cannot
|
||||
provide an `Arrow` instance because of the `arr` function.
|
||||
*You don't have to be able to embed pure Haskell functions in your category*.
|
||||
|
||||
- And finally, this syntax is way more usable than the `proc` notation,
|
||||
because you can manipulate morphims `k a b` of your category as *plain old
|
||||
Haskell functions*, and can therefore compose them and apply them to variables
|
||||
just as any other regular Haskell function, with the same syntax.
|
||||
|
||||
It's in fact so simple that I suspect it must already be documented *somewhere*,
|
||||
but for the life of me I couldn't find anything. So here you go.
|
||||
|
||||
## A toy DSL for flow diagrams
|
||||
|
||||
So let's say our EDSL is supposed to encode flow diagrams, with boxes and wires.
|
||||
Boxes have distinguished inputs and outputs, and wires flow from outputs of
|
||||
boxes to inputs of other boxes.
|
||||
|
||||
We can encode *any* flow diagram using the following operations:
|
||||
|
||||
```haskell
|
||||
{-# LANGUAGE GADTs #-}
|
||||
|
||||
data Flow a b where
|
||||
-- just a wire
|
||||
Id :: Flow a a
|
||||
-- putting two boxes one after the other
|
||||
Seq :: Flow a b -> Flow b c -> Flow a c
|
||||
-- putting two boxes one next to the other
|
||||
Par :: Flow a b -> Flow c d -> Flow (a, c) (b, d)
|
||||
-- box that duplicates its input
|
||||
Dup :: Flow a (a, a)
|
||||
-- box that gets rid of its input
|
||||
Void :: Flow a ()
|
||||
-- box that projects on first input
|
||||
Fst :: Flow (a, b) a
|
||||
-- box that projects on second input
|
||||
Snd :: Flow (a, b) b
|
||||
-- finally, we embed any pure function into a box
|
||||
Embed :: (a -> b) -> Flow a b
|
||||
```
|
||||
|
||||
I think there are enough constructors here to claim it a cartesian category, but
|
||||
I didn't double check and it doesn't really matter.
|
||||
|
||||
Now I think we can agree that although this is the right abstraction to
|
||||
internally transform and reason about diagrams, what an **awful**, **awful** way
|
||||
to write them. Even if we provided a bunch of weird infix operators for some of the
|
||||
constructors, only a few Haskellers would be able to make sense of this
|
||||
gibberish.
|
||||
|
||||
No, what matters is that we *can* give an instance for both `Category Flow` and `Arrow Flow`:
|
||||
|
||||
```haskell
|
||||
import Control.Category
|
||||
import Control.Arrow
|
||||
|
||||
instance Category Flow where id = Id ; g . f = Seq f g
|
||||
|
||||
instance Arrow Flow where
|
||||
arr = Embed
|
||||
first f = Par f id
|
||||
second = Par id
|
||||
(***) = Par
|
||||
f &&& g = Seq Dup (Par f g)
|
||||
```
|
||||
|
||||
We can even give a custom implementation for every `Arrow` operation.
|
||||
And yet, disappointment ensues when we attempt to use the `proc` notation:
|
||||
Haskell's desugarer for the `Arrow` syntactic sugar is *really dumb*.
|
||||
|
||||
```haskell
|
||||
{-# LANGUAGE Arrows #-}
|
||||
|
||||
t :: Flow (a, b) (b, a)
|
||||
t = proc (x, y) -> returnA -< (y, x)
|
||||
```
|
||||
|
||||
If you print the term, you get something like this:
|
||||
|
||||
```haskell
|
||||
Seq (Embed(_)) (Seq (Embed(_)) (Embed(_)))
|
||||
```
|
||||
|
||||
We just wrote `swap`. But Haskell *doesn't even try* to use the operations we've
|
||||
just defined in `Arrow Flow`, and just lifts pure Haskell functions using `arr`.
|
||||
The terms you get are **not inspectable**, they are black boxes. Not. good.
|
||||
|
||||
What I'm gonna show you is how to get the following, *amazing* syntax:
|
||||
|
||||
```haskell
|
||||
t :: Flow (a, b) (b, a)
|
||||
t = flow \(Tup x y) -> Tup y x
|
||||
```
|
||||
|
||||
That reduces to the following term:
|
||||
|
||||
```haskell
|
||||
t = Seq Dup (Par (Seq Id Snd) (Seq Id Fst))
|
||||
```
|
||||
|
||||
How beautiful! *Sure*, it's not the most *optimal* representation, and a traversal
|
||||
over the term could simplify it by removing `Seq Id`, but at the very
|
||||
least it's *inspectable*.
|
||||
|
||||
## The solution
|
||||
|
||||
Now my solution stems from this truly *amazing* paper from Jean-Philippe
|
||||
Bernardy and Arnaud Spiwack:
|
||||
[Evaluating Linear Functions to Symmetric Monoidal Categories][smc] (SMCs).
|
||||
|
||||
In this paper, they explain that if CCCs are models of the simply typed lambda
|
||||
calculus, it is "well-known" that SMCs are models of the *linear* simply-typed
|
||||
lambda calculus. And thus, they explain how they are able to *evaluate* linear
|
||||
Haskell functions (as in, Linear Haskell linear functions) into arrows in any
|
||||
target SMC.
|
||||
|
||||
They even released a library for that: [linear-smc]. I implore you to go and
|
||||
take a look at both the paper and the library, it's very *very* smart. Sadly, it
|
||||
seems to have gone mostly unnoticed.
|
||||
|
||||
[smc]: https://arxiv.org/abs/2103.06195v2
|
||||
[linear-smc]: https://hackage.haskell.org/package/linear-smc-1.0.1
|
||||
|
||||
I found out about this paper 5 days ago. Because my target category is
|
||||
actually cartesian (ergo, I can duplicate values), I suspected that I could
|
||||
remove almost all the complex machinery they had to employ to
|
||||
go from a free cartesian category to a monoidal one. I was hopeful that I
|
||||
could do without Linear Haskell, too.
|
||||
|
||||
And, relieved as I am, I can tell you that yes: not only can all of this be
|
||||
simplified (if you don't care about SMCs), but everything can be implemented in
|
||||
a handful lines of Haskell code.
|
||||
|
||||
### Interface
|
||||
|
||||
So, the first thing we're gonna look at is the *interface* exposed to the users
|
||||
of your EDSL. These are the magic primitives that we will have to implement later.
|
||||
|
||||
```haskell
|
||||
type Port r a
|
||||
```
|
||||
|
||||
First there is this (abstract) type `Port r a`. It is meant to represent the
|
||||
**output** of a box (in our flow diagrams) carrying information of type `a`.
|
||||
|
||||
Because the definition of `Port r a` is *not* exported, there is crucially *no
|
||||
way* for the user to retrieve a value of type `a` from it. Therefore, to use this
|
||||
variable in any meaningful way, they can **only** use the operations
|
||||
on ports that *you* --- the library designer --- export.
|
||||
|
||||
And now, the two other necessary primitives:
|
||||
|
||||
```haskell
|
||||
encode :: Flow a b -> Port r a -> Port r b
|
||||
decode :: (forall r. Port r a -> Port r b) -> Flow a b
|
||||
```
|
||||
|
||||
- `encode` transforms a morphism from your category --- here a diagram `Flow a b` ---
|
||||
into a Haskell functions *on ports*. By exporting this function, you enable
|
||||
users to apply any morphism *in your category* to the "port variables" they
|
||||
have at their disposal. And precisely *only* those operations.
|
||||
|
||||
- `decode` does the *reverse* translation. It takes any Haskell function *on
|
||||
ports*, and converts it back into a morphism in your category. Well, not *any*
|
||||
function. Only functions that are *closed* with respect to port variables.
|
||||
That's why there is this type variable `r` in `Port r a`. Because all the
|
||||
operations on ports exported by this interface share the same `r` between all
|
||||
inputs and outputs, there is *no way* to use a port variable defined outside
|
||||
of a function over ports, and still have the function be quantified over this `r`.
|
||||
|
||||
The same kind of trick, I believe, is used in [Control.Monad.ST][st]
|
||||
|
||||
[st]: https://hackage.haskell.org/package/base-4.17.0.0/docs/Control-Monad-ST.html
|
||||
|
||||
This is really really neat.
|
||||
|
||||
And finally, because inputs of your diagrams are nested tuples, it's important
|
||||
to provide ways to combine ports to construct new ports carrying tuples.
|
||||
|
||||
```haskell
|
||||
pair :: Port r a -> Port r b -> Port r (a, b)
|
||||
unit :: Port r ()
|
||||
```
|
||||
|
||||
And... that's all you need! Really? Really!
|
||||
|
||||
---
|
||||
|
||||
Now of course you're free to include in your API your own morphisms converted
|
||||
into functions over ports. This would allow you to fully hide from sight your
|
||||
internal representation of diagrams. And to do so, you only need to use the
|
||||
previous primitives:
|
||||
|
||||
```haskell
|
||||
flow = decode
|
||||
|
||||
fst :: Port r (a, b) -> Port r a
|
||||
fst = encode Fst
|
||||
|
||||
snd :: Port r (a, b) -> Port r b
|
||||
snd = encode Snd
|
||||
|
||||
split :: Port r (a, b) -> (Port r a, Port r b)
|
||||
split p = (fst p, snd p)
|
||||
|
||||
pattern Tup x y <- (split -> (x, y))
|
||||
where Tup x y = pair x y
|
||||
|
||||
void :: Port r a -> Port r ()
|
||||
void = encode Void
|
||||
|
||||
box :: (a -> b) -> Port r a -> Port r b
|
||||
box f = encode (Embed f)
|
||||
|
||||
(>>) :: Port r a -> Port r b -> Port r b
|
||||
x >> y = snd (pair x y)
|
||||
```
|
||||
|
||||
You can see we use `PatternSynonyms` and `ViewPatterns` to define this `Tup`
|
||||
syntax that appeared earlier. Because we defined this `(>>)` operator to discard
|
||||
its first port, we can use `QualifiedDo` to have an even nicer syntax for
|
||||
sequencing ports:
|
||||
|
||||
```haskell
|
||||
box1 :: Port r Text -> Port r (Int, Int)
|
||||
box2 :: Port r Text -> Port r Bool
|
||||
box3 :: Port r Int -> Port r ()
|
||||
box4 :: Port r Int -> Port r ()
|
||||
|
||||
diag :: Flow (Text, Text) Bool
|
||||
diag = flow \(Tup x y) -> Flow.do
|
||||
let Tup a b = box1 x
|
||||
box3 a
|
||||
box4 b
|
||||
box2 y
|
||||
```
|
||||
|
||||
Because we defined `(>>)` in such a way, `box1`, `box3` and `box4` *will appear*
|
||||
in the resulting diagram even though their output gets discarded.
|
||||
Were we to define `x >> y = y`, they would simply disappear altogether.
|
||||
This technique gives a lot of control over how the translation should
|
||||
be specified.
|
||||
|
||||
Finally and most importantly, `box` is the **only** way to introduce pure
|
||||
Haskell functions `a -> b` into a black box in our diagram. For this reason, we
|
||||
can be sure that `Embed` will never show up if `box` isn't used explicitely by
|
||||
the user.
|
||||
|
||||
### Implementation
|
||||
|
||||
And now, the reveal: `Ports r a` are just morphisms from `r`
|
||||
to `a` in your category! *This* is the implementation of the primitives from
|
||||
earlier.
|
||||
|
||||
```haskell
|
||||
newtype Port r a = P { unPort:: Flow r a }
|
||||
|
||||
encode :: Flow a b -> Port r a -> Port r b
|
||||
encode f (P x) = P (f . x)
|
||||
|
||||
decode :: (forall r. Port r a -> Port r b) -> Flow a b
|
||||
decode f = unPort (f (P id))
|
||||
|
||||
pair :: Port r a -> Port r b -> Port r (a, b)
|
||||
pair (P x) (P y) = P (x &&& y)
|
||||
|
||||
unit :: Port r ()
|
||||
unit = P Void
|
||||
```
|
||||
|
||||
And indeed, if you consider `Port r a` to represent paths from some input `r`
|
||||
to output `a` in a diagram, then by squinting your eyes a bit the *meaning* (in
|
||||
your category) of this interface makes sense.
|
||||
|
||||
- For `encode`, surely if you now how to go from `a` to `b`, and from `r` to
|
||||
`a`, then you know how to go from `r` to `b`: that's just composition, hence
|
||||
`(.)`.
|
||||
|
||||
- For `decode`, you have as assumption that from any point `r` in a diagram,
|
||||
you know how to transform a path from `r` to `a` into a path from `r` to `b`.
|
||||
In particular, because this holds for any `r`, this holds for `a`. And you
|
||||
know a path from `a` to `a`, it's the identity! This gets you a path from `a`
|
||||
to `b`.
|
||||
|
||||
It would appear `encode` and `decode` are just another instance of the Yoneda
|
||||
lemma, but I'm not a categorician so I cannot give any further explanation, my
|
||||
apologies.
|
||||
|
||||
Still, the sheer simplicity of this technique is mesmerizing to me.
|
||||
|
||||
---
|
||||
|
||||
### Some things to be wary of
|
||||
|
||||
Ok, so now we've seen how to successfully "overload" the lambda abstraction.
|
||||
But there is one quirk that I think you should be aware of.
|
||||
|
||||
#### Preventing (too much) duplication in the diagram
|
||||
|
||||
If you look at the definition of `split p`, you can see that the input `p` gets
|
||||
duplicated in the two output ports. If both of them are used in a given diagram, `p`
|
||||
will appear at least *twice* in this diagram. Now in the abstract setting of
|
||||
category theory, duplication in your diagram doesn't matter because of the
|
||||
equivalences and laws that equate diagrams.
|
||||
|
||||
However, we're trying to encode *useful* things. My original motivation for
|
||||
going to this deep end was modeling flow diagrams of *effectful computations*, for
|
||||
the new version of [achille]. You could imagine changing the `Embed` constructor
|
||||
of `Flow` to the following:
|
||||
|
||||
[achille]: /projects/achille
|
||||
|
||||
```haskell
|
||||
Embed :: (a -> IO b) -> Flow a b
|
||||
```
|
||||
|
||||
From then on, encoded flow diagrams are meant to be *executed* --- and even parallelised for
|
||||
free, how neat --- but I hope you see now why we should **never** duplicate any
|
||||
such `embed` box anymore. In the paper, that's why they argue it's
|
||||
important to restrict yourself to monoidal categories and linear functions,
|
||||
and make `copy` an explicit operation in your syntax.
|
||||
|
||||
```haskell
|
||||
copy :: Port r a %1 -> (Port r a, Port r a)
|
||||
```
|
||||
|
||||
Indeed, the type system will prevent you to use a variable more than once
|
||||
without explicitly inserting `copy`. However, everything has to be made linear
|
||||
and this has huge implications on the user-friendliness of the overall syntax.
|
||||
I've also found error messages from Linear Haskell to be pretty uninformative
|
||||
--- and `let` bindings are *not* linear, **what??**.
|
||||
|
||||
Instead, I came up with a solution I'm fairly happy with. There is a trick: we
|
||||
introduce a new primitive.
|
||||
|
||||
```haskell
|
||||
(>>=) :: Port r a -> (Port r a -> Port r b) -> Port r b
|
||||
```
|
||||
|
||||
Now of course, you could just implement it as `x >>= f = f x`, but that's
|
||||
precisely what we want to *avoid*. No, instead, I force the evaluation of `x`,
|
||||
and *then* evaluate `f` applied to a box that simply returns the pre-computed
|
||||
value, *side-effect free*. So a box gets duplicated, sure, but this box does
|
||||
*nothing*. What I am now noticing while writing this down, is that introducing
|
||||
this `Bind` operator makes the diagram non-inspectable again, oops. I can still
|
||||
run it of course, but this is a problem for future me.
|
||||
|
||||
Anyway, the benefit of this `(>>=)` operator is that it's very convenient with
|
||||
`QualifiedDo`:
|
||||
|
||||
```haskell
|
||||
diag :: Flow FilePath Bool
|
||||
diag = flow \src -> Flow.do
|
||||
Tup a b <- box1 src
|
||||
box3 a
|
||||
box4 (box2 b)
|
||||
```
|
||||
|
||||
You just have to stop using `let` and you're good to go. What's *remarkable*
|
||||
in the syntax is that these *effectful* boxes are used as plain old **functions**.
|
||||
Therefore there is *no need* to clutter the syntax with `<$>`, `>>=`, `pure`,
|
||||
you just compose and apply regular functions (that do side effects).
|
||||
|
||||
#### Compile vs. Evaluate
|
||||
|
||||
Now the paper is called "Evaluating [...]" and says there is a runtime cost to
|
||||
pay for the translation. Considering how simple *this* implementation is, and
|
||||
how the target is fully first-order (well, without `(>>=)`), I wouldn't be
|
||||
surprised if GHC is actually able to fully unfold and translate to category
|
||||
morphisms at compile-time. But I'm not an expert Haskell developer by any means
|
||||
and have no clue how one would go about checking this, so if anyone does, [please tell me][me]!
|
||||
|
||||
[me]: mailto:lucas@escot.me
|
||||
|
||||
#### Expressive power
|
||||
|
||||
I'm very curious to see which kind of categories you can "compile" Haskell
|
||||
functions to using this technique. I didn't expect converting to cartesian
|
||||
category's morphisms to be so straightforward, and I'm looking forward to see
|
||||
whether there is any fundamental limitation preventing us to do the same for
|
||||
CCCs.
|
||||
|
||||
---
|
||||
|
||||
Thank you for reading! I hope this was or will be useful to some of you. I am
|
||||
also very grateful for Bernardy and Spiwack's fascinating paper and library, it
|
||||
quite literally made my week.
|
||||
|
||||
Till next time, perhaps when I manage to release the next version of [achille],
|
||||
using this very trick that I've been searching for in the past 2 years.
|
Loading…
Reference in New Issue