# Calkin-Wilf for Early(-ish) Haskellers

09 Jul 2014*This post was inspired by
a Hacker News comment of mine.
I presented what follows somewhat poorly there, so I hope my
expansion below clarifies the idea.*

The
Calkin-Wilf enumeration of the rationals
is a beautiful algorithm and a lazy, functional implementation of it
is described in a
Functional Pearl by Gibbons, Lester, and Bird, *Enumerating the Rationals*.
You might not believe it on first glance, but the algorithm described
therein is actually more or less a Haskell one-liner which could be
tackled by a programmer new to Haskell entirely.

Gibbons, Lester, and Bird focus on correctness, beauty, and
*naturality* of the algorithm and the end result is inarguably a
*correct* Calkin-Wilf enumeration. Here’s a less obviously correct
algorithm which uses only a few Haskell concepts.

```
interleave [] ys = ys
interleave xs [] = xs
interleave (x:xs) (y:ys) = x : y : interleave xs ys
allRationals = go 1 1 where
go m n = (m/n) : interleave (go m (m+n)) (go (n+m) n)
```

For comparison, here’s an equivalent Python 2.7 implementation using generators (and here’s an implementation in Swift, Beta 4, although at the time of writing it seg faulted the Swift compiler)

```
from fractions import Fraction
from itertools import islice
def interleave(x, y):
while True:
yield x.next()
yield y.next()
def all_rationals():
def go(m, n):
yield (m/n)
for v in interleave(go(m, m+n), go(m+n, n)):
yield v
return go(Fraction(1,1), Fraction(1,1))
def rationals(n):
return list(islice(all_rationals(), n))
```

The remainder of this article walks through the problem being solved by the Calkin-Wilf enumeration, the solution proposed by Gibbons, Lester, and Bird, and a series of correctness-preserving steps transforming their solution to the one above along with an outline of their justification.

## Enumerating the Rationals

Among the mathy it’s a well-known fact that the rational
numbers—numbers generated by dividing two whole numbers—are
*enumerable* or *countable*. This means that there’s a process which
generates every rational number even if it would take an infinite
amount of time. Or rather, a process which meets the following
challenge:

Pick any rational number you like and this process will eventually name that number in a finite amount of time.

Let’s look at how you do that.

First, we’d like to make a picture of all of the rationals. We’ll just lay them out on a table

```
\ 1 2 3 4 5 6 7 ...
1 1/1 2/1 3/1 4/1 5/1 6/1 7/1
2 1/2 2/2 3/2 4/2 5/2 6/2 7/2
3 1/3 2/3 3/3 4/3 5/3 6/3 7/3
4 1/4 2/4 3/4 4/4 5/4 6/4 7/4
5 1/5 2/5 3/5 4/5 5/5 6/5 7/5
6 1/6 2/6 3/6 4/6 5/6 6/6 7/6
7 1/7 2/7 3/7 4/7 5/7 6/7 7/7
.
.
.
```

Specifically, we make an infinite grid and number the columns and rows. Then, each cell corresponds to a rational number: the column number divided by the row number!

Convinced we have a picture of the rationals, we can build our process by simply listing out the whole table. Let’s write out the order in which our process enumerates the rationals. An obvious process just starts at the top and reads left-to-right, bottom-to-top:

```
\ 1 2 3 4 5 6 7 ...
1 1 2 3 4 5 6 7 ...
2
3
4
5
6
7
.
.
.
```

I didn’t both numbering the second row because, well, we never actually get there! We’ll keep reading out the first row forever.

The fix is tricky but obvious in retrospect: we enumerate them in a zig-zag fashion from the upper-left corner

```
\ 1 2 3 4 5 6 7 ...
1 1 3 6 10 15
2 2 5 9 14
3 4 8 13
4 7 12 ...
5 11
6
7
.
.
.
```

If you look at this diagram and think “breadth-first traversal!” then you’re on the right track.

### The challenge

So the above enumeration proves that all of the rationals are indeed enumerable: it’s easy to see that any given rational is on our table somewhere and that our zig-zag enumeration will eventually get to it.

Unfortuantely, as an algorithm this table has a major deficiency: it repeats rationals over and over. In fact, for any given rational all possible equal representations of that rational appear on our table, all infinitely many!

So our zig-zag does enumerate all of the rationals, but it doesn’t do it terrifically efficiently. There is a lot of duplicate effort.

## Calkin-Wilf Enumeration

This is where the Calkin-Wilf enumeration comes in handy. Calkin-Wilf is a way to enumerate every rational number without any repeats.

The trick is that instead of laying out all of the rationals in a table and then performing breadth-first traversal we lay them out in a binary tree and perform breadth-first traversal.

The tree we’ll use is known as the Calkin-Wilf tree. Since we’re not worried about the math right now we’ll just state that this tree

- contains every rational number exactly once, and
- can be recursively generated

And so, the solution to enumeration of the rationals proposed by Gibbons, Lester, and Bird is merely to recursively generate the Calkin-Wilf tree and then perform a breadth-first traversal.

### A basic implementation

Here’s a somewhat simplified version of what they wrote.

First, we need a binary tree. Since Haskell types are recursive it’s easy to define the type of binary trees.

```
data BTree a = Leaf | Node a (BTree a) (BTree a)
```

In other words, a binary tree of `a`

values is either a leaf or a
node. Leafs contain nothing (are terminal) while nodes contain a value
of type `a`

and the left and right recursive subtrees.

Actually, though, this `BTree`

is more than what we need. There are
infinitely many rationals so the Calkin-Wilf tree is infinite as well.
We have no need for `Leaf`

s. It’s good practice for your types to
forbid anything which is not meaningful, so we’ll just remove them.

```
-- Ahh... simple infinite binary trees
data BTree a = Node a (BTree a) (BTree a)
```

Now, we build the tree recursively. We begin with a seed of `(1, 1)`

representing the rational `1/1`

which lives at the root of tree then
generate all subtrees recursively.

```
calkinWilf :: BTree Rational
calkinWilf = go (1, 1) where
-- recursive generator
go (m, n) = Node (m/n) (go (m+n, n)) (go (m, m+n))
```

The next step is to perform a breadth first traversal. This is a
function which converts a `BTree a`

into a list of its values, `[a]`

```
breadthFirst :: BTree a -> [a]
```

but we’ll want a little helper function first called `interleave`

.
Interleave takes two lists and deterministically shuffles them
together.

```
>>> interleave [1,2,3] [10,9,8,7,6,5,4,3,2,1]
[1,10,2,9,3,8,7,6,5,4,3,2,1]
```

However, since we’re working with infinite things we’ll want to be
careful to be sure `interleave`

operates properly on infinite
structures. It’s not worth describing it much more in words: the code,
with some examination, speaks for itself

```
interleave :: [a] -> [a] -> [a]
interleave [] ys = ys
interleave xs [] = xs
interleave (x:xs) (y:ys) = x : y : interleave xs ys
```

It’s probably worth spending a moment here to note how `interleave`

’s
definition take advantage of pattern matching to expose all of the
possible combinations of inputs. It’s usually easy to review a pattern
match and ensure there are no missing cases. It’s easy enough that
oftentimes the compiler will even warn you if you make a mistake!

```
interleave :: [a] -> [a] -> [a]
interleave [] ys = ys
interleave xs [] = xs
CalkinWilf.hs:9:1: Warning:
Pattern match(es) are non-exhaustive
In an equation for `interleave':
Patterns not matched: (_ : _) (_ : _)
```

But be aware that you can’t always rely on this. A warning is evidence your pattern matches are incomplete, but a lack of warnings is not evidence your program is safe. Anyway!

From here, breadth first traversal is pretty easy. We decompose a node
by viewing its value follows by the `interleave`

of the breadth-first
traversal of each subtree.

```
breadthFirst :: BTree a -> [a]
breadthFirst (Node a left right) =
a : interleave (breadthFirst left) (breadthFirst right)
```

And with these pieces we are done! We just compose them and reap our reward: an infinite list enumerating every rational number exactly once.

```
allRationals :: [Rational]
allRationals = breadthFirst calkinWilf
```

This is a beautiful algorithm and as long as you believe the
properties of the Calkin-Wilf tree it’s almost inconceivable that
`allRationals`

won’t fufill its promise.

But we can do a little better and, in the process, expose an opportunity to simplify the code.

## Aside #1: folds and unfolds

Lazy, functional programming often organizes itself around what are called “folds” and “unfolds” over different recursive data types. Folds and unfolds usually come in pairs, and, over lists, they are ubiquitous: many, many list processing functions can be written as folds or unfolds.

In order to identify a fold/unfold pair you have to identify what one “layer” of a type looks like. This is pretty easy: just eliminate all of the recursive positions in the type definition and replace them by new variables. For instance, we can define lists and list layers side-by-side

```
data List a = Nil | Cons a (List a)
data ListLayer a x = NilL | ConsL a x
```

then we can always find a fold and unfold function of the following types

```
fold :: (ListLayer a x -> x) -> List a -> x
unfold :: (x -> ListLayer a x) -> x -> List a
```

which, if we massage those types a bit, are exactly what the standard
`foldr`

and `unfoldr`

functions available in the Haskell Prelude and
`base`

library are

```
foldr :: (a -> x -> x) -> x -> [a] -> x
unfoldr :: (x -> Maybe (a, x)) -> x -> [a]
```

It’s even the case that the definitions can be more-or-less
mechanically derived, though I won’t show that here. The point is that
given a recursive data type we should assume we can write a
`fold`

/`unfold`

pair for that type which are (a) extremely general and
(b) reliant on almost no choice whatsoever from the author.

### Folding and unfolding binary trees

So, immediately, let’s apply this to binary trees.

```
data BTree a = Node a (BTree a) (BTree a)
data BTreeLayer a x = NodeL a x x
```

Or, if we look at `BTreeLayer`

sideways, we can see that it’s just a
triple

```
type BTreeLayer a x = (a, x, x)
```

We can write out the types of our tree fold and unfold and then play type tetris until we get the right answer

```
foldTree :: (BTreeLayer a x -> x) -> BTree a -> x
foldTree crush (Node a l r) = crush (a, foldTree crush l, foldTree crush r)
unfoldTree :: (x -> BTreeLayer a x) -> x -> BTree a
unfoldTree grow seed =
let (a, l_seed, r_seed) = grow seed
in Node a (unfoldTree grow l_seed) (unfoldTree grow r_seed)
```

## Revealing the structure of Calkin-Wilf

Now armed with our `foldTree`

/`unfoldTree`

pair we can re-examine the
Calkin-Wilf enumeration from before. We produced two functions:

```
breadthFirst :: BTree a -> [a]
calkinWilf :: BTree Rational
```

With a little type algebra it might appear that `breadthFirst`

could
be written using a fold

```
breadthFirst :: BTree a -> [a]
breadthFirst = foldTree crush where
crush :: BTreeLayer a [a] -> [a]
crush = _
```

It’s less clear, however, that `calkinWilf`

is just an unfold. In
particular, we’ve already assumed the starting seed. If we rewrite it
(*very*) slightly to be generic in choice of seed

```
calkinWilf :: (Int, Int) -> BTree Rational
calkinWilf = go where
go (m, n) = Node (m/n) (go (m+n, n)) (go (m, m+n))
```

then it becomes more clear how to reveal the `unfoldTree`

inside of it

```
calkinWilf :: (Int, Int) -> BTree Rational
calkinWilf = unfoldTree grow where
grow :: (Int, Int) -> BTreeLayer Rational (Int, Int)
grow = _
```

Finally, examining the definitions we wrote originally we can probably
isolate and extract the what `grow`

and `crush`

need to do—they just
provide the “interesting” part of each recursive step.

```
breadthFirst = foldTree crush where
crush (a, left, right) = a : interleave left right
calkinWilf = unfoldTree grow where
grow (m, n) = (m/n, (m+n, n), (m, m+n))
```

So, all together now, we have

```
breadthFirst = foldTree crush where
crush (a, left, right) = a : interleave left right
calkinWilf = unfoldTree grow where
grow (m, n) = (m/n, (m+n, n), (m, m+n))
allRationals :: [Rational]
allRationals = breadthFirst (calkinWilf (1,1))
```

Fundamentally, this is the same algorithm and even the same code as before. It’s merely been slightly refactored to expose the fold/unfold pair which drives the whole algorithm. Our next advance will be to take advantage of that structure!

## Aside 2: Mapping and fusion

All data type “layers” share an interesting property. Given any function with the apropriate type we can easily “push” that function down to apply it at the “recursive holes” in our layer.

An example is more clear: if we have a `ListLayer a x`

and a function
`f :: x -> y`

then we can generate a new `ListLayer a y`

by replacing
our `x`

es with `y`

s. If we embody this in a function, and we ought to
since this is *functional* programming after all, then the resulting
function is often called a *map*.

```
mapListLayer :: (x -> y) -> ListLayer a x -> ListLayer a y
mapListLayer f NilL = NilF -- no recursive holes
mapListLayer f (ConsL a x) = ConsL a (f x) -- "push" the function
```

Since *all* layer types have mapping functions, we ought to be able to
define one for `BTreeLayer`

as well and we can. The only novelty is
that we must apply the “mapped” function `f`

, twice.

```
mapBTreeLayer :: (x -> y) -> BTreeLayer a x -> BTreeLayer a y
mapListLayer f (a, x1, x2) = (a, f x1, f x2)
```

Where does mapping come in handy? Well, the unhelpful answer is
*practically everywhere*, but we’ll use it powerfully immediately.

### Fold-unfold fusion

If unfolding is all about creating a recursive data type by repeatedly growing new layers from a seed; and if folding is all about destroying a recursive data type by crushing layer by layer to a summary value; then what happens if we fold something we just immediately unfolded.

Well, we end up generating the intermediate data type even though we could have gotten away without it—we just unfold a seed into a layer and then immediately crush the layer down to our summary result.

This notion that “`unfold`

-then-`fold`

” does unnecessary work and can
be replaced is called fold-unfold fusion. It works like this

```
foldX crush (unfoldX grow seed) == fusedX crush grow
-- where...
fusedX :: (XLayer r -> r) -> (s -> XLayer s) -> s -> r
fusedX crush grow seed =
crush (mapX (fusedX crush grow) (grow seed))
```

Usually by exposing the fold-unfold structure and then eliminating the intermediate types by using fusion we can expose opportunities to simplify code. We’ll take advantage of that for enumerating Calkin-Wilf immediately.

## Fusing Calkin-Wilf

If we remember the folds and unfolds Calkin-Wilf algorithm as we had it just before there may not immediately be an obvious chance to apply fusion

```
breadthFirst = foldTree crush where
crush (a, left, right) = a : interleave left right
calkinWilf = unfoldTree grow where
grow (m, n) = (m/n, (m+n, n), (m, m+n))
```

However, when computing the rationals themselves we *compose*
`calkinWilf`

with `breadthFirst`

```
allRationals = breadthFirst (calkinWilf (1,1))
```

and if we inline those definitions our fusion opportunity arises

```
allRationals =
foldTree crush (unfoldTree grow (1, 1))
where
crush (a, left, right) = a : interleave left right
grow (m, n) = (m/n, (m+n, n), (m, m+n))
-- becomes
allRationals = go (1, 1) where
go seed = crush (mapBTreeLayer go (grow seed))
crush (a, left, right) = a : interleave left right
grow (m, n) = (m/n, (m+n, n), (m, m+n))
```

And then if we inline the definition of `grow`

```
allRationals = go (1, 1) where
go (m, n) = crush (mapBTreeLayer go (m/n, (m+n, n), (m, m+n)))
crush (a, left, right) = a : interleave left right
```

… the definition of `mapBTreeLayer`

…

```
allRationals = go (1, 1) where
go (m, n) = crush (m/n, go (m+n, n), go (m, m+n))
crush (a, left, right) = a : interleave left right
```

… and then the definition of `crush`

…

```
allRationals = go (1, 1) where
go (m, n) = a : interleave (go (m+n, n)) (go (m, m+n))
```

we end up with the algorithm stated at the very beginning of this whole article modulo a little stylistic detail about unnecessary tupling.

Which is kind of remarkable if you think about it.

Beginning with an obviously correct algorithm which cleanly described “generate a recursive, infinite tree then perform a breadth-first traversal of it” we applied completely mechanical transformations to convert it into a one-liner with no tree in sight.

The secret is threefold:

- Recognition of common patterns of recursion in our code,
- Application of
*equivalence*between different code patterns, and - Aggressive inlining

Tellingly, none of these secret components are *hard*. They’re all
totally mechanical given a listing of what I just called “pattern
equivalences”. The ability to generate these and to so aggressively
inline is a strength of Haskell as a language and is carried by its
properties of *purity*, *laziness*, and *referential transparency*.

## Aside 3: Fusion at play

You may have noticed that while these discussions of fusion were
tailored toward the task at hand, toward simplifying a binary tree
algorithm, I was careful to work them out both for lists and for *any
recursive data type whatsoever*. It turns out that opportunities for
fusion like this are at nearly every corner in a functional program.

Furthermore, since the mechanisms which allow us to exploit
fusion—namely *recognition*, *transformation under equivalence*, and
*inlining*—are so… well, *mechanical* we might expect these
properties to be abused automatically.

In fact, GHC Haskell does do aggressive list fusion under the name
“short-cut fusion” using the unfold-fold rule amongst others. More
than that, recognition of fusion opportunities is not limited to
compiler hacks in GHC: it’s exposed to library writers! To this end,
highly optimized libraries like `vector`

and `pipes`

take advantage of
built-in *rewrite rules* to fire and open up fusion and optimization
opportunities for GHC to exploit.

It’s truly the case that GHC is able to take advantage of much of the malleability and formal amenability of Haskell-the-language to discover chances to optimize it. It can even do so on code it has never seen before provided a few hints by the author as to what transformations are allowable.

## Conclusion

So here’s a “one-liner” Calkin-Wilf enumeration (plus an associated “library function”)

```
interleave [] ys = ys
interleave xs [] = xs
interleave (x:xs) (y:ys) = x : y : interleave xs ys
allRationals = go 1 1 where
go m n = (m/n) : interleave (go m (m+n)) (go (n+m) n)
```

To a beginner’s eye it might be magical because of how the recursive
calls take advantage of laziness to ensure the call doesn’t spiral in
an infinite loop. It’s mechanism might be a little uncertain as well,
although a vigilant eye might catch scent of the binary tree
underlying the arguments to `interleave`

.

It’s, however, highly unlikely that a beginner will unpack this
one-liner to the “generate-consume” pattern explored by Gibbons,
Lester, and Bird when they first outlined the algorithm! Most of that
structure has vanished in a puff of smoke under the force of a few
*correctness-preserving code transformations*.

The ease at which one can eventually become at highlighting and exploting these opportunities when programming in a language like Haskell is remarkable and suggests a style of programming where you first endeavor to make a solution to a problem “obviously correct” and then merely transform the solution without violating correctness to one that is faster, pithier, or just more digestible.

For many, many, many more examples of this kind of programming, I
highly recommend
Dr. Richard Bird’s book, *Pearls of Functional Algorithm Design*.

## Comments

comments powered by Disqus