Calkin-Wilf for Early(-ish) Haskellers

This post was inspired by a Hacker News comment of mine. I presented what follows somewhat poorly there, so I hope my expansion below clarifies the idea.

The Calkin-Wilf enumeration of the rationals is a beautiful algorithm and a lazy, functional implementation of it is described in a Functional Pearl by Gibbons, Lester, and Bird, Enumerating the Rationals. You might not believe it on first glance, but the algorithm described therein is actually more or less a Haskell one-liner which could be tackled by a programmer new to Haskell entirely.

Gibbons, Lester, and Bird focus on correctness, beauty, and naturality of the algorithm and the end result is inarguably a correct Calkin-Wilf enumeration. Here’s a less obviously correct algorithm which uses only a few Haskell concepts.

interleave []     ys     = ys
interleave xs     []     = xs
interleave (x:xs) (y:ys) = x : y : interleave xs ys

allRationals = go 1 1 where
  go m n = (m/n) : interleave (go m (m+n)) (go (n+m) n)

For comparison, here’s an equivalent Python 2.7 implementation using generators

from fractions import Fraction
from itertools import islice

def interleave(x, y):
  while True:

def all_rationals():
  def go(m, n):
    yield (m/n)
    for v in interleave(go(m, m+n), go(m+n, n)):
      yield v
  return go(Fraction(1,1), Fraction(1,1))

def rationals(n):
  return list(islice(all_rationals(), n))

The remainder of this article walks through the problem being solved by the Calkin-Wilf enumeration, the solution proposed by Gibbons, Lester, and Bird, and a series of correctness-preserving steps transforming their solution to the one above along with an outline of their justification.

Enumerating the Rationals

Among the mathy it’s a well-known fact that the rational numbers—numbers generated by dividing two whole numbers—are enumerable or countable. This means that there’s a process which generates every rational number even if it would take an infinite amount of time. Or rather, a process which meets the following challenge:

Pick any rational number you like and this process will eventually name that number in a finite amount of time.

Let’s look at how you do that.

First, we’d like to make a picture of all of the rationals. We’ll just lay them out on a table

\ 1    2    3    4    5    6    7    ...
1 1/1  2/1  3/1  4/1  5/1  6/1  7/1
2 1/2  2/2  3/2  4/2  5/2  6/2  7/2
3 1/3  2/3  3/3  4/3  5/3  6/3  7/3
4 1/4  2/4  3/4  4/4  5/4  6/4  7/4
5 1/5  2/5  3/5  4/5  5/5  6/5  7/5
6 1/6  2/6  3/6  4/6  5/6  6/6  7/6
7 1/7  2/7  3/7  4/7  5/7  6/7  7/7

Specifically, we make an infinite grid and number the columns and rows. Then, each cell corresponds to a rational number: the column number divided by the row number!

Convinced we have a picture of the rationals, we can build our process by simply listing out the whole table. Let’s write out the order in which our process enumerates the rationals. An obvious process just starts at the top and reads left-to-right, bottom-to-top:

\ 1    2    3    4    5    6    7    ...
1 1    2    3    4    5    6    7    ...

I didn’t both numbering the second row because, well, we never actually get there! We’ll keep reading out the first row forever.

The fix is tricky but obvious in retrospect: we enumerate them in a zig-zag fashion from the upper-left corner

\ 1    2    3    4    5    6    7    ...
1 1    3    6    10   15
2 2    5    9    14
3 4    8    13
4 7    12         ...
5 11

If you look at this diagram and think “breadth-first traversal!” then you’re on the right track.

The challenge

So the above enumeration proves that all of the rationals are indeed enumerable: it’s easy to see that any given rational is on our table somewhere and that our zig-zag enumeration will eventually get to it.

Unfortuantely, as an algorithm this table has a major deficiency: it repeats rationals over and over. In fact, for any given rational all possible equal representations of that rational appear on our table, all infinitely many!

So our zig-zag does enumerate all of the rationals, but it doesn’t do it terrifically efficiently. There is a lot of duplicate effort.

Calkin-Wilf Enumeration

This is where the Calkin-Wilf enumeration comes in handy. Calkin-Wilf is a way to enumerate every rational number without any repeats.

The trick is that instead of laying out all of the rationals in a table and then performing breadth-first traversal we lay them out in a binary tree and perform breadth-first traversal.

The tree we’ll use is known as the Calkin-Wilf tree. Since we’re not worried about the math right now we’ll just state that this tree

  1. contains every rational number exactly once, and
  2. can be recursively generated

And so, the solution to enumeration of the rationals proposed by Gibbons, Lester, and Bird is merely to recursively generate the Calkin-Wilf tree and then perform a breadth-first traversal.

A basic implementation

Here’s a somewhat simplified version of what they wrote.

First, we need a binary tree. Since Haskell types are recursive it’s easy to define the type of binary trees.

data BTree a = Leaf | Node a (BTree a) (BTree a)

In other words, a binary tree of a values is either a leaf or a node. Leafs contain nothing (are terminal) while nodes contain a value of type a and the left and right recursive subtrees.

Actually, though, this BTree is more than what we need. There are infinitely many rationals so the Calkin-Wilf tree is infinite as well. We have no need for Leafs. It’s good practice for your types to forbid anything which is not meaningful, so we’ll just remove them.

-- Ahh... simple infinite binary trees
data BTree a = Node a (BTree a) (BTree a)

Now, we build the tree recursively. We begin with a seed of (1, 1) representing the rational 1/1 which lives at the root of tree then generate all subtrees recursively.

calkinWilf :: BTree Rational
calkinWilf = go (1, 1) where
  -- recursive generator
  go (m, n) = Node (m/n) (go (m+n, n)) (go (m, m+n))

The next step is to perform a breadth first traversal. This is a function which converts a BTree a into a list of its values, [a]

breadthFirst :: BTree a -> [a]

but we’ll want a little helper function first called interleave. Interleave takes two lists and deterministically shuffles them together.

>>> interleave [1,2,3] [10,9,8,7,6,5,4,3,2,1]

However, since we’re working with infinite things we’ll want to be careful to be sure interleave operates properly on infinite structures. It’s not worth describing it much more in words: the code, with some examination, speaks for itself

interleave :: [a] -> [a] -> [a]
interleave []     ys     = ys
interleave xs     []     = xs
interleave (x:xs) (y:ys) = x : y : interleave xs ys

It’s probably worth spending a moment here to note how interleave’s definition take advantage of pattern matching to expose all of the possible combinations of inputs. It’s usually easy to review a pattern match and ensure there are no missing cases. It’s easy enough that oftentimes the compiler will even warn you if you make a mistake!

interleave :: [a] -> [a] -> [a]
interleave []     ys     = ys
interleave xs     []     = xs

CalkinWilf.hs:9:1: Warning:
    Pattern match(es) are non-exhaustive
    In an equation for `interleave':
        Patterns not matched: (_ : _) (_ : _)

But be aware that you can’t always rely on this. A warning is evidence your pattern matches are incomplete, but a lack of warnings is not evidence your program is safe. Anyway!

From here, breadth first traversal is pretty easy. We decompose a node by viewing its value follows by the interleave of the breadth-first traversal of each subtree.

breadthFirst :: BTree a -> [a]
breadthFirst (Node a left right) =
  a : interleave (breadthFirst left) (breadthFirst right)

And with these pieces we are done! We just compose them and reap our reward: an infinite list enumerating every rational number exactly once.

allRationals :: [Rational]
allRationals = breadthFirst calkinWilf

This is a beautiful algorithm and as long as you believe the properties of the Calkin-Wilf tree it’s almost inconceivable that allRationals won’t fufill its promise.

But we can do a little better and, in the process, expose an opportunity to simplify the code.

Aside #1: folds and unfolds

Lazy, functional programming often organizes itself around what are called “folds” and “unfolds” over different recursive data types. Folds and unfolds usually come in pairs, and, over lists, they are ubiquitous: many, many list processing functions can be written as folds or unfolds.

In order to identify a fold/unfold pair you have to identify what one “layer” of a type looks like. This is pretty easy: just eliminate all of the recursive positions in the type definition and replace them by new variables. For instance, we can define lists and list layers side-by-side

data List      a   = Nil  | Cons  a (List a)
data ListLayer a x = NilL | ConsL a x

then we can always find a fold and unfold function of the following types

fold   :: (ListLayer a x -> x) -> List a -> x
unfold :: (x -> ListLayer a x) -> x -> List a

which, if we massage those types a bit, are exactly what the standard foldr and unfoldr functions available in the Haskell Prelude and base library are

foldr   :: (a -> x -> x) -> x -> [a] -> x
unfoldr :: (x -> Maybe (a, x)) -> x -> [a]

It’s even the case that the definitions can be more-or-less mechanically derived, though I won’t show that here. The point is that given a recursive data type we should assume we can write a fold/unfold pair for that type which are (a) extremely general and (b) reliant on almost no choice whatsoever from the author.

Folding and unfolding binary trees

So, immediately, let’s apply this to binary trees.

data BTree      a   = Node  a (BTree a) (BTree a)
data BTreeLayer a x = NodeL a x         x

Or, if we look at BTreeLayer sideways, we can see that it’s just a triple

type BTreeLayer a x = (a, x, x)

We can write out the types of our tree fold and unfold and then play type tetris until we get the right answer

foldTree :: (BTreeLayer a x -> x) -> BTree a -> x
foldTree crush (Node a l r) = crush (a, foldTree crush l, foldTree crush r)

unfoldTree :: (x -> BTreeLayer a x) -> x -> BTree a
unfoldTree grow seed =
  let (a, l_seed, r_seed) = grow seed
  in Node a (unfoldTree grow l_seed) (unfoldTree grow r_seed)

Revealing the structure of Calkin-Wilf

Now armed with our foldTree/unfoldTree pair we can re-examine the Calkin-Wilf enumeration from before. We produced two functions:

breadthFirst :: BTree a -> [a]
calkinWilf   :: BTree Rational

With a little type algebra it might appear that breadthFirst could be written using a fold

breadthFirst :: BTree a -> [a]
breadthFirst = foldTree crush where
  crush :: BTreeLayer a [a] -> [a]
  crush = _

It’s less clear, however, that calkinWilf is just an unfold. In particular, we’ve already assumed the starting seed. If we rewrite it (very) slightly to be generic in choice of seed

calkinWilf :: (Int, Int) -> BTree Rational
calkinWilf = go where
  go (m, n) = Node (m/n) (go (m+n, n)) (go (m, m+n))

then it becomes more clear how to reveal the unfoldTree inside of it

calkinWilf :: (Int, Int) -> BTree Rational
calkinWilf = unfoldTree grow where
  grow :: (Int, Int) -> BTreeLayer Rational (Int, Int)
  grow = _

Finally, examining the definitions we wrote originally we can probably isolate and extract the what grow and crush need to do—they just provide the “interesting” part of each recursive step.

breadthFirst = foldTree crush where
  crush (a, left, right) = a : interleave left right

calkinWilf = unfoldTree grow where
  grow (m, n) = (m/n, (m+n, n), (m, m+n))

So, all together now, we have

breadthFirst = foldTree crush where
  crush (a, left, right) = a : interleave left right

calkinWilf = unfoldTree grow where
  grow (m, n) = (m/n, (m+n, n), (m, m+n))

allRationals :: [Rational]
allRationals = breadthFirst (calkinWilf (1,1))

Fundamentally, this is the same algorithm and even the same code as before. It’s merely been slightly refactored to expose the fold/unfold pair which drives the whole algorithm. Our next advance will be to take advantage of that structure!

Aside 2: Mapping and fusion

All data type “layers” share an interesting property. Given any function with the apropriate type we can easily “push” that function down to apply it at the “recursive holes” in our layer.

An example is more clear: if we have a ListLayer a x and a function f :: x -> y then we can generate a new ListLayer a y by replacing our xes with ys. If we embody this in a function, and we ought to since this is functional programming after all, then the resulting function is often called a map.

mapListLayer :: (x -> y) -> ListLayer a x -> ListLayer a y
mapListLayer f NilL        = NilF           -- no recursive holes
mapListLayer f (ConsL a x) = ConsL a (f x)  -- "push" the function

Since all layer types have mapping functions, we ought to be able to define one for BTreeLayer as well and we can. The only novelty is that we must apply the “mapped” function f, twice.

mapBTreeLayer :: (x -> y) -> BTreeLayer a x -> BTreeLayer a y
mapListLayer f (a, x1, x2) = (a, f x1, f x2)

Where does mapping come in handy? Well, the unhelpful answer is practically everywhere, but we’ll use it powerfully immediately.

Fold-unfold fusion

If unfolding is all about creating a recursive data type by repeatedly growing new layers from a seed; and if folding is all about destroying a recursive data type by crushing layer by layer to a summary value; then what happens if we fold something we just immediately unfolded.

Well, we end up generating the intermediate data type even though we could have gotten away without it—we just unfold a seed into a layer and then immediately crush the layer down to our summary result.

This notion that “unfold-then-fold” does unnecessary work and can be replaced is called fold-unfold fusion. It works like this

foldX crush (unfoldX grow seed) == fusedX crush grow

-- where...

fusedX :: (XLayer r -> r) -> (s -> XLayer s) -> s -> r
fusedX crush grow seed =
  crush (mapX (fusedX crush grow) (grow seed))

Usually by exposing the fold-unfold structure and then eliminating the intermediate types by using fusion we can expose opportunities to simplify code. We’ll take advantage of that for enumerating Calkin-Wilf immediately.

Fusing Calkin-Wilf

If we remember the folds and unfolds Calkin-Wilf algorithm as we had it just before there may not immediately be an obvious chance to apply fusion

breadthFirst = foldTree crush where
  crush (a, left, right) = a : interleave left right

calkinWilf = unfoldTree grow where
  grow (m, n) = (m/n, (m+n, n), (m, m+n))

However, when computing the rationals themselves we compose calkinWilf with breadthFirst

allRationals = breadthFirst (calkinWilf (1,1))

and if we inline those definitions our fusion opportunity arises

allRationals =
  foldTree crush (unfoldTree grow (1, 1))

    crush (a, left, right) = a : interleave left right
    grow (m, n) = (m/n, (m+n, n), (m, m+n))

-- becomes

allRationals = go (1, 1) where

  go seed = crush (mapBTreeLayer go (grow seed))
  crush (a, left, right) = a : interleave left right
  grow (m, n) = (m/n, (m+n, n), (m, m+n))

And then if we inline the definition of grow

allRationals = go (1, 1) where

  go (m, n) = crush (mapBTreeLayer go (m/n, (m+n, n), (m, m+n)))
  crush (a, left, right) = a : interleave left right

… the definition of mapBTreeLayer

allRationals = go (1, 1) where

  go (m, n) = crush (m/n, go (m+n, n), go (m, m+n))
  crush (a, left, right) = a : interleave left right

… and then the definition of crush

allRationals = go (1, 1) where
  go (m, n) = a : interleave (go (m+n, n)) (go (m, m+n))

we end up with the algorithm stated at the very beginning of this whole article modulo a little stylistic detail about unnecessary tupling.

Which is kind of remarkable if you think about it.

Beginning with an obviously correct algorithm which cleanly described “generate a recursive, infinite tree then perform a breadth-first traversal of it” we applied completely mechanical transformations to convert it into a one-liner with no tree in sight.

The secret is threefold:

  1. Recognition of common patterns of recursion in our code,
  2. Application of equivalence between different code patterns, and
  3. Aggressive inlining

Tellingly, none of these secret components are hard. They’re all totally mechanical given a listing of what I just called “pattern equivalences”. The ability to generate these and to so aggressively inline is a strength of Haskell as a language and is carried by its properties of purity, laziness, and referential transparency.

Aside 3: Fusion at play

You may have noticed that while these discussions of fusion were tailored toward the task at hand, toward simplifying a binary tree algorithm, I was careful to work them out both for lists and for any recursive data type whatsoever. It turns out that opportunities for fusion like this are at nearly every corner in a functional program.

Furthermore, since the mechanisms which allow us to exploit fusion—namely recognition, transformation under equivalence, and inlining—are so… well, mechanical we might expect these properties to be abused automatically.

In fact, GHC Haskell does do aggressive list fusion under the name “short-cut fusion” using the unfold-fold rule amongst others. More than that, recognition of fusion opportunities is not limited to compiler hacks in GHC: it’s exposed to library writers! To this end, highly optimized libraries like vector and pipes take advantage of built-in rewrite rules to fire and open up fusion and optimization opportunities for GHC to exploit.

It’s truly the case that GHC is able to take advantage of much of the malleability and formal amenability of Haskell-the-language to discover chances to optimize it. It can even do so on code it has never seen before provided a few hints by the author as to what transformations are allowable.


So here’s a “one-liner” Calkin-Wilf enumeration (plus an associated “library function”)

interleave []     ys     = ys
interleave xs     []     = xs
interleave (x:xs) (y:ys) = x : y : interleave xs ys

allRationals = go 1 1 where
  go m n = (m/n) : interleave (go m (m+n)) (go (n+m) n)

To a beginner’s eye it might be magical because of how the recursive calls take advantage of laziness to ensure the call doesn’t spiral in an infinite loop. It’s mechanism might be a little uncertain as well, although a vigilant eye might catch scent of the binary tree underlying the arguments to interleave.

It’s, however, highly unlikely that a beginner will unpack this one-liner to the “generate-consume” pattern explored by Gibbons, Lester, and Bird when they first outlined the algorithm! Most of that structure has vanished in a puff of smoke under the force of a few correctness-preserving code transformations.

The ease at which one can eventually become at highlighting and exploting these opportunities when programming in a language like Haskell is remarkable and suggests a style of programming where you first endeavor to make a solution to a problem “obviously correct” and then merely transform the solution without violating correctness to one that is faster, pithier, or just more digestible.

For many, many, many more examples of this kind of programming, I highly recommend Dr. Richard Bird’s book, Pearls of Functional Algorithm Design.


comments powered by Disqus