Introduction

This post provides a brief overview of the algorithm described in Luc Maranget’s “Compiling Pattern Matching to Good Decision Trees”.

I’m fond of this formalisation as it has a straightforward implementation and results in reasonable decision trees for the majority of practical instances of pattern matching encountered in functional programming. Additionally, the resulting decision tree is easy to reason about and can be used as the basis of related transformations - such as translating into matching over simple patterns (patterns without nesting), or directly into an intermediate representation with join points (a potential way to express branching control flow in normal-form intermediate representations).

The Matrix

The central data structure in the algorithm is that of a pattern matrix and associated occurrence and action vectors.

pattern matrix and vectors
  • Pattern matrix - stores examinable patterns appearing on the left-hand side of pattern matching rules. “Examinable” refers to the fact that each cell stores a pattern that can be scrutinised against a matching criterion (the refutable component of patterns). As we’ll see in this scheme, tuples do not directly appear in cells of the pattern matrix, but their components do (it is assumed that the arity of a tuple cannot be refuted - otherwise prior typechecking would have failed in ML). The core compilation algorithm works by performing transformations on the pattern matrix, decomposing patterns that admit certain constructors (and values).
  • Occurrence vector - describes the sequence of extractions required to fetch the associated value from a column in the pattern matrix. Occurrences must be maintained during decomposition of the pattern matrix as they effectively describe the access paths to the scrutinees of the n-way branches of the interior nodes in the resulting decision tree.
  • Action vector - stores corresponding actions for each row in the pattern matrix. These are the expressions that appear on the right-hand side of pattern matching rules (or some other identifying value, as they may appear in multiple leaves of the decision tree). Upon successful matching of a pattern, the associated action is executed (actions therefore appear as leaves in the decision tree or, as we’ll come to see, sink nodes in the associated DAG representation).

Matrix Operations

The following sections describe the transformations performed during the algorithm.

Construction

The construction of the pattern matrix and related action and occurrence vectors requires some care. The primary difficulty comes from the inclusion of tuples.

Tuples are never directly stored in cells of the pattern matrix - their components are. This is exemplified by the first example in Maranget’s paper concerning the implementation of a canonical merge function in an ML-like language:

fun merge xs ys = 
  case (xs, ys) of
     (Nil, _) => (* 1 *)
   | (_, Nil) => (* 2 *)
   | (Cons (x, xs), Cons (y, ys)) => (* 3 *)

For clarity, I’ve opted to name list constructors explicitly (as Nil and Cons).

The initial pattern matrix and associated vectors will look like this:

example initial pattern matrix

The v appearing in the occurrence vector denotes the scrutinee of the case ((xs, ys)). The v.i notation denotes a projection. It’s written this way to make the presentation shorter. Furthermore, it is implicitly assumed that the bases of occurrences are not actually the original expressions themselves, but variables that refer to them. This is because, when generating code for each match case, you can bring the relevant pattern variables into scope by introducing them with let bindings (that bind the result of translating occurrences into the relevant unwrappings and projections). To avoid potentially duplicating an effectful operation when introducing these variables, it’s assumed that a kind of normalisation is performed to give a name to the expression being matched upon. This kind of legalisation can be performed before match compilation or potentially during a combined normalisation algorithm, where normalising the scrutinee of a case will yield an atomic value.

As you can see in the above illustration, the pattern matrix does not retain the structure of the tuples appearing in each rule of the match. When translating a tuple pattern into a row for the pattern matrix, you introduce a column for each component. Each component is additionally labelled with an occurrence that denotes the projection from the relevant index of the tuple.

Another source of complexity comes from the fact that patterns in a case need not have the same shape. This scenario is typified by catch-all pattern variables that may follow tuple patterns. Consider the following snippet of SML code:

case f () of 
   (SOME x, _) => (* 1 *)
 | t => (* 2 *)

The problem with the above is that t will disappear when constructing the pattern matrix because columns for components of the prior tuple are introduced instead. A solution for this scenario may be a prior legalisation that moves top-level pattern variables to the right-hand side of match rules:

let val v = f () in
  case v of 
     (SOME x, _) => (* 1 *)
   | _ => let val t = v in (* 2 *) end
end

Doing this will ensure that t will be in scope for the code in (* 2 *).

Another way of ensuring that the shapes of patterns don’t introduce annoying edge cases or require rewriting the program is to bookkeep each pattern in terms of its occurrences. For example, in my own simple implementation of the match compilation algorithm (here), construction of the pattern matrices is done by collecting dictionaries (one for each pattern), mapping occurrences to their related pattern components. Then, a matrix is constructed by filling in a table whose columns are the occurrences traversed during the dictionary creation process (in the left-to-right order they were visited). From this perspective, much of the match compilation algorithm can be seen as building and decomposing tables (matrices).

Specialisation

In order to transform the matrix and vectors into a state that represents the successful matching of a value or constructor upon a collection of rows, specialisation is performed. Specialisation is an operation that retains only the rows of the matrix that admit a specified value or constructor, simultaneously decomposing value-carrying sub-patterns into their corresponding entries in the pattern matrix and occurrence vector (with requirements similar to that of the initial construction of the pattern matrix).

Consider the following code snippet and associated initial pattern matrix and vectors:

case v of
   Some (1, x) => (* 1 *)
 | Some (y, 2) => (* 2 *)
 | None => (* 3 *)
initial pattern matrix

Specialisation of the Some constructor yields the following:

initial pattern matrix

Here, [v] denotes accessing the value held by the constructor. The appeal of the matrix formalism for simultaneous matching is clear when you consider that the resultant matrix effectively models the inner case of the following snippet:

case v of
   Some z =>
     case z of 
        (1, x) => (* 1 *)
      | (y, 2) => (* 2 *)
     end
 | None => (* 3 *)
end

where z is a name given to the unwrapping occurrence. It is not hard to see that the resultant decision tree can be used to rewrite pattern matching into matching over simple patterns.

Similarly, specialising the original matrix from above upon None yields the following:

specialisation of None

Note that, for consistency, nullary constructors such as None must be treated as though they carry unit, (), which - when introduced to the pattern matrix - will construct a row with a single wildcard pattern. This avoids a case of the algorithm (described in a later section) that would treat an empty matrix as indicative of matching failure.

Defaulting

Producing the default matrix of a given pattern matrix retains only the rows that match constructors or values not already present in the column being scrutinised. In other words, if the column being scrutinised contains an irrefutable (either a pattern variable or wildcard) entry, then its decomposition is included in the default matrix. Or, in the case where matching is inexhaustive, defaulting creates an empty pattern matrix (since there are no rows to admit any constructor or value). Intuitively, if each interior node of the decision tree switches on a column’s value (using the occurrence for that column), then defaulting is used to create the default case (to account for constructors or values not present).

defaulting

Note that specialisation and defaulting may produce matrices with rows in common. Additionally, it may be important that defaulting retains variables and their occurrences (in order to introduce pattern variables into scope upon a successful match). This concern is mitigated in Maranget’s explanation of the algorithm as their paper dispenses with pattern variables, considering only wildcards (which can be thought of as locally unique pattern variables with no uses).

Swapping

To aid with implementation of the algorithm, an auxiliary operation that swaps columns of the pattern matrix (and, consequently, the occurrence vector) must be implemented. As you will see in the following section, all prior references to the “column being scrutinised” actually refer to the first column, as matching proceeds left-to-right; the algorithm ensures that there is always a refutable pattern in the first column by swapping it with another when necessary.

swapping

The Algorithm

The algorithm is well described in the paper, but I will recite it here anyway. It’s defined as a recursive function that performs case analysis over the provided matrix. The cases are as follows:

  1. If the pattern matrix is empty, that implies it was produced by a defaulting operation that could find no row to include, so it is indicative of matching failure. A leaf node indicating failure is returned.

  2. If the first row of the pattern matrix is irrefutable (i.e. only pattern variables or wildcards), then matching has succeeded and we return a leaf node containing a reference to the action vector entry for the first row.

    If there are other rows in the pattern matrix that are also irrefutable then that implies the pattern corresponding to the first row subsumes them. In other words, those extra rows are redundant as they can never be reached.

  3. If neither of the previous cases hold then we look for a column that contains at least one refutable entry. If the found column is not the first one, we make it the first by swapping their positions.

    Now that the first column is refutable, we collect a form of signature by analysing each entry. If the column’s type is a variant, then we collect all the names of constructors from entries that contain variant constructor patterns (for example, we would collect Some if we encounter Some (1, x)). A similar collection is done for other kinds of refutable patterns, such as integer constants.

    Once the signature is collected, a specialised matrix is produced for each constructor (or value) in it. Those matrices are then recursively compiled by the algorithm and the resultant decision trees are recorded as targets of an n-way switching node in the decision tree.

    If the signature is not complete, i.e. does not contain all potential constructors for a given type, then the default matrix is computed for the original matrix - the resultant decision tree is then included among the alternatives of the returned switch node as the default case.

Example

Below is a complete annotated example of the conceptual process of computing the decision tree for the following SML snippet:

case v of
   Some (1, x) => (* 1 *)
 | Some (y, 2) => (* 2 *)
 | None => (* 3 *)
full example

Further examples

Below are a few example SML snippets and their corresponding decision trees, produced by my implementation of the algorithm described above:

First, the example given in the paper (for merge or a similar function, such as zip):

case e of
    (Nil, _) => (* 1 *)
  | (_, Nil) => (* 2 *)
  | (Cons (x, xs), Cons (y, ys)) => (* 3 *)
full example

Secondly, a more compact presentation of the running example used in this article:

case e of
   Some (1, x) => (* 1 *)
 | Some (y, 2) => (* 2 *)
 | None => (* 3 *)
full example

Finally, Okasaki’s red-black tree balance function (from “Purely Functional Data Structures”):

case e of
   (B,T (R,T (R,a,x,b),y,c),z,d) => (* 1 *)
 | (B,T (R,a,x,T (R,b,y,c)),z,d) => (* 2 *)
 | (B,a,x,T (R,T (R,b,y,c),z,d)) => (* 3 *)
 | (B,a,x,T (R,b,y,T (R,c,z,d))) => (* 4 *)
 | body => (* 5 *)
full example

I include this example because it illustrates an important point - those aware of the implementation of balance will know that every case, except the last, has the same body. So, in SuccessorML, one would write this as a disjunctive pattern:

case e of
   (B,T (R,T (R,a,x,b),y,c),z,d)
 | (B,T (R,a,x,T (R,b,y,c)),z,d)
 | (B,a,x,T (R,T (R,b,y,c),z,d))
 | (B,a,x,T (R,b,y,T (R,c,z,d))) => (* 1 *)
 | body => (* 2 *)

If compilation were to naively proceed by duplicating the body at (* 1 *) for each case, we would get the same decision tree as above. A more involved approach would attempt to achieve sharing by constructing a DAG (directed acyclic graph) in order to deduplicate the repeated rule bodies. However, naively doing this from the above tree may produce a DAG that contains redundant decisions (e.g. value and default branches that have the same destination). A possible approach is to optimise the DAG after the fact. Then, when emitting code for the decision tree, the shared rule body becomes its own function that receives its variables (those bound as pattern variables and free variables - similar to lambda lifting) as arguments; dependent on the path followed to reach it, the variables are projected from the scrutinee and passed as arguments to the deduplicated function.

The simpler approach is to identify sharing upon construction of nodes in the decision tree, by means of hash consing. This approach is described in Pettersson’s book “Compiling Natural Semantics” and is the one I took for my own implementation.

Below, you can see the decision DAG (for the balance example) resulting from an implementation that uses hash consing:

decision dag

An Implementation

In the time since writing this article, I’ve created a very simple implementation of the algorithm here. Along with providing a faithful implementation of the core ideas described in this article, it also provides a small user interface for experimenting with the algorithm and inspecting the resulting decision trees.

user interface

Further Reading

  • [Augustsson 85] - compiles pattern matching to backtracking automata, thus ensuring code size that is linear in the size of the original pattern matching expression, but potentially checking subterms multiple times.
  • [Pettersson 92] - models the match compilation problem as the construction of a DFA for a regular language, thus enabling certain forms of optimisations (viewed as minimisation of the DFA) and checking of various matching properties such as exhaustiveness and redundancy.
  • [Pettersson 99] - Pettersson’s book “Compiling Natural Semantics” contains a more detailed retelling of the algorithm described in the work above.
  • [Reppy et al. 19] - considers compilation of Successor ML guards. The linked repository also has several implementations of various approaches to the compilation of pattern matching.

Afterword

In efforts to teach the core idea of the algorithm described in this blog article, I’ve found that it’s beneficial to start with simpler examples - consisting purely of tuples of, say, integer patterns (no wildcards or variable patterns). Then, once someone is familiar with the general approach (matrix specialisation), adding all the bookkeeping for pattern variables, defaulting of the matrix, deconstructing nested patterns into the matrix, etc. become reasonable extension projects for them to program around.

For example, in the OCaml compiler, pattern matching over constant strings actually follows the simpler subset of the match compilation algorithm described in this article. I hope to dedicate a future article to this - but, until then, an excellent description of the core idea has been given by Gabriel Scherer on the OCaml forum here.