Introduction

This blog article aims to serve as an introductory resource for those interested in some of the basic ideas underlying compilation strategies for functional languages. It expands upon a StackOverflow answer I wrote a while ago. The hope is that it provides a high-level overview of a few core transformations such compilers perform. For those interested in going beyond the content presented here, I have linked several resources in the final section of this article.

Lambda Calculus

The source language of our compiler will be that of the untyped lambda calculus (extended with conditional and arithmetic expressions). We will encode this in OCaml as follows:

(* [var] abbreviates [string] and [Bop.t] enumerates the binary arithmetic
   operators; both are assumed to be defined elsewhere. *)
module Lam = struct
  type t =
    | Int of int
    | Var of var
    | Lam of var * t
    | App of t * t
    | Bop of Bop.t * t * t
    | If of t * t * t
end

Alpha Renaming

A common first step in compilation is to perform a form of $\alpha$-conversion over the source representation, assigning a unique name to each variable.

[figure: alpha renaming]

The purpose of this transformation is to give every variable a unique name, which aids optimisations that rely on code motion, such as $\beta$-reduction: for such transformations to be safe, care must be taken to avoid capturing substitutions (the alternative is to rename on the fly). Additionally, unique names contribute to debuggability by making lexical scope apparent.

We require a freshness source in a few places in our compiler, renaming included, so we define a simple one as follows:

let fresh =
  let c = ref (-1) in
  fun p ->
    incr c;
    p ^ string_of_int !c
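For example, assuming no other calls to fresh have been made yet (so the counter starts at -1), successive calls append an increasing counter to the supplied prefix:

(* a quick sanity check of the freshness source *)
let () =
  assert (fresh "x" = "x0");
  assert (fresh "x" = "x1");
  assert (fresh "f" = "f2")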

The code for performing this transformation is a straightforward recursive transformation over the source representation, maintaining a map of fresh names as we recurse.

(** Alpha Renaming *)
module Alpha = struct
  module SM = Map.Make(String)
  let rename =
    let rec go env : Lam.t -> Lam.t = function
      | Var x ->
        (match SM.find_opt x env with
         | Some x' -> Var x'
         | _ -> failwith "Scope checking failed!")
      | Lam (x, t) ->
        let x' = fresh x in
        Lam (x', go (SM.add x x' env) t)
      | App (f, x) -> App (go env f, go env x)
      | Bop (op, l, r) -> Bop (op, go env l, go env r)
      | If (e, t, f) ->
        If (go env e, go env t, go env f)
      | Int _ as t -> t
    in go SM.empty
end
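As a small usage example, renaming a term with shadowed binders gives each binder a distinct name (the exact numeric suffixes depend on the state of the fresh counter):

(* λx. (λx. x) x  ~>  λx0. (λx1. x1) x0  (suffixes illustrative) *)
let shadowed : Lam.t =
  Lam.(Lam ("x", App (Lam ("x", Var "x"), Var "x")))

let renamed = Alpha.rename shadowed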

A-Normalisation

To aid with further transformations, compilers for functional languages often opt to transform the source language into a more manageable intermediary representation. Common examples of such representations are A-Normal Form and CPS.

[figure: ANF and CPS]

Our compiler will use a variant of ANF (A-Normal Form). In practice, there is no canonical intermediary representation that suits all requirements. Therefore, almost all instances of these representations in the wild are extended on an ad-hoc basis.

These intermediary representations impose some important properties that are useful for compiler writers:

  • All intermediary computations are given (fresh) names - in our case, this means that the results of applications and binary operators are bound to fresh variables (see the sketch after this list).

  • Evaluation order becomes fixed - the linearisation of expressions consisting of multiple operands requires that an order in which to evaluate each operand is chosen.

  • All arguments to functions/operators are atomic - they are either variables or constants. This corresponds more closely to the operands of machine instructions, which are usually either physical registers or immediates.
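To make these properties concrete, normalising (1 + 2) * (3 + 4) informally yields:

  let r0 = 1 + 2 in
  let r1 = 3 + 4 in
  let r2 = r0 * r1 in
  halt r2

Every intermediate result is named, the left operand is evaluated before the right, and all operands of + and * are atomic. In terms of the Anf.t representation defined in the next listing, this corresponds to the following hand-written value (the names r0, r1 and r2 are illustrative, and Add and Mul are assumed constructors of Bop.t):

let example : Anf.t =
  Anf.(Bop ("r0", Bop.Add, Int 1, Int 2,
       Bop ("r1", Bop.Add, Int 3, Int 4,
       Bop ("r2", Bop.Mul, Var "r0", Var "r1",
       Halt (Var "r2")))))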

This transformation is done as a recursive algorithm that transforms the source Lam.t representation into the Anf.t representation below:

module Anf = struct
  (** atoms *)
  type value =
    | Int of int
    | Var of var
    | Glob of var

  (** ANF representation *)
  type t =
    | Halt of value
    | Fun of var * var list * t * t
    | Join of var * var option * t * t
    | Jump of var * value option
    | App of var * var * value list * t
    | Bop of var * Bop.t * value * value * t
    | If of value * t * t
    | Tuple of var * value list * t
    | Proj of var * var * int * t
(* ... *)

The meaning of each of these constructors will hopefully become clear from the walkthrough of each case of the algorithm below. The conversion algorithm itself is a recursive function that carries an embedding function, in the form of a continuation, which informs each normalised value of the context in which it must appear:

let convert =
  let rec go (e : Lam.t) (k : value -> t) : t = match e with
  (* ... cases ... *)

In the case of variables and integer constants, their values are repackaged as ANF values and passed off to k, which will determine where they appear:

| Int i -> k (Int i)
| Var x -> k (Var x)

In order to convert lambda abstractions, a fresh name is generated for the ANF function we will create. The body of this function will be created by normalising the lambda abstraction’s body, with k supplied as a function that produces Halt of its provided value.

We define an auxiliary function, mk_halt, for this purpose:

(* ... *)
let mk_halt v = Halt v in
(* ...  *)

Then the transformation proceeds as described, where the (fresh) name of the function, f, is provided to the context in which it appears via k:

| Lam (x, t) ->
  let f = fresh "f" in
  let t = go t mk_halt in
  Fun (f, [x], t, k (Var f))
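As a small illustration (the exact suffix on f0 depends on the freshness source), normalising λx. x in a context that simply halts with its value produces a named function whose body returns its parameter, with the function's name handed to the surrounding context:

let identity_anf : Anf.t =
  Anf.(Fun ("f0", ["x"], Halt (Var "x"), Halt (Var "f0")))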

In order to succinctly normalise terms that may depend on the context of normalising another term, we introduce a utility let-operator, let*, that allows our code to appear in direct style (despite being in continuation-passing style):

(* ... *)
let ( let* ) = (@@) in
(* ... *)

The normalisation of application benefits from this operator (which would otherwise be written as go f (fun f -> go x (fun x -> ...))):

| App (f, x) ->
  let* f = go f in
  let* x = go x in
  (match f with
   | Var f ->
     let r = fresh "r" in
     App (r, f, [x], k (Var r))
   | _ -> failwith "Must apply named value!")

Application requires that the left-hand side normalises to a named value. The result of the application is given a fresh name which is then propagated to the outer context provided by k. Normalisation of binary arithmetic operators works in a similar manner:

| Bop (op, x, y) ->
  let* x = go x in
  let* y = go y in
  let r = fresh "r" in
  Bop (r, op, x, y, k (Var r))

The most involved case is that of If expressions. In order to provide the correct then/else value to the external context without duplicating the external context into each branch of the If, we introduce a join point:

| If (e, t, f) ->
  let* e = go e in
  let j, p = fresh "j", fresh "p" in
  let join v = Jump (j, Some v) in
  Join (j, Some p, k (Var p),
    If (e, go t join, go f join))

A “join point” is where control flow merges together. Conceptually, it’s a kind of local continuation that’s used to encode branching control flow in the tree-based structure of ANF. Unlike in the case of abstractions, where the body of the abstraction is normalised with a k that produces Halt (which returns from the current function), each case of the if-expression is normalised such that their values Jump to the same join point (providing their normalised value as a parameter).
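As a sketch (with illustrative names j0 and p0, and assuming c is bound elsewhere), normalising if c then 1 else 2 at the top level produces a join point that receives the value of whichever branch was taken and halts with it:

let if_example : Anf.t =
  Anf.(Join ("j0", Some "p0", Halt (Var "p0"),
       If (Var "c", Jump ("j0", Some (Int 1)),
                    Jump ("j0", Some (Int 2)))))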

[figure: join points]

As we’ll see in the lowering stage, it is important to note that these join points are not actually functions. When constructing a control flow graph, these join points, which previously dominated the contexts in which they’re used, become the targets of branching instructions.

Finally, the conversion function, go, is seeded with an initial k that produces Halt of the normalised result of the program:

in Fun.flip go mk_halt

I have presented a similar algorithm for ANF conversion in another article that you can read here.

Closure Conversion

In order to compile higher-order functions that may reference variables outwith their own lexical scope (such variables are referred to as “free” variables), we require a transformation that provides a means by which functions can access their free variables irrespective of their context (as functions may be freely passed around and returned by other functions). This process is known as “closure conversion” because it ensures all functions are “closed” (have no free variables).

To illustrate the problem, consider the following ANF term:

[figure: example requiring closure]

A straightforward strategy for closure conversion is to rewrite functions such that they obtain their free variables through an additional parameter known as the environment. Then, when a function is used in a higher-order way, we package up the function (as a code pointer) with its environment and pass that around (this combined structure of a code pointer and an environment is known as a “closure”). Additionally, since occurrences of unknown functions now refer to closure objects, we must rewrite applications to extract and apply the function correctly.

The algorithm requires an auxiliary function, fvs, that computes the free variables of ANF terms.
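Such a function is straightforward. Since it is not central to the article, the following is only a possible sketch (not necessarily the original implementation), assuming VS is a string-set module and treating join-point names as labels rather than variables:

module VS = Set.Make (String)

(* free variables of atoms: only Var counts, globals are not free *)
let fvs_value : Anf.value -> VS.t = function
  | Var x -> VS.singleton x
  | Int _ | Glob _ -> VS.empty

let fvs_values vs =
  List.fold_left (fun s v -> VS.union s (fvs_value v)) VS.empty vs

let rec fvs (t : Anf.t) : VS.t =
  let open VS in
  match t with
  | Halt v -> fvs_value v
  | Fun (f, xs, e, e') ->
    union (diff (fvs e) (of_list xs)) (remove f (fvs e'))
  | Join (_, p, e, e') ->
    let fe = match p with Some p -> remove p (fvs e) | None -> fvs e in
    union fe (fvs e')
  | Jump (_, v) -> (match v with Some v -> fvs_value v | None -> empty)
  | App (r, f, vs, e) -> add f (union (fvs_values vs) (remove r (fvs e)))
  | Bop (r, _, x, y, e) ->
    union (fvs_value x) (union (fvs_value y) (remove r (fvs e)))
  | If (v, t, f) -> union (fvs_value v) (union (fvs t) (fvs f))
  | Tuple (r, vs, e) -> union (fvs_values vs) (remove r (fvs e))
  | Proj (r, x, _, e) -> add x (remove r (fvs e))

With that in place, the closure conversion algorithm is described below: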

let convert =
  let rec go : Anf.t -> Anf.t = function

The case of functions is the most involved:

| Fun (f, xs, e, e') ->
  let env = fresh "env" in
  let fvs = VS.(elements (diff (fvs e) (of_list xs))) in
  (* let x = env[i] in ... e *)
  let proj (e, i) x =
    (Proj (x, env, i, e), i+1)
  in
  let e =
    fst (List.fold_left proj (go e, 1) fvs)
  in
  (* let f = (@f, fvs...) in e' *)
  let e' =
    let vs = List.map (fun v -> Var v) fvs in
    Tuple (f, Glob f :: vs, go e')
  in
  Fun (f, env :: xs, e, e')

Upon encountering a function, let f(xs...) = e in e' we do the following:

  • Compute the free variables of f’s body, e (ensuring to remove f’s parameters, xs...).
  • Generate a fresh name for the environment, called env.
  • Rewrite e such that it sources its free variables using projections from the environment variable we created. The indices of the free variables start from 1 because the 0th component will be our code pointer. We rely on shadowing to avoid having to substitute all uses of free variables with projections.
  • Rewrite e' to shadow f by constructing a tuple named f whose components are the code pointer for f, followed by f’s free variables. The f that now reaches code in e' refers to this closure object.
  • Prepend env to f’s parameter list.

In the case of applications, we must assume that the variable being applied is actually a closure object and rewrite it accordingly:

| App (r, f, vs, e) ->
  let ptr = fresh f in
  Proj (ptr, f, 0, App (r, ptr, Var f :: vs, go e))

This rewrites let r = f(vs...) in e to become let ptr = f[0] in let r = ptr(f, vs...) in e - we extract out the code pointer and apply that, passing the closure object as the first parameter. This style of passing a combined code pointer and environment to the target function is referred to as “closure passing style” (as opposed to “environment passing style” where the environment may be indirectly linked from the closure object and passed on its own).
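To make the rewrites above concrete, here is a small hand-written before/after (all names are chosen by hand for readability - the real pass generates them with fresh - and Add is an assumed constructor of Bop.t). The source function has a free variable y:

(* Before: let f(x) = let r0 = x + y in halt r0 in
           let r1 = f(1) in halt r1            -- y is free in f's body *)
let before : Anf.t =
  Anf.(Fun ("f", ["x"],
         Bop ("r0", Bop.Add, Var "x", Var "y", Halt (Var "r0")),
         App ("r1", "f", [Int 1], Halt (Var "r1"))))

(* After: f takes an environment and projects y out of it; the use site
   builds a closure tuple (code pointer, y), shadowing f, and the
   application goes through the extracted pointer with the closure as
   its first argument. *)
let after : Anf.t =
  Anf.(Fun ("f", ["env"; "x"],
         Proj ("y", "env", 1,
           Bop ("r0", Bop.Add, Var "x", Var "y", Halt (Var "r0"))),
         Tuple ("f", [Glob "f"; Var "y"],
           Proj ("ptr", "f", 0,
             App ("r1", "ptr", [Var "f"; Int 1], Halt (Var "r1"))))))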

The remaining cases just apply the conversion algorithm recursively.

[figure: example requiring closure]

As you can see from the above example of before and after closure conversion, unnecessary closures are introduced for known functions. In real compilers, a necessary optimisation is closure elimination, which removes unnecessary closures and readily optimises chains of applications to curried functions (an optimisation known as uncurrying, which aims to remove the overhead of intermediary closures - a significant cost in our “flat” closure representation due to the lack of sharing of free variables). I have included some notes about these in the final section of this article.

Hoisting

Once closure conversion has been performed, all the functions are closed. We can then safely hoist all the nested functions to the top-level. The purpose of hoisting is to remove nesting of functions to aid in the eventual lowering to machine code.

The hoisting algorithm works recursively, collecting functions and join points.

Since we’re recursively appending lists together, to avoid quadratic blowup, we make use of difference lists:

module Hoist = struct
  type join = var * var option * Anf.t
  type func = var * var list * join * join list
  type 'a dlist = 'a list -> 'a list
  let cons x xs = x :: xs
  let (@-) xs ys tl = xs (ys tl)
  let hoist e : func list =
    let rec go : Anf.t -> func dlist * join dlist * Anf.t = function
    (* ... cases ... *)

In order to hoist let f(xs...) = e in e', we collect f along with the joins gathered from its body, e, and a designated entry block (the body of the function with nested functions and joins removed).

| Fun (f, xs, e, e') ->
  let (fs, js, e) = go e in
  let (fs', js', e') = go e' in
  let entry = fresh "entry" in
  let fn = (f, xs, (entry, None, e), js []) in
  (fs @- fs' @- cons fn, js', e')

For joins, we simply collect the join:

| Join (j, p, e, e') ->
  let (fs, js, e) = go e in
  let (fs', js', e') = go e' in
  let jn = (j, p, e) in
  (fs @- fs', cons jn @- js @- js', e')

In the case of if-expressions, two fresh joins are collected, corresponding to the bodies of the then and else branches. The if-expression is then rewritten to immediately jump to the corresponding blocks/joins:

| If (e, t, f) ->
  let (fs, js, t) = go t in
  let (fs', js', f) = go f in
  let th, el = fresh "then", fresh "else" in
  let bt, bf = (th, None, t), (el, None, f) in
  (fs @- fs', cons bt @- cons bf @- js @- js',
     If (e, Jump (th, None), Jump (el, None)))

The remaining cases are similar to the above but don’t create any new joins or functions. They’re omitted here for brevity.
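As an aside, the difference-list encoding used throughout hoist deserves a quick illustration: concatenation with (@-) is just function composition (constant time), and applying the result to [] materialises the final list once at the end - exactly what hoist does with js []. A tiny (hypothetical) example, written as if inside the Hoist module so that cons, (@-) and dlist are in scope:

let _demo : int list =
  let xs : int dlist = cons 1 @- cons 2 in
  let ys : int dlist = cons 3 in
  (xs @- ys) []  (* evaluates to [1; 2; 3] *)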

An example of before and after hoisting can be seen below:

[figure: before and after hoisting]

The importance of rewriting Ifs to immediately branch becomes apparent when you consider that they now correspond to conditional branches in basic block form (which appear as terminating instructions, i.e. at the end of a block):

[figure: conditional control flow graph]

Lowering to LLVM

Now that our lambda calculus program is a bunch of top-level functions defined as a collection of straightline blocks, we can lower them into LLVM IR. To do this, I’ve opted to adapt a subset of LLVM IR used by a course at the University of Pennsylvania called LLVMlite.

The representation I’ve chosen is rather hacky in that most values will be represented as i64 values, cast to pointers when necessary. The purpose of doing this is to avoid a larger implementation that requires more bookkeeping. However, it does mean that the IR we emit will contain a lot of casts between i64s and pointers. To facilitate this in LLVMlite, I’ve added support for 2 LLVM instructions, inttoptr and ptrtoint. I have previously used bitcast in place of these, but these instructions seem to work just as well for our purposes. Each of these instructions is documented in the LLVM language reference.

(* ll.ml *)
type insn = 
  (* ... *)
  | PtrToInt of ty * operand * ty (* ptrtoint ty* %u, i<N> *)
  | IntToPtr of ty * operand * ty (* inttoptr i<N> %u, ty* *)

To avoid further complications, the allocation of tuples is performed using malloc (with no garbage collection). As noted above, since most values are i64s, the representation of tuples is uniform. There are many different representations for heap-allocated objects in runtimes for functional languages. For example, OCaml uses a similar uniform representation in order to make garbage collection simpler to implement and so that parametrically polymorphic functions truly compile down to a single function (details about OCaml's memory layout can be read here).

[figure: closure layout]

In order to lower code that jumps to a join point with a given argument, we perform a preprocessing pass that associates a fresh name for each join point that accepts an argument. This fresh name will be used as the name of an alloca slot introduced in the entry block of the function: blocks wishing to jump with an argument will write the argument to the same slot (using store) and the destination block will load its value:

module Lower = struct
  module SM = Map.Make(String)
  (** record spill slots for each argument-accepting join point  *)
  let preprocess =
    let go spills = function
      | (j, Some p, _) ->
        SM.add j (fresh p) spills 
      | _ -> spills
    in
    List.fold_left go SM.empty
[figure: LLVM for branching with arguments]

Doing this is necessary for preserving SSA (static single assignment) form that LLVM IR requires. Alternatively, we could emit an instruction that has the effect of moving the jump argument into a fresh variable in each predecessor block, which is then merged at the join point using a phi instruction. However, LLVM’s mem2reg pass will be able to do this for us - emitting alloca code that relies on this transformation is considered idiomatic when producing front-ends for LLVM (it’s the same technique that clang uses for mutable variables).

The code for lowering each block in a function is provided by lower_block:

let lower_block spills (j, p, e) : var * Ll.block =
  let open Ll in
  let last : Ll.terminator ref = ref (Br "undefined") in
  let rec go : Anf.t -> (var * Ll.insn) list = function
  (* cases *)

The output of lower_block is a pair consisting of a block’s name and the block itself (as this is how LLVMlite’s API is structured). Since normal LLVM instructions are stored as a different type from terminator instructions, the block’s single terminator is collected through a reference. Therefore, instructions that can only appear at the end of straightline ANF yield an empty list of instructions but set the corresponding last instruction for the block:

| Halt v ->
  last := Ret (i64, Some (lower_value v)); []
| Jump (j, None) ->
  last := Br j; []

The case for a Jump with an argument is similar, but it stores its argument into the spill slot pre-recorded for the target join point (as described above) before branching:

| Jump (j, Some v) ->
  (match SM.find_opt j spills with
   | Some slot ->
     last := Br j;
     ["", Store (i64, lower_value v, Id slot)]
   | _ -> failwith "Join points must have spill slots!")

Lowering of a binary arithmetic operator produces a direct mapping to the relevant LLVM IR instruction, using lower_bop to map between the two:

| Bop (r, op, x, y, e) ->
  let op = lower_bop op in
  let x, y = lower_value x, lower_value y in
  (r, Binop (op, i64, x, y)) :: go e

Compilation of if-expressions emits a conditional branch dependent on whether the evaluated guard of the If is (signed) greater than 0:

| If (v, Jump (t, None), Jump (f, None)) ->
  let cmp = fresh "c" in
  last := Cbr (Id cmp, t, f);
  [cmp, Icmp (Sgt, i64, lower_value v, Const zero)]

In order to compile application, each argument is lowered using lower_value and the function pointer (projected from the function’s closure) is cast to a pointer of the relevant type fty (i64 (i64, i64)) and called:

| App (r, ptr, vs, e) ->
  let arg op = match op with 
    | Const _ | Id _ -> I64, op
    | _ -> failwith "Invalid argument" 
  in
  let vs = List.map (lower_value >> arg) vs in
  let casted = fresh "c" in
  let cast = casted, IntToPtr (i64, Id ptr, Ptr (Fun fty)) in
  cast :: (r, Call (i64, Id casted, vs)) :: go e

The most involved case is that of Tuples, which must first allocate space on the heap using malloc and then populate each index with a lowered component. If the lowered value refers to a global identifier, then it must be a top-level code pointer (so it is cast from i64 (i64, i64)* to i64 in order to store it in the tuple).

| Tuple (r, vs, e) ->
  let ptr = fresh "ptr" in
  let size = List.length vs * 8 in
  let alloc =
    ptr, Call (Ptr i64, Gid "malloc", [I64, const_i64 size])
  in
  let populate index op = 
    let off = fresh "off" in
    let gep = off, Gep (i64, Ptr i64, Id ptr, [const_i64 index]) in
    let store x = "", Store (i64, Id x, Id off) in
    match op with
    | Gid g ->
      (* gep, ptrtoint, store *)
      let casted = fresh "casted" in
      let cast = casted, PtrToInt (Ptr (Fun fty), Gid g, i64) in
      [gep; cast; store casted]
    | Id x -> [gep; store x]
    | _ -> failwith "Constants or null cannot reach here since all free variables are named"
  in
  let stores = List.mapi (fun i -> lower_value >> populate i) vs in
  alloc :: (r, PtrToInt (Ptr i64, Id ptr, i64)) :: List.concat stores @ go e

Compilation of projections casts the tuple to an i64* and then indexes it using getelementptr to compute the component’s offset. The component is then loaded from the computed address.

| Proj (r, x, i, e) ->
  let addr, tuple = fresh "a", fresh "t" in
    [
      tuple, IntToPtr (i64, Id x, Ptr i64);
      addr, Gep (i64, Ptr i64, Id tuple, [const_i64 i]);
      r, Load (i64, Id addr)
    ]
  @ go e

The remaining cases must not occur, because functions and ANF join points were removed through hoisting and Ifs were all rewritten to immediately branch to their bodies:

  | Fun _ | Join _ -> failwith "Code must be straightline!"
  | If _ -> failwith "Ifs must immediately branch!"

Finally, if the containing block/join receives an argument, p, then we prepend a load into p from the pre-assigned spill slot for it:

let insns = match p with
  | Some p ->
    let slot = SM.find j spills in
    (p, Load (i64, Id slot)) :: go e
  | _ -> go e
in
j, { insns; terminator = !last }

Demo

You can play around with this compiler below. The default program computes the factorial of 5 using the Z combinator.

In order to test that it works, paste the output into main.ll, then do:

llc main.ll && gcc main.s -o main && ./main; echo $?

You will need to adapt the compiler to emit a printf of the result in order to see values larger than the exit code limit (exit codes are truncated to 8 bits on most systems).


Notes

TODO