A common intermediate representation used in compilers is that of A-Normal Form. The key idea of this representation is that intermediary values produced by expressions become bound to variables. The effective result of the transformation is a linearisation of complex expressions which can help in subsequent lowering passes (such as to a lower, three-address code, scheme).
Example
As an example, consider the following expression:
foo(6 + 8, 10) * bar(2, id(12))
Its ANF conversion may yield the following:
let t0 = 6 + 8 in
let t1 = foo(t0, 10) in
let t2 = id(12) in
let t3 = bar(2, t2) in
let t4 = t1 * t3 in
t4
There are important points to note about the above:
- Evaluation order becomes explicit as complex expressions are linearised into their constituent parts.
- Arguments to functions become “trivial” - their evaluation halts immediately (in our case, arguments are either variables or integer literals).
The Algorithm
The first thing we require is a suitable datatype representation to work with:
(** binary operator *)
type bop = Add | Sub | Mul
(** arithmetic expressions with calls *)
type expr =
| Lit of int
| Bop of bop * expr * expr
| Call of string * expr list
(** "trivial" values/atoms *)
type value =
| Var of string
| Int of int
(** type alias *)
type var = string
(** ANF/let expressions *)
type lexpr =
| Halt of value
| LetBop of var * bop * value * value * lexpr
| LetCall of var * string * value list * lexpr
We will also need a small utility for producing fresh names:
(** fresh name with given prefix *)
let fresh =
let c = ref (-1) in
fun p ->
incr c;
p ^ string_of_int !c
In order to choose how we implement the conversion, it’s important to notice that a simple, recursive, bottom-up conversion will not suffice. It’s clear that each complex expression is bound to a fresh variable. However, where that variable is used must be nested deeper within the result.
To illustrate this, consider the literate algebraic data type representation of the example from above:
LetBop ("t0", Add, Lit 6, Lit 8,
LetCall ("t1", "foo", [Var "t0"; Lit 10],
LetCall ("t2", "id", [Lit 12],
LetCall ("t3", "bar", [Lit 2; Var "t2"],
LetBop ("t4", Mul, Var "t1", Var "t3", Halt "t4")))))
The computation of t4
uses variables t1
and t3
, which are constructed/bound at a shallower level in the structure.
In order to build this bottom-up (yet outside-in), it is convenient to embrace the functional utility of closures for building up data structures. Effectively, we must recursively propagate an embedding context which tells deeper recursive invocations how to use the values each expression yields.
We implement this idea by means of providing an embedding function (continuation) of type value -> lexpr
as supplementary argument to our primary conversion function, go
:
let rec go e k = match e with
| ...
Now, we’ll consider how to convert each case of an expr
:
- In the case of integer literals, we simply repackage up the integer value as a
value
and pass it off to the continuation:
| Lit i -> k (Int i)
- In the case of binary arithmetic expressions,
Bop (op, l, r)
, we must collect each intermediary value through nested continuations then propagate the bound variable value to the outermost continuation:
| Bop (op, l, r) ->
go l (fun l -> go r (fun r -> let t = fresh "t" in LetBop (t, op, l, r, k (Var t))))
However, this isn’t very readable so we’ll overload let*
to allow us to write this continuation code in a more direct style:
| Bop (op, l, r) ->
let ( let* ) = (@@) in
let* l = go l in
let* r = go r in
let t = fresh "t" in
LetBop (t, op, l, r, k (Var t))
-
The most complex case is that of calls,
Call (f, es)
. In the previous example, we could just propagate each resultant value as the formal parameter of the continuation provided togo
. That works fine for cases where we know how many values we must bind. In the case of calls, however, where the number of arguments could be arbitrary, we must construct a larger embedding context that collects each resultant value into a list.In order to do this, we will construct an $n$-ary embedding context of type
value list -> lexpr
and accumulate this closure by folding around a base context:
| Call (f, es) ->
let accumulate e ctx vs =
go e (fun v -> ctx (v :: vs))
in
let base vs =
let t = fresh "t" in
LetCall (t, f, List.rev vs, k (Var t))
in
List.fold_right accumulate es base []
The reversal of vs
in the base context ensures left-to-right order is preserved after the arguments are collected in reverse (to avoid use of append for each value, which is a linear time operation).
Finally, we’ll wrap go
up inside a function, anf
, that invokes go
with a top-level continuation that produces halt of the provided value:
let anf =
let rec go e k = match e with
| Lit i -> k (Int i)
| Bop (op, l, r) ->
let ( let* ) = (@@) in
let* l = go l in
let* r = go r in
let t = fresh "t" in
LetBop (t, op, l, r, k (Var t))
| Call (f, es) ->
let accumulate e ctx vs =
go e (fun v -> ctx (v :: vs))
in
let base vs =
let t = fresh "t" in
LetCall (t, f, List.rev vs, k (Var t))
in
List.fold_right accumulate es base []
in
Fun.flip go (fun v -> Halt v)
Final Program
Below, you can play around with this simple conversion algorithm by providing arithmetic expressions as input:
Enter arithmetic expression: