Trampolining in C

Consider the following JavaScript program that sums all the integers from $1$ to $n$:

function sum(n) {
  if (n == 0) return 0;
  return n + sum(n - 1);
}
console.log(sum(50000));

This recursive function overflows the stack for large values of $n$.

A well-known technique to constrain the stack usage of such a function and, thus, achieve the effects of tail call elimination is to perform the so-called trampolining transformation.

Trampolining is a transformation upon a CPS (continuation-passing style) program that packages up all calls (which are inherently tail calls under CPS) as thunks and returns those instead. By doing this, the entire control flow of a program can be evaluated using an evaluator that loops and evaluates consecutive thunks.

Performing the trampolining transformation on the sum function from above could yield something like the following:

function sum(n, k) {
  function go(n, k) {
    if (n == 0) return { f: k, xs: [0] }; // k(0)
	// go(n - 1, x => k(n + x))	
    return { f: go, xs: [n - 1, x => ({ f: k, xs: [n + x] })] }; 
  }
  // seed evaluator w/ user's invocation
  var th = { f: go, xs: [n, k] };
  while (th = th.f.apply(null, th.xs));
}
sum(50000, console.log);

Here, you can see that, instead of evaluating a tail call $f(x_1, x_2, \ldots, x_n)$, the function returns a thunk representing the call, { f: $f$, xs: [$x_1, x_2, \ldots, x_n$] }. This transformation means the evaluator can loop and iteratively perform the next control flow transfer (as each thunk encodes a continuation that itself encodes “what to do next”).

This technique relies heavily on two features available in JavaScript:

Higher-order functions: capturing anonymous functions are vital for this to work. The recursive call to go must pass a continuation: x => k(n + x). This anonymous function can not easily be hoisted to the top-level as it contains free variables (k and n). So, adapting this code to C means we’ll need to explicitly represent closures (a code pointer and environment providing values to the free variables) in our code.
Ability to invoke arbitrary functions with arbitrary arguments: the benefit of dynamically-typed scripting languages such as JavaScript is the ability to freely invoke any function with any arguments at runtime. The ease of doing this is relied upon by the looping thunk evaluator while (th = th.f.apply(null, th.xs));. We can do this in C if we dynamically construct call sites using a library such as dyncall.

The resultant code, in C, is provided at the GitHub gist below: This code uses talloc to avoid manually tracking memory ourselves (we just collate allocations under a shared context node and free all of it at the end).

If we run this under valgrind, we’ll see how expensive it is:

~ » valgrind ./sum              
result = 1250025000
==104185== 
==104185== HEAP SUMMARY:
==104185==     in use at exit: 0 bytes in 0 blocks
==104185==   total heap usage: 300,011 allocs, 300,011 frees, 244,010,476 bytes allocated
==104185== 
==104185== All heap blocks were freed -- no leaks are possible
==104185== 
==104185== For lists of detected and suppressed errors, rerun with: -s
==104185== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

So, we may have attained guaranteed tail call elimination but at the cost of an exorbitant amount of heap overhead. Of course, there’s overhead in the code above that could easily be replaced with returning pointers to static memory (that wouldn’t impede the functionality of the program). There’s also lots of associated cost of producing dynamic calls in C.

It’s clear that this isn’t an effective transformation in the context of C but, perhaps, it could be used as some kind of program obfuscation method as we’ve noted that we encode the entire control flow of CPS programs and their evaluation in this manner.