The Shelf (or TODO)

Below is a list of things I’d like to work on when I can find the time. This list is not exhaustive. Many of the entries are things I’ve worked on in the past but felt that I’d need to tidy them up before documenting their ideas on this blog.

Tree Matching using tries

DONE

Implement algorithm D from Pattern Matching in Trees, test over an S-expr based test suite. It seems likely that a similar approach could be taken for match compilation in ML languages: the trie created from root-to-leaf strings is somewhat similar to naive decision trees.

Hindley Milner with typeclasses

TODO

Implement standard HM (w/ destructive unification) but attempt to incorporate ad-hoc polymorphism by means of typeclasses. My current understanding of such an approach is sourced solely from Implementing type classes. It's more the book-keeping and edge cases that I'm unsure about.

Aho-Corasick

DONE

Implement Aho-Corasick for a blog post. The tree matching problem uses an Aho-Corasick automaton. I have already implemented Aho-Corasick a few times, but I'd like to document it as the suffix link construction stuff is rather cute. Furthermore, lots of online visualisations of this algorithm are buggy and produce incorrect automatons. Aho-Corasick is required for the tree matching item above. The real output of this is a visualisation tool.

A language with shift/reset

TODO

Implement a language with shift/reset primitives. It's unclear how best this should be done. Resources: OchCaml, Direct implementation of shift and reset in the MinCaml compiler. I'd like to fully understand the degenerate stack-copying solution (in native code). I believe Koka does a selective embedding of a multi-prompt delimited control monad when compiling to C. I've implemented shift/reset in a hacky, isolated, way before - but it required compiler tricks. The idea is to gain some background for efficient implementations of algebraic effects.

SSA Destruction

TODO

Implement SSA construction (using Cooper et al's dominator algorithm) and, more importantly, its non-conventional destruction. A few compilers (such as the MIR JIT) preserve the conventional SSA property which makes out-of-ssa trivial. However, I'd like to fully understand SSA destruction approaches, as in Revisiting Out-of-SSA Translation for Correctness, Code Quality, and Efficiency. To refamilarise myself with SSA construction, refer to lecture slides. From memory, it's basically just phi placement (using dominators) and then renaming using stacks.

Sparse Conditional Constant Propagation

TODO

Implement SCCP over SSA. Refer to the Constant propagation with conditional branches and this useful blog article.

Monomorphisation for an ML

TODO

Reimplement monomorphisation in a nice way as to produce a blog article about it. In prior implementations, I did it over typed ANF. I think I'd like to find a nice way to store the map of let-bound variables and their instantiations (mapped to fresh, specialised, variable names).

Defunctionalisation for an ML

TODO

Using the monomorphisation implementation as a base, implement and blog about defunctionaliastion. Do this in the style of Tolmach. This requires a good normalised program format (to introduce ADTs for the closures), see Program.t from MLton. Mutual recursion, matching, etc. must also be supported for the type-directed dispatch functions.

Warren Abstract Machine

TODO

Implement a compiler for a subset of Prolog that targets WAM. Refer to this book. I've already implemented most of the first part (basically just a unification engine, compiling terms to heap representations built by WAM instructions). Adding prolog-specific stuff is really what this is about.

Radix Trie Construction

TODO

Implement construction of a radix trie using a modified version of a 3-way radix quicksort, appending nodes when the sorting diverges. Radix tries are used in Mach-O export tries for storing symbols. LLVM has code for doing this. Refer to Sedgewick's lectures, and code.