# A Correctness Proof for a Simple Compiler in Idris

An **evaluator** for a language *L* is a program that takes as input an expression of *L* and returns an output value, possibly executing side effects along the way. In some sense the evaluator defines the meaning of the expressions of *L*. A **compiler** for *L*, on the other hand, is a program that takes as input an expression of *L* and transforms it into some other representation, typically in a lower-level language.

The most **straightforward** compiler would be one that simply outputs the exact sequence of steps that the evaluator takes while evaluating the expression. Most compilers, however, go a step further and **optimize** the output code in terms of size or speed or whatever, yielding better code than a direct transcription of the evaluator’s steps.

The evaluator sets the standard for the language, and it is expected that the code generated by the compiler for some expression *e* of *L* will, when run, execute exactly the same side effects as the evaluator would when evaluating *e* and will return the same result. But with increasingly arcane optimizations, it can be hard to tell that the compiler output really is the same. **How can it be ensured that the compiler is correct with respect to the evaluator?**

In **dependently-typed** languages like **Coq** or **Idris**, it’s possible to write **proofs** of properties like compiler correctness. If you write a proof and it compiles, then you can be assured that the property holds, assuming the correctness of the compiler (of the language in which the proof is written). The rest of this post will develop 1) a simple language, 2) an evaluator for that language, 3) a simple stack machine, 4) a compiler for the language targeting that machine, and 5) a proof that the compiler is correct with respect to the evaluator.^{1} Idris is the language that will be used, because that’s the one I know.

# Language

Here’s a simple language for addition. It’s pretty boring, but notice that the `Plus` expression is recursive.
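A definition along these lines fits the description (I’m assuming the constructors carry `Nat`s):

```idris
data Expr = Const Nat
          | Plus Expr Expr
```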

# Evaluator

Evaluating expressions in this language is easy; just return the argument of a `Const` expression, and recursively evaluate the subexpressions of a `Plus` expression and add them together.
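A sketch of that evaluator, assuming an `Expr` type with `Const` and `Plus` constructors as above:

```idris
eval : Expr -> Nat
eval (Const n)  = n
eval (Plus a b) = eval a + eval b
```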

Here’s an example expression^{2}:
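Any tree of `Plus` and `Const` nodes summing to 25 will do; for instance:

```idris
example : Expr
example = Plus (Plus (Const 7) (Const 6))
               (Plus (Const 5) (Plus (Const 4) (Const 3)))
```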

`eval example` evaluates to 25, as expected^{3}:
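At the REPL, something like:

```
> eval example
25 : Nat
```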

# Stack Machine

Here’s a simple stack machine, along with a language for “programming” it:
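A sketch, with the stack holding natural numbers and a program being a list of instructions (the names `Stack`, `Push`, `Add`, and `Program` follow the prose below; `Instr` is an assumption):

```idris
Stack : Type
Stack = List Nat

data Instr = Push Nat
           | Add

Program : Type
Program = List Instr
```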

The meaning of the stack machine instructions is defined as follows. A `Push` instruction adds its argument to the top of the stack. An `Add` instruction does nothing if the stack is less than two items high; otherwise, it pops off the top two items and pushes back their sum.
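In code, that description comes out as something like this (the name `exec` is an assumption; note that the second item down is used as the left operand of `+`, which keeps the result lined up with `eval` later):

```idris
exec : Instr -> Stack -> Stack
exec (Push n) s                = n :: s
exec Add      (m :: n :: rest) = (n + m) :: rest
exec Add      s                = s  -- fewer than two items: do nothing
```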

Running a program against a stack consists of executing each instruction against the stack in turn.
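That is, roughly:

```idris
run : Program -> Stack -> Stack
run []       s = s
run (i :: p) s = run p (exec i s)
```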

Here’s a compiler that converts expressions of the addition language to programs for the stack machine:
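A sketch of such a compiler (the name `comp` follows the prose): emit a `Push` for a constant, and for a sum compile both sides and append an `Add`:

```idris
comp : Expr -> Program
comp (Const n)  = [Push n]
comp (Plus a b) = comp a ++ comp b ++ [Add]
```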

Examples^{4}:
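At the REPL, something like:

```
> run (comp example) []
[25] : Stack
```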

Running the program generated by `comp example` against the empty stack leaves the stack `[25]`, just as `eval example` gave `25`, so evaluating and compiling that expression give the same result.

# Correctness Proof

The program up to this point could be easily converted to **Haskell**; indeed, parts of it are just a few characters away from being syntactically valid Haskell. But now we’re at the correctness proof, and that can only be done with dependent types. Specifically, we need to write a program to implement the following type:
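In Idris terms, the statement to prove is something like:

```idris
compCorrect : (e : Expr) -> run (comp e) [] = [eval e]
```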

`compCorrect` is a “function” that takes an expression `e` as an argument and returns the claim that compiling and evaluating `e` work out to be the same. Note that **the type of the return value depends on the value of the argument**, hence *dependent type*.^{5}

Proving facts like this can be tricky. The first thing to notice is that `compCorrect` is an instance of a slightly more general theorem, namely that running the compilation of `e` against *any* stack, not just the empty one, is the same as evaluating `e` and pushing the result onto the stack.
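The generalized statement, with the stack as an extra parameter (the name `compCorrectGen` is an assumption):

```idris
compCorrectGen : (e : Expr) -> (s : Stack) -> run (comp e) s = [eval e] ++ s
```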

Theorems like these are typically proved using **induction**; that is, the problem is solved by assuming some **subproblems** are already solved. Adding this extra parameter provides some flexibility in using various **inductive hypotheses**.

The next step is case-splitting on `e`. The `Const` case is trivial, but the `Plus` case will take some work.
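After the split, roughly:

```idris
compCorrectGen (Const n) s = Refl
compCorrectGen (Plus a b) s = ?goal
```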

`?goal` is the demonstrandum, and its type is
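up to reduction, something like:

```idris
run (comp (Plus a b)) s = [eval (Plus a b)] ++ s
```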

An obvious first inductive hypothesis is
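the recursive call on the left subexpression (the binding name `hypa` matches the prose below):

```idris
let hypa = compCorrectGen a s
-- hypa : run (comp a) s = [eval a] ++ s
```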

That is, assume that the theorem holds for the subexpression `a`: that running the compilation of `a` against the initial stack `s` is the same as evaluating `a` and pushing the result onto `s`. The same can also be assumed for running the compilation of the subexpression `b` against the input stack with the result of evaluating `a` on top:
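As a sketch (the binding name `hypb` matches the prose below):

```idris
let hypb = compCorrectGen b ([eval a] ++ s)
-- hypb : run (comp b) ([eval a] ++ s) = [eval b] ++ [eval a] ++ s
```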

At this point the proof looks like this:
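Assuming the names introduced so far, something like:

```idris
compCorrectGen : (e : Expr) -> (s : Stack) -> run (comp e) s = [eval e] ++ s
compCorrectGen (Const n) s = Refl
compCorrectGen (Plus a b) s =
  let hypa = compCorrectGen a s
      hypb = compCorrectGen b ([eval a] ++ s)
  in  ?goal
```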

The problem at this point is that the types of the inductive hypotheses don’t fit into the type of `?goal`; `?goal` deals with the whole program, while the hypotheses deal with fragments of it. This suggests a lemma for running a program partially:

That is, running program `p ++ q` against stack `s` is the same as running program `q` against the stack that results from running program `p` against `s`. The program on the left side of `?goal` can be split into `comp a` and `comp b ++ [Add]`.
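As a sketch (the binding name `runa` matches the prose below):

```idris
let runa = runApp (comp a) (comp b ++ [Add]) s
-- runa : run (comp a ++ comp b ++ [Add]) s
--          = run (comp b ++ [Add]) (run (comp a) s)
```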

`?goal` has the same left side as `runa`, so it can be rewritten:
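With Idris’s `rewrite`, which replaces occurrences of the left side of an equation with its right side in the goal:

```idris
rewrite runa in ?goal
```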

The new type of `?goal` is
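roughly:

```idris
run (comp b ++ [Add]) (run (comp a) s) = [eval a + eval b] ++ s
```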

The program on the left side of `?goal` can be split up further:
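As a sketch (the binding name `runb` matches the prose below):

```idris
let runb = runApp (comp b) [Add] (run (comp a) s)
-- runb : run (comp b ++ [Add]) (run (comp a) s)
--          = run [Add] (run (comp b) (run (comp a) s))
```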

Again, `?goal` has the same left side as `runb`, so it can be rewritten:
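A sketch of the proof term at this step:

```idris
rewrite runa in rewrite runb in ?goal
```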

The new type of `?goal` is
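roughly:

```idris
run [Add] (run (comp b) (run (comp a) s)) = [eval a + eval b] ++ s
```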

`run (comp a) s` is on the left side of `hypa`, so the right side can be substituted in:
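A sketch of the proof term at this step:

```idris
rewrite runa in rewrite runb in rewrite hypa in ?goal
```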

The new type of `?goal` is
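roughly:

```idris
run [Add] (run (comp b) ([eval a] ++ s)) = [eval a + eval b] ++ s
```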

Again, `run (comp b) ([eval a] ++ s)` is on the left side of `hypb`, so the right side can be swapped in:
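A sketch of the proof term at this step:

```idris
rewrite runa in rewrite runb in rewrite hypa in rewrite hypb in ?goal
```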

The type of `?goal` is now `[eval a + eval b] ++ s = [eval a + eval b] ++ s`, which can be finished off with a simple `Refl`.

With all the junk variables cleaned up, the whole proof looks like this:
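Assuming the names sketched above (`compCorrectGen` for the generalized theorem, `runApp` for the splitting lemma), the finished development might read:

```idris
runApp : (p, q : Program) -> (s : Stack) -> run (p ++ q) s = run q (run p s)
runApp []       q s = Refl
runApp (i :: p) q s = runApp p q (exec i s)

compCorrectGen : (e : Expr) -> (s : Stack) -> run (comp e) s = [eval e] ++ s
compCorrectGen (Const n) s = Refl
compCorrectGen (Plus a b) s =
  rewrite runApp (comp a) (comp b ++ [Add]) s in
  rewrite runApp (comp b) [Add] (run (comp a) s) in
  rewrite compCorrectGen a s in
  rewrite compCorrectGen b ([eval a] ++ s) in
  Refl

compCorrect : (e : Expr) -> run (comp e) [] = [eval e]
compCorrect e = compCorrectGen e []
```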

# Exercises

- Add **multiplication** to the language.
- Suppose the stack machine is designed in such a way that it’s cheaper to **execute all `Add` instructions together**, rather than in separate blocks. Write a new compiler for the addition language to make this optimization. Prove its equivalence to the evaluator, or to the old compiler.
- Figure out **Jekyll syntax highlighting** for Idris.

# Footnotes

^{1} I got this example from a list of induction exercises in Coq. That author in turn got it from someone else, and apparently it ultimately goes back to **John McCarthy**. It seems to be a pretty common **toy problem**; see, for instance, this post, which I found after this one was already nearly finished.

^{2} Writing out these expressions by hand is a pain; this example was generated with **Elisp**.

^{3} Actually, we don’t even need to “run” the code. With dependent types, it’s possible to “prove” simple test cases at compile time:
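A sketch of such a compile-time test (the name is an assumption):

```idris
evalExampleTest : eval example = 25
evalExampleTest = Refl
```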

These functions are pure, so if the input is known at compile time, the compiler can unwind the functions down as far as it needs to verify the assertion.

^{4} As with the evaluator, we can run this “test” at compile time:
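A sketch of such a compile-time test (the name is an assumption):

```idris
compExampleTest : run (comp example) [] = [25]
compExampleTest = Refl
```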

^{5} *Dependent types* are distinct from **parameterized types** like `Vec<T>`.