A Correctness Proof for a Simple Compiler in Idris
An evaluator for a language L is a program that takes as input an expression of L and returns an output value, possibly executing side effects along the way. In some sense the evaluator defines the meaning of the expressions of L. A compiler for L, on the other hand, is a program that takes as input an expression of L and transforms it into some other representation, typically in a lower-level language.
The most straightforward compiler would be one that simply outputs the exact sequence of steps that the evaluator takes while evaluating the expression. Most compilers, however, go a step further and optimize the output code for size or speed or whatever, yielding better code than that direct transcription.
The evaluator sets the standard for the language, and it is expected that the code generated by the compiler for some expression e of L will, when run, execute exactly the same side effects as the evaluator would when evaluating e and will return the same result. But with increasingly arcane optimizations, it can be hard to tell that the compiler output really is the same. How can it be ensured that the compiler is correct with respect to the evaluator?
In dependently-typed languages like Coq or Idris, it’s possible to write proofs of properties like compiler correctness. If you write a proof and it compiles, then you can be assured that the property holds, assuming the correctness of the compiler (of the language in which the proof is written). The rest of this post will develop 1) a simple language, 2) an evaluator for that language, 3) a simple stack machine, 4) a compiler for the language targeting that machine, and 5) a proof that the compiler is correct with respect to the evaluator.1 Idris is the language that will be used, because that’s the one I know.
Language
Here’s a simple language for addition. It’s pretty boring, but notice that the `Plus` expression is recursive.
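A minimal definition, assuming natural-number constants (the type name `Expr` is an arbitrary choice):

```idris
-- The addition language: constants, and sums of subexpressions.
data Expr = Const Nat
          | Plus Expr Expr
```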
Evaluator
Evaluating expressions in this language is easy; just return the argument of a `Const` expression, and recursively evaluate the subexpressions of a `Plus` expression and add them together.
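In code:

```idris
-- Evaluate an expression to a natural number.
eval : Expr -> Nat
eval (Const n)  = n
eval (Plus a b) = eval a + eval b
```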
Here’s an example expression2:
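```idris
-- An arbitrary choice; any expression tree summing to 25 behaves
-- the same way in everything below.
example : Expr
example = Plus (Plus (Const 4) (Const 5)) (Plus (Const 7) (Const 9))
```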
`eval example` evaluates to 25, as expected3:
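```
> eval example
25
```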
Stack Machine
Here’s a simple stack machine, along with a language for “programming” it:
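```idris
-- (The names Stack, Instr, and Program are one choice among several;
-- the machine's stack is assumed to hold naturals.)
Stack : Type
Stack = List Nat

-- Instructions: push a constant, or add the top two items.
data Instr = Push Nat | Add

-- A program for the machine is a list of instructions.
Program : Type
Program = List Instr
```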
The meaning of the stack machine instructions is defined as follows. A `Push` instruction adds its argument to the top of the stack. An `Add` instruction does nothing if the stack is less than two items high; otherwise, it pops off the top two items and pushes back their sum.
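A sketch (the helper name `exec` is illustrative; note that the item second from the top becomes the left operand of `+`, which is the order the correctness proof below will produce naturally):

```idris
-- Execute a single instruction against a stack.
exec : Instr -> Stack -> Stack
exec (Push n) s             = n :: s
exec Add      (x :: y :: s) = (y + x) :: s
exec Add      s             = s   -- fewer than two items: no-op
```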
Running a program against a stack consists of executing each instruction against the stack in turn.
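That is:

```idris
-- Run a program by threading the stack through each instruction in turn.
run : Program -> Stack -> Stack
run []        s = s
run (i :: is) s = run is (exec i s)
```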
Here’s a compiler that converts expressions of the addition language to programs for the stack machine:
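```idris
-- Constants compile to a Push; a sum compiles to the code for each
-- operand followed by a single Add.
comp : Expr -> Program
comp (Const n)  = [Push n]
comp (Plus a b) = comp a ++ comp b ++ [Add]
```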
Examples4:
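```
> comp example
[Push 4, Push 5, Add, Push 7, Push 9, Add, Add]
> run (comp example) []
[25]
```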
Running the program generated by `comp example` against the empty stack leaves the stack `[25]`, just as `eval example` gave `25`, so evaluating and compiling that expression give the same result.
Correctness Proof
The program up to this point could be easily converted to Haskell; indeed, parts of it are just a few characters away from being syntactically valid Haskell. But now we’re at the correctness proof, and that can only be done with dependent types. Specifically, we need to write a program to implement the following type:
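```idris
compCorrect : (e : Expr) -> run (comp e) [] = [eval e]
```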
`compCorrect` is a “function” that takes an expression `e` as an argument and returns the claim that compiling and evaluating `e` work out to be the same. Note that the type of the return value depends on the value of the argument, hence the name dependent type.5
Proving facts like this can be tricky. The first thing to notice is that `compCorrect` is an instance of a slightly more general theorem, namely that running the compilation of `e` against any stack, not just the empty one, is the same as evaluating `e` and pushing the result onto the stack.
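Stated as a type (the name `compCorrectGen` is an arbitrary choice):

```idris
compCorrectGen : (e : Expr) -> (s : Stack) ->
                 run (comp e) s = [eval e] ++ s
```

`compCorrect e` is then just `compCorrectGen e []`.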
Theorems like these are typically proved using induction; that is, the problem is solved by assuming some subproblems are already solved. Adding this extra parameter provides some flexibility in using various inductive hypotheses.
The next step is case-splitting on `e`. The `Const` case is trivial, but the `Plus` case will take some work.
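After the split, with the trivial `Const` case dispatched by `Refl`:

```idris
compCorrectGen : (e : Expr) -> (s : Stack) ->
                 run (comp e) s = [eval e] ++ s
compCorrectGen (Const n)  s = Refl
compCorrectGen (Plus a b) s = ?goal
```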
`?goal` is the demonstrandum, and its type is something like this:
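```
goal : run (comp a ++ comp b ++ [Add]) s = [eval a + eval b] ++ s
```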
An obvious first inductive hypothesis is
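the recursive call on `a`:

```idris
hypa : run (comp a) s = [eval a] ++ s
hypa = compCorrectGen a s   -- the recursive call is the induction
```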
That is, assume that the theorem holds for the subexpression `a`: that running the compilation of `a` against the initial stack `s` is the same as evaluating `a` and pushing the result onto `s`. The same can also be assumed for running the compilation of the subexpression `b` against the input stack with the result of evaluating `a` on top:
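```idris
hypb : run (comp b) ([eval a] ++ s) = [eval b] ++ ([eval a] ++ s)
hypb = compCorrectGen b ([eval a] ++ s)
```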
At this point the proof looks like this:
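```idris
compCorrectGen (Plus a b) s =
  let hypa = compCorrectGen a s
      hypb = compCorrectGen b ([eval a] ++ s)
  in ?goal
```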
The problem at this point is that the types of the inductive hypotheses don’t fit into the type of `?goal`; `?goal` deals with the whole program, while the hypotheses deal with fragments of it. This suggests a lemma for running a program partially:
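```idris
-- (runConcat is an arbitrary name for the lemma.)
runConcat : (p : Program) -> (q : Program) -> (s : Stack) ->
            run (p ++ q) s = run q (run p s)
runConcat []        q s = Refl
runConcat (i :: is) q s = runConcat is q (exec i s)
```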
That is, running program `p ++ q` against stack `s` is the same as running program `q` against the stack that results from running program `p` against `s`. The program on the left side of `?goal` can be split into `comp a` and `comp b ++ [Add]`.
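Specializing the lemma accordingly (inside the `Plus` clause, where `a`, `b`, and `s` are in scope):

```idris
runa : run (comp a ++ comp b ++ [Add]) s
     = run (comp b ++ [Add]) (run (comp a) s)
runa = runConcat (comp a) (comp b ++ [Add]) s
```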
`?goal` has the same left side as `runa`, so it can be rewritten:
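```idris
compCorrectGen (Plus a b) s =
  let hypa = compCorrectGen a s
      hypb = compCorrectGen b ([eval a] ++ s)
      runa = runConcat (comp a) (comp b ++ [Add]) s
  in rewrite runa in ?goal
```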
The new type of `?goal` is
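```
goal : run (comp b ++ [Add]) (run (comp a) s) = [eval a + eval b] ++ s
```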
The program on the left side of `?goal` can be split up further:
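```idris
runb : run (comp b ++ [Add]) (run (comp a) s)
     = run [Add] (run (comp b) (run (comp a) s))
runb = runConcat (comp b) [Add] (run (comp a) s)
```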
Again, `?goal` has the same left side as `runb`, so it can be rewritten:
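```idris
compCorrectGen (Plus a b) s =
  let hypa = compCorrectGen a s
      hypb = compCorrectGen b ([eval a] ++ s)
      runa = runConcat (comp a) (comp b ++ [Add]) s
      runb = runConcat (comp b) [Add] (run (comp a) s)
  in rewrite runa in
     rewrite runb in ?goal
```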
The new type of `?goal` is
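```
goal : run [Add] (run (comp b) (run (comp a) s)) = [eval a + eval b] ++ s
```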
`run (comp a) s` is on the left side of `hypa`, so the right side can be substituted in:
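```idris
compCorrectGen (Plus a b) s =
  let hypa = compCorrectGen a s
      hypb = compCorrectGen b ([eval a] ++ s)
      runa = runConcat (comp a) (comp b ++ [Add]) s
      runb = runConcat (comp b) [Add] (run (comp a) s)
  in rewrite runa in
     rewrite runb in
     rewrite hypa in ?goal
```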
The new type of `?goal` is
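```
goal : run [Add] (run (comp b) ([eval a] ++ s)) = [eval a + eval b] ++ s
```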
Again, `run (comp b) ([eval a] ++ s)` is on the left side of `hypb`, so the right side can be swapped in:
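```idris
compCorrectGen (Plus a b) s =
  let hypa = compCorrectGen a s
      hypb = compCorrectGen b ([eval a] ++ s)
      runa = runConcat (comp a) (comp b ++ [Add]) s
      runb = runConcat (comp b) [Add] (run (comp a) s)
  in rewrite runa in
     rewrite runb in
     rewrite hypa in
     rewrite hypb in ?goal
```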
The type of `?goal` is now `[eval a + eval b] ++ s = [eval a + eval b] ++ s`, which can be finished off with a simple `Refl`.
With all the junk variables cleaned up, the whole proof looks like this:
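A complete sketch, using the illustrative names introduced above:

```idris
runConcat : (p : Program) -> (q : Program) -> (s : Stack) ->
            run (p ++ q) s = run q (run p s)
runConcat []        q s = Refl
runConcat (i :: is) q s = runConcat is q (exec i s)

compCorrectGen : (e : Expr) -> (s : Stack) ->
                 run (comp e) s = [eval e] ++ s
compCorrectGen (Const n)  s = Refl
compCorrectGen (Plus a b) s =
  rewrite runConcat (comp a) (comp b ++ [Add]) s in
  rewrite runConcat (comp b) [Add] (run (comp a) s) in
  rewrite compCorrectGen a s in
  rewrite compCorrectGen b ([eval a] ++ s) in
  Refl

compCorrect : (e : Expr) -> run (comp e) [] = [eval e]
compCorrect e = compCorrectGen e []
```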
Exercises
- Add multiplication to the language.
- Suppose the stack machine is designed in such a way that it’s cheaper to execute all `Add` instructions together, rather than in separate blocks. Write a new compiler for the addition language to make this optimization. Prove its equivalence to the evaluator, or to the old compiler.
- Figure out Jekyll syntax highlighting for Idris.
Footnotes
1 I got this example from a list of induction exercises in Coq. That author in turn got it from someone else, and apparently it ultimately goes back to John McCarthy. It seems to be a pretty common toy problem; see, for instance, this post, which I found after this one was already nearly finished.
2 Writing out these expressions by hand is a pain. This example was generated with Elisp:
3 Actually, we don’t even need to “run” the code. With dependent types, it’s possible to “prove” simple test cases at compile time:
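For instance:

```idris
-- This only typechecks if eval example reduces to 25.
evalTest : eval example = 25
evalTest = Refl
```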
These functions are pure, so if the input is known at compile time, the compiler can unwind the functions down as far as it needs to verify the assertion.
4 As with the evaluator, we can run this “test” at compile time:
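For instance:

```idris
-- This only typechecks if compiling and running example yields [25].
compTest : run (comp example) [] = [25]
compTest = Refl
```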
5 Dependent types are distinct from parameterized types like `Vec<T>`.