An evaluator for a language L is a program that takes as input an expression of L and returns an output value, possibly executing side effects along the way. In some sense the evaluator defines the meaning of the expressions of L. A compiler for L, on the other hand, is a program that takes as input an expression of L and transforms it into some other representation, typically in a lower-level language.

The most straightforward compiler would be one that simply outputs the exact sequence of steps that the evaluator takes while evaluating the expression. Most compilers, however, go a step further and optimize the output code in terms of size or speed or whatever, yielding better code than what the evaluator would have done.

The evaluator sets the standard for the language, and it is expected that the code generated by the compiler for some expression e of L will, when run, execute exactly the same side effects as evaluator would when evaluating e and will return the same result. But with increasingly arcane optimizations, it can be hard to tell that the compiler output really is the same. How can it be ensured that the compiler is correct with respect to the evaluator?

In dependently-typed languages like Coq or Idris, it’s possible to write proofs of properties like compiler correctness. If you write a proof and it compiles, then you can be assured that the property holds, assuming the correctness of the compiler (of the language in which the proof is written). The rest of this post will develop 1) a simple language, 2) an evaluator for that language, 3) a simple stack machine, 4) a compiler for the language targeting that machine, and 5) a proof that the compiler is correct with respect to the evaluator.¹ Idris is the language that will be used, because that’s the one I know.

Language

Here’s a simple language for addition. It’s pretty boring, but notice that the Plus expression is recursive.

data Expr = Const Nat | Plus Expr Expr

Evaluator

Evaluating expressions in this language is easy; just return the argument of a Const expression, and recursively evaluate the subexpressions of an Plus expression and add them together.

eval : Expr -> Nat
eval (Const k) = k
eval (Plus a b) = eval a + eval b

Here’s an example expression²:

example : Expr
example = (Plus (Plus (Const 3) (Plus (Const 4) (Const 5))) (Plus (Const 6) (Const 7)))

eval example evaluates to 25, as expected³:

λΠ> eval example
25 : Nat

Stack Machine

Here’s a simple stack machine, along with a language for “programming” it:

Stack : Type
Stack = List Nat

data Instr = Push Nat | Add

Prog : Type
Prog = List Instr

The meaning of the stack machine instructions is defined as follows. A Push instruction adds its argument to the top of the stack. An Add instruction does nothing if the stack is less than two items high; otherwise, it pops off the top two items and pushes back their sum.

exec : Instr -> Stack -> Stack
exec (Push k) s = k :: s
exec Add (i :: j :: s) = j + i :: s
exec Add s = s

Running a program against a stack consists of executing each instruction against the stack in turn.

run : Prog -> Stack -> Stack
run [] s = s
run (instr :: rest) s = run rest $ exec instr s

Here’s a compiler that converts expressions of the addition language to programs for the stack machine:

comp : Expr -> Prog
comp (Const k) = [Push k]
comp (Plus a b) = comp a ++ comp b ++ [Add]

Examples⁴:

λΠ> comp example
[Push 3, Push 4, Push 5, Add, Add, Push 6, Push 7, Add, Add] : List Instr
λΠ> run (comp example) []
[25] : List Nat

Running the program generated by comp example against the empty stack leaves the stack [25], just as eval example gave 25, so evaluating and compiling that expression give the same result.

Correctness Proof

The program up to this point could be easily converted to Haskell; indeed, parts of it are just a few characters away from being syntactically valid Haskell. But now we’re at the correctness proof, and that can only be done with dependent types. Specifically, we need to write a program to implement the following type:

compCorrect : (e : Expr) -> run (comp e) [] = [eval e]

compCorrect is a “function” that takes an expression e as an argument and returns the claim that compiling and evaluating e works out to be the same. Note that the type of the return value depends on the value of the argument, hence dependent type.⁵

Proving facts like this can be tricky. The first thing to notice is that compCorrect is an instance of a slightly more general theorem, namely that running the compilation of e against any stack, not just the empty one, is the same as evaluating e and pushing the result onto the stack.

compCorrectStack : (e : Expr) -> (s : Stack) -> run (comp e) s = [eval e] ++ s

compCorrect : (e : Expr) -> run (comp e) [] = [eval e]
compCorrect e = compCorrectStack e []

Theorems like these are typically proved using induction; that is, the problem is solved by assuming some subproblems are already solved. Adding this extra parameter provides some flexibility in using various inductive hypotheses.

The next step is case-splitting on e. The Const case is trivial, but the Plus case will take some work.

compCorrectStack (Const k) s = Refl
compCorrectStack (Plus a b) s = ?goal

?goal is the demonstrandum, and its type is

run (comp a ++ comp b ++ [Add]) s = [eval a + eval b] ++ s

An obvious first inductive hypothesis is

hypa : run (comp a) s = [eval a] ++ s
hypa = compCorrectStack a s

That is, assume that the theorem is holds for the subexpression a, that running the compilation of the subexpression a against the initial stack s is the same as evaluating a and pushing the result onto s. The same can also be assumed for running the compilation of the subexpression b against the input stack with the result of evaluating a on top:

hypb : run (comp b)
hypb = compCorrectStack b $ [eval a] ++ s

At this point the proof looks like this:

compCorrectStack : (e : Expr) -> (s : Stack) -> run (comp e) s = [eval e] ++ s
compCorrectStack (Const k) s = Refl
compCorrectStack (Plus a b) s = ?goal
  where
    hypa : run (comp a) s = [eval a] ++ s
    hypa = compCorrectStack a s

    hypb : run (comp b) $ [eval a] ++ s = [eval b] ++ [eval a] ++ s
    hypb = compCorrectStack b $ [eval a] ++ s

The problem at this point is that the types of the inductive hypotheses don’t fit into the type of ?goal; ?goal deals with the whole program, while the hypotheses deal with fragments of it. This suggests a lemma for running a program partially:

runPartial : (p, q : Prog) -> (s : Stack) -> run (p ++ q) s = run q (run p s)
runPartial [] q s = Refl
runPartial (instr :: rest) q s = runPartial rest q $ exec instr s

That is, running program p ++ q against stack s is the same as running program q against the stack that results from running program p against s. The program on the left side of ?goal can be split into comp a and comp b ++ [Add].

runa : run (comp a ++ (comp b ++ [Add])) s = run (comp b ++ [Add]) (run (comp a) s)
runa = runPartial (comp a) (comp b ++ [Add]) s

?goal has the same left side as runa, so it can be rewritten:

compCorrectStack (Plus a b) s =
  rewrite runa in
    ?goal

The new type of ?goal is

run (comp b ++ [Add]) (run (comp a) s) = [eval a + eval b] ++ s

The program on the left side of ?goal can be split up further:

runb : run (comp b ++ [Add]) (run (comp a) s) = run [Add] (run (comp b) (run (comp a) s))
runb = runPartial (comp b) [Add] (run (comp a) s)

Again, ?goal has the same left side as runb, so it can be rewritten:

compCorrectStack (Plus a b) s =
  rewrite runa in
    rewrite runb in
      ?goal

The new type of ?goal is

run [Add] (run (comp b) (run (comp a) s)) = [eval a + eval b] ++ s

run (comp a) s is on the left side of hypa, so the right side can be substituted in:

compCorrectStack (Plus a b) s =
  rewrite runa in
    rewrite runb in
      rewrite hypa in
        ?goal

The new type of ?goal is

run [Add] (run (comp b) ([eval a] ++ s)) = [eval a + eval b] ++ s

Agains, run (comp b) ([eval a] ++ s) is on the left side of hypb, so the right side can be swapped in:

compCorrectStack (Plus a b) s =
  rewrite runa in
    rewrite runb in
      rewrite hypa in
        rewrite hypb in
          ?goal

The type of ?goal is now [eval a + eval b] ++ s = [eval a + eval b] ++ s, which can be finished off with a simple Refl.

With all the junk variables cleaned up, the whole proof looks like this:

compCorrectStack : (e : Expr) -> (s : Stack) -> run (comp e) s = [eval e] ++ s
compCorrectStack (Const k) s = Refl
compCorrectStack (Plus a b) s =
  rewrite runPartial (comp a) (comp b ++ [Add]) s in
    rewrite runPartial (comp b) [Add] (run (comp a) s) in
      rewrite compCorrectStack a s in
        rewrite compCorrectStack b ([eval a] ++ s) in
          Refl

Exercises

Add multiplication to the language.
Suppose the stack machine is designed in such a way that it’s cheaper to execute all Add instructions together, rather than in separate blocks. Write a new compiler for the addition language to make this optimization. Prove its equivalence to the evaluator, or to the old compiler.
Figure out Jekyll syntax highlighting for Idris.

Footnotes

¹ I got this example from a list of induction exercises in Coq. That author in turn got it from someone else, and apparently it ultimately goes back to John McCarthy. It seems to be a pretty common toy problem; see, for instance, this post, which I found after this one was already nearly finished.

² Writing out these expressions by hand is a pain. This example was generated with Elisp:

(defun comp (expr)
  (mapcar
   (lambda (x)
     (cond
      ((eq x '+) 'Plus)
      ((numberp x) `(Const ,x))
      (t (comp x))))
   expr))

(comp '(+ (+ 3 (+ 4 5)) (+ 6 7)))

³ Actually, we don’t even need to “run” the code. With dependent types, it’s possible to “prove” simple test cases at compile time:

evalTest : eval example = 25
evalTest = Refl

These functions are pure, so if the input is known at compile time, the compiler can unwind the functions down as far as it needs to verify the assertion.

⁴ As with the evaluator, we can run this “test” at compile time:

compTest : run (comp example) [] = [25]
compTest = Refl

⁵ Dependent types are distinct from parameterized types like Vec<T>.