signed char lotte
“signed char lotte” is a computer program written by Brian Westley and the winner of the “Best Layout” award in the 1990 International Obfuscated C Code Contest. The cleverness of the text is staggering. Superficially it reads as an epistolary exchange between two (possibly former) lovers, Charlotte and Charlie. At the same, it is an executable piece of code whose action is thematically related to its story.
It has been argued that code is not literature, and that it cannot be “read” in a straightforward way. “signed char lotte” is not a counterexample to this. The text is essentially a palimpsest, with the lovers’ storyline written over and obscuring the code that governs the executable behavior. The former can be “read”, but not the latter.
Here is the program in full.
Compilation
If you try to compile this today, it will almost certainly not work:
The program was written around the time the C standard was getting finalized. Prior to standardization, idiosyncratic compiler-specific behavior was more common. One example of this is the short integer literal suffix: 1s
. This is akin to the long integer literal suffix: 1l
. Both of these are used in “signed char lotte” and were presumably in common use at the time, but only the long suffix made it into the standard. Thus to compile it, every instance of 1s
must be changed. In the English-language story reading of the text, 1s
stands for the word “is”. Fortunately, 1s
can be replaced with 15
to keep the spirit of the text. This change does not affect the program’s behavior (!!!).
Behavior
Speaking of behavior, here is what the program does:
The author comments:
This is a “Picking the Daisy” simulation. Now, instead of mangling a daisy, simply run this program with the number of petals desired as the argument.
Despite the simplicity of the behavior, it is utterly unclear how the program manages to implement it. Figuring out how it works is an instructive exercise, and the reader might want to attempt it before reading further.
Formatting
Running the code through a formatter is a a necessary first step, but not a sufficient one. Some manual fixes to the formatted output will probably be required, as formatters are generally not prepared to deal with such bizarre structure. Properly formatted, the program looks something like this:
It remains unclear how the program actually does what it does. The primary obfuscation trick used is misdirection. You can spend a lot of time trying to work out the purpose and meaning of every line, but in fact most of the code is totally useless. Shall I count the ways?
Useless casts
All of the following casts can be cut without consequence:
(char)
(char*)
(long)
(short)
Useless statements
Many statements have no effect on anything, and can be cut. A simple example of a useless statement is (char)lotte;
; it evidently casts signed char lotte
as a char
and then does nothing with it. Obviously this can be cut.
More sophisticated useless statements can be constructed from arithmetic operators:
1 - out & out ;lie;
get - !out;
get *out* (short)ly -0-'R'- get- 'a'^rested;
However, a statement that appears useless may not be. Because this is good old dependable C, side effects can occur just about anywhere, and it is all but impossible to tell how any one in particular might affect the program’s behavior. Thus any statement with the following operators must be kept: =
, ++
, --
, +=
, ^=
.
Useless Control Flow
A nice way to add in some extra words to the program without affecting its operation is to wrap a code block in a do
loop with an always-false while
condition. This causes the code to execute exactly as before, but with a superficially more complex control flow. Here is my favorite example:
With formatting and preprocessor expansion, this becomes:
But (char *)lie - (char *)lie
will always work out to be falsy, and therefore the expression can be reduced to a simple block:
This block itself contains the strikingly useless loop for (; !'a';);
, as well as the useless simple statement not;
.
Besides overtly useless control flow, specific quirks of C control flow operators can be exploited. For example, code in a switch
block that comes before any of the case
labels is unreachable, and therefore useless.
The Crux of the Program
After applying these simplfications and doing a little more massaging, the true nature of the program reveals itself:
The string char *lie
contains as substrings “loves”, “me”, and “not”. get
is a toggle that signals whether or not to print “not”, while not
is the counter input by the user. lotte
is the index into lie
. The program uses comically elaborate logic with hard-coded constants to manipulate lotte
into the right position in lie
, and that’s how the appropriate messages are printed.
Thus it is not merely a matter of obfuscating an otherwise normal program by covering it up with a bunch of weird irrelevant text; fundamentally the program is already far more difficult to understand than it needs to be. The IOCCC judges said that they “like programs that MAKE USE OF A NUMBER OF DIFFERENT TYPES OF OBFUSCATION”, and so it is no surprise that “signed char lotte” won.
Discussion Questions
- How many reserved keywords are there in C? How many of those are used in “signed char lotte”?
- Why wasn’t the short integer literal suffix included in the C standard? How common was it in pre-standard times? Which compilers included it?
- Is love a toilet?
- What is the purpose of the
goto hell;
statement? - How do you suppose this program was constructed?
Exercises
- Analyze Westley’s 1987 IOCCC winner, “Able was I ere I saw elbA”.
- Modify “signed char lotte” so as to include more C keywords.
- Modify GCC so that it compiles the original program correctly.