Clue: an ANSI C compiler targeting high-level languages

New! v0.6 released! Now supported: LuaJIT2. Some benchmarks are actually faster than native code!

What?

Clue is an ANSI C compiler (C89, some C99) that targets high-level languages such as Lua, Javascript or Perl (and some low-level ones). It supports the entire C language, including pointer arithmetic, and can be used to run arbitrary pure-C programs.

Clue currently supports the following targets: Lua (Lua 5.1, Lua 5.2 and LuaJIT 2), Javascript, Perl 5, Common Lisp, C and Java.

Why?

What do you mean, 'why'?

Apart from pure hack value (I'm hoping at some point to produce a back end that will emit sh script --- just because), Clue is mainly an experiment into the use of dynamic VMs to run static code. Modern JITs can do an astonishing job of producing machine code from dynamic languages, gathering all the necessary type information just from watching the program run. It therefore seems instructive to try taking a statically typed language like C, discarding all the type information, and letting the JIT have a go.

In terms of actual practical value, Clue may make it possible to run code written for one system on another, much more restricted system. For example, using Clue you could run off-the-shelf encryption software such as gpg inside a web browser.

How well does it work? Well, let's have some numbers. (All these were calculated during a single benchmarking run on my machine. The gcc score is included for reference. The gcc version of the benchmark uses the same source code as the Clue versions.)

Backend  Interpreter                  Whetstone score  Performance relative to gcc
(gcc)    -                            2500             100%
lua      LuaJIT 2.0.1                 2500             100%
c        gcc                          2400             96%
java     Sun Java 6                   790              32%
lua      LuaJIT 2.0.1 (interpreter)   155              6.2%
js       node.js (V8) 0.6.19          110              4.4%
lua      Lua 5.2.1                    84               3.4%
lua      Lua 5.1.3                    29               1.2%
perl5    Perl 5                       2.7              0.11%

Notes:

What's (gcc)? This is the test program compiled and run directly by gcc, without Clue being involved. This gives us a reference point to compare the benchmarks with.

What's the 'c' target? That's C code emitted by Clue. That is, we're compiling C into C. Clue's output code uses double precision floats for all numbers, but even then it's impressively fast.
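To make the "doubles for everything" point concrete, here is a minimal Python sketch (Python floats are IEEE double precision values). It is not Clue's generated code: the helper name int_div is made up for illustration, and the truncation rule assumes C99-style division toward zero. It shows that 32-bit integers pass through a double unchanged, and that integer division has to be emulated by truncating the floating-point quotient.

```python
import math

INT32_MAX = 2**31 - 1


def int_div(a, b):
    """Emulate C integer division on doubles (hypothetical helper).

    Assumes C99 semantics: the quotient is truncated toward zero.
    """
    return float(math.trunc(a / b))


# Every 32-bit integer is exactly representable as a double.
assert int(float(INT32_MAX)) == INT32_MAX

# Plain floating-point division keeps the fraction; int_div drops it.
print(7.0 / 2.0)           # 3.5
print(int_div(7.0, 2.0))   # 3.0
print(int_div(-7.0, 2.0))  # -3.0
```

The extra truncation step is the kind of bookkeeping a doubles-only representation needs for integer operations.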

Why is Lua 5.2 so much faster than Lua 5.1? Lua 5.2 supports a new goto keyword. This is incredibly useful when doing this kind of compilation as it allows me to pass execution directly from basic block to basic block. Lua 5.1 doesn't have this, which means I have to fake goto using what boils down to a switch statement. This is much less efficient.
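The "fake goto" approach described for Lua 5.1 can be sketched in a few lines. The example below is Python rather than Lua, and it illustrates the general technique only, not Clue's actual output: the block numbering and the function name sum_to are invented. A simple C loop is split into basic blocks, and a dispatch loop picks which block runs next.

```python
def sum_to(n):
    """Sum 1..n, written as basic blocks driven by a dispatch loop.

    Corresponds to the C code:
        int total = 0, i = 1;
        while (i <= n) { total += i; i++; }
        return total;
    """
    total = 0
    i = 1
    block = 0  # number of the next basic block to run (hypothetical numbering)
    while True:
        if block == 0:
            # Loop header: test the condition and pick a successor.
            block = 1 if i <= n else 2
        elif block == 1:
            # Loop body, then "goto" the header again.
            total += i
            i += 1
            block = 0
        else:
            # Block 2: function exit.
            return total


print(sum_to(10))  # 55
```

Every jump costs a trip around the dispatch loop plus a chain of comparisons. With a real goto, each block can transfer control straight to its successor and the dispatch loop is unnecessary.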

Why isn't Common Lisp on that list? Because Clue's libc for Common Lisp isn't up to it yet. I don't know Lisp; anyone want to volunteer?

Holy cow! LuaJIT is faster than C! Well, not really. These figures all come from the Whetstone benchmark, which is a synthetic benchmark that's not indicative of anything much. What's more, the figures above are a composite of several different subbenchmarks. LuaJIT is really, really good at optimising some parts of the benchmark (in fact, for some things it's better than native gcc with no Clue involved!), but less good at others, and this is dragging the overall figure up. This doesn't necessarily correspond to real world performance. (It's still awesome, though.)

How?

Clue is based on the sparse C compiler frontend. This is plugged into a custom register allocator and code generator, which emits the code.

sparse and Clue are written in gcc-dialect C. Clue should run on most systems, although it has been developed on Linux and makes fairly major assumptions about living in a Unix environment. Windows users will want to use Cygwin, and even then you're on your own.

Documentation is provided; currently it's a bit patchy, but reasonably complete. If you have any problems, please join the mailing list.

Why not?

Clue is experimental software. Its sole purpose is to be interesting, and not necessarily useful. The resulting code takes between 10 and 100 times longer to run than it would if you just compiled the program with gcc (and that's when using the Lua backend with LuaJIT, possibly the fastest dynamic language around; any other target will be slower).

In addition, while Clue supports the ANSI standard, most programmers don't; non-ANSI behaviour such as casting a pointer to an integer and vice versa is very common. This will not work. So stock code is unlikely to run on Clue unless the authors have been particularly disciplined. (However, this can also be seen as an advantage: if your code works with gcc and with Clue, it's probably going to work elsewhere.)
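One way to picture why integer casts are a problem on a high-level target: suppose a pointer is modelled as a pair of (base object, offset) rather than a machine address. This is an assumption made for illustration, not a description of Clue's internals, and the Pointer class below is hypothetical. Pointer arithmetic within one object works fine under this model, but there is no single integer that could stand in for the whole pair.

```python
class Pointer:
    """A C pointer modelled as (base object, offset). Hypothetical."""

    def __init__(self, base, offset=0):
        self.base = base      # the array the pointer points into
        self.offset = offset  # element index within that array

    def add(self, n):
        """p + n: move n elements along the same object."""
        return Pointer(self.base, self.offset + n)

    def diff(self, other):
        """p - q: only meaningful within one object, as in ANSI C."""
        if self.base is not other.base:
            raise ValueError("pointers into different objects")
        return self.offset - other.offset

    def load(self):
        """*p"""
        return self.base[self.offset]

    def store(self, value):
        """*p = value"""
        self.base[self.offset] = value


buf = [10, 20, 30, 40]    # stands in for: int buf[4] = {10, 20, 30, 40};
p = Pointer(buf)          # int *p = buf;
q = p.add(2)              # int *q = p + 2;
q.store(q.load() + 1)     # *q = *q + 1;
print(buf[2], q.diff(p))  # 31 2
```

Code that sticks to operations like these (arithmetic, subtraction and dereferencing within one object) fits the model. Code that round-trips a pointer through an integer has nothing to map onto.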

And I haven't even mentioned the bugs.

Where?

Clue is hosted on SourceForge.

You can get the most recent version of Clue from the project download page.

Note: Right now Clue requires Sparse 0.4.1. Apparently this is pretty hard to come by and some versions vary, which means the patch doesn't work. Try this one; it seems to work for me.

What's new?

Version 0.6, 2013-03-14: Fixed quite a lot of bitrot. Added a new Lua 5.2 target, with goto support. Made it work with LuaJIT 2.

Version 0.5, 2008-12-14: Code cleanups was not attending this release; but we do have a shiny new Java backend.

Version 0.4, 2008-12-08: Son of code cleanups (in fact, a pretty major backend overhaul); new Common Lisp and C support.

Version 0.3, 2008-07-19: Code cleanups strike again; new Perl 5 support.

Version 0.2, 2008-07-18: Lots more code cleanups, and a modularised code generator; new Javascript support.

Version 0.1.1, 2008-07-14: The first 'real' release, with lots of code cleanups and optimisations.

Version 0.1pre1, 2008-07-07: The very first release ever.

Who?

Clue was written by David Given. The program is freely distributable under the terms of the two-clause Revised BSD License. The download package contains additional material that is distributable under the terms of the MIT License.

The Common Lisp backend was contributed by Peter Maydell and is also covered by the Revised BSD License.

Sparse was written by the Sparse team and is freely distributable under the terms of the Open Software License v1.1. See the Sparse web site for more information.