A static Python compiler? What's the point?

Stefan Behnel

2012-11-11 18:01

I've finally found the time to look through the talks of this year's EuroPython (which I didn't attend - I mean, Firenze? In the middle of summer? Seriously?). That is how I stumbled upon a rather lengthy talk by Kay Hayen about his Nuitka compiler project. It ran for more than an hour, almost one and a half, and I had to skip ahead through the video more than once. It certainly reminded me that it's a good idea to keep my own talks short.

Apparently, the talk got a mixed reception. Some people seemed to be very impressed, others didn't like it at all. According to the comments, Guido was more in the latter camp. I can understand that. The way Kay presented his project was not very convincing. The only "excuse" he had for its existence was basically "I do it in my spare time" and "I don't like the alternatives". In the stream of details that he presented, he completely failed to make the case for a static Python compiler at all. And Guido's little remark in his keynote that "some people still try to do this" showed once again that this case must still be made.

So, what's the problem with static Python compilers, compared to static compilers for other languages? Python can obviously be translated into static code; the mere fact that it can be interpreted shows that. Simply chaining all the code that the interpreter executes will yield a static code representation. However, that doesn't answer the question of whether it's worth doing. The interpreter in CPython is a much more compact piece of code than the result of such a translation would be, and it's also much simpler. The trace pruning that HotPy does, according to another talk at the same conference, is a very good example of the complexity involved. The fact that ShedSkin and PyPy's RPython explicitly do not try to implement the whole Python language speaks volumes. And avoiding the overhead of an additional compilation step is actually something that drives many people to use the Python interpreter in the first place. Static compilation is not a virtue. Thus, I would expect an excuse for writing a static translator from anyone who attempts it. The normal excuse that people bring forward is "because it's faster". Faster than interpretation.
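To make "chaining all code that the interpreter executes" a bit more concrete, here is a small sketch that lists the bytecode instructions CPython runs for a trivial function. The function `add` is a made-up example. The idea, under that assumption, is that a naive static translator would emit the interpreter's handler code for each of these instructions, one after the other, instead of dispatching on them in a loop. The exact opcode names differ between CPython versions, so no output is shown.

```python
import dis


def add(a, b):
    # Hypothetical function, chosen only because its bytecode is short.
    return a + b


# One entry per bytecode instruction that CPython's eval loop would execute.
opnames = [instr.opname for instr in dis.get_instructions(add)]

# Printing them in order gives the "chain" a naive translator would unroll.
for name in opnames:
    print(name)
```

Even for this one-liner, each instruction stands for a generic handler that has to cope with arbitrary object types, which is why the unrolled result tends to be much larger than the interpreter loop itself.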

Now, Python is a dynamic language, which makes static translation difficult already, but it's a dynamic language where side-effects are the normal case rather than the exception. That means that static analysis and optimisation can never be as effective as runtime analysis and optimisation, not with a reasonable effort. At the very least, WPA (whole program analysis) would be required in order to make static optimisations as effective as runtime optimisations, but both ShedSkin and RPython make it clear that this can only be done for a limited subset of the language. And it obviously requires the whole program to be available at compile time, which is usually not the case, if only due to the excessive resource requirements of a WPA. PyPy is a great example: compiling its RPython sources takes tons of memory and a ridiculous amount of time.
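A minimal sketch of the kind of side-effect meant above. The `Money` class and the `add` function are hypothetical names used only for illustration. The point is that nothing in `add` tells a static compiler what `+` will do, and that any other code may change the answer while the program is running.

```python
class Money:
    """Hypothetical value type used only for this illustration."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        return Money(self.cents + other.cents)


def add(a, b):
    # Nothing here says what `+` means. It dispatches on the runtime types
    # of `a` and `b`, so a static compiler has to emit a generic call unless
    # it can prove the argument types for every caller.
    return a + b


int_sum = add(1, 2)            # integer addition
str_sum = add("py", "thon")    # string concatenation
before = add(Money(150), Money(50)).cents

# An ordinary side effect: any code, in any module, may rebind the method
# at runtime. From here on, the same `add` call behaves differently.
Money.__add__ = lambda self, other: Money(self.cents + other.cents + 1)
after = add(Money(150), Money(50)).cents
```

A runtime optimiser can observe which types actually show up and guard against the rebinding. A static compiler would have to see every module that could possibly touch `Money` before it could specialise `add`, which is where whole program analysis comes in.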

That's why I don't think that "because it's faster" cuts it, not as plainly as that. The case for a static compiler must be that "it solves a problem". Cython does that. People don't use Cython because it has such a great Python code optimiser. Plain, unmodified Python code compiled by Cython, while usually faster than interpretation in CPython, will often be slower, and sometimes several times slower, than what PyPy's JIT-driven optimiser gets out of it. No, people use Cython because it helps them solve a problem: either they want to connect to external non-Python libraries from Python code, or they want to be able to manually optimise their code, or both. Manual code optimisation and tuning is where static compilers are great. Runtime optimisers can't give you that, and interpreters obviously won't give you that either. The whole selling point of Cython is not that it will make Python code magically run fast all by itself, but that it allows users to tremendously expand the range of manual optimisations that they can apply to their Python code, up to the point where it's no longer Python code but essentially C code in a Python-like syntax, or even plain C code that they interface with as if it were Python code. And this works completely seamlessly, without building new language barriers along the way.

So, the point is not that Cython is a static Python compiler; the point is that it is more than a Python compiler. It solves a problem in addition to just being a compiler. People have been trying to write static compilers for Python over and over again, but all of them fail to provide that additional feature that can make them useful to a broad audience. I don't mind them doing that; having fun writing code is a perfectly valid reason to do it. But they shouldn't expect others to start raving about the result unless they can provide more than just static compilation.