Apples, oranges and tomatoes

People keep asking how Cython and PyPy compare performance-wise, and which they should choose. This is my answer.

To ask which is faster, CPython, PyPy or Cython, outside of a very well defined and specific context of existing code and requirements, is basically comparing apples, oranges and tomatoes. Any of the three can win against the others for the right kind of application (apple sauce on your pasta, anyone?). Here's a rule-of-thumb comparison that may be way off for a given piece of code but should give you a general idea.

Note that we're only talking about CPU-bound code here. I/O-bound code will only show a difference in some very well selected cases (e.g. because Cython allows you to step down into low-level minimum-copy I/O using C, in which case it may not really have been I/O bound before).

PyPy is very fast for pure Python code that generally runs in loops for a while and makes heavy use of Python objects. It's great for computational code (and often way faster than CPython for it) but has its limits for numerics, huge data sets and other seriously performance critical code because it doesn't really allow you to fine-tune your code. Like any JIT compiler, it's a black box where you put something in and either you like the result or not. That equally applies to the integration with native code through the ctypes library, where you can be very lucky, or not. Although the platform situation keeps improving, the PyPy platform still lacks a wide range of external libraries that are available for the CPython platform, including many tools that people use to speed up their Python code.

CPython is usually quite a bit faster than PyPy for one-shot scripts (especially when including the startup time) and more generally for code that doesn't benefit from long-running loops. For example, I was surprised to see how much slower it is to run something as large as the Cython compiler inside of PyPy to compile code, even though the compiler is written in pure Python. CPython is also very portable and extensible (especially using Cython) and has a much larger set of external (native) libraries available than the PyPy platform, including all of NumPy and SciPy, for example. However, its performance loses against PyPy for most pure Python applications that keep doing the same stuff for a while, without resorting to native code or optimised native libraries for the heavy lifting.
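If you want to see where your own code falls between "one-shot" and "keeps doing the same stuff for a while", a tiny timing harness goes a long way. The following is only a sketch: the `Point` class, `workload()` and `time_runs()` are made-up names for illustration, and the loop is a toy stand-in for real application code. The idea is to run the same file under each interpreter and compare the duration of the first call with the later ones, since a JIT needs some warm-up before it pays off.

```python
import platform
import time


class Point:
    """Small Python object, used to make the loop object-heavy."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def workload(n):
    """Allocate n Point objects and sum a simple function of their fields."""
    total = 0
    for i in range(n):
        p = Point(i, i + 1)
        total += p.x * p.y
    return total


def time_runs(n, repeats):
    """Return the wall-clock duration of each of `repeats` calls to workload(n)."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload(n)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Hypothetical sizes; adjust them to something representative of your code.
    runs = time_runs(200_000, 5)
    print(platform.python_implementation(), ["%.4f" % d for d in runs])
```

Note that this only measures time inside an already running interpreter. Startup time, which matters a lot for short scripts, has to be measured from the outside, e.g. with your shell's `time` command.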

Cython is very fast for low-level computations, for (thread-)parallel code and for code that benefits from switching seamlessly between C/C++ and Python. The main feature is that it allows for very fine-grained manual code tuning from pure Python to C-ish Python to C to external libraries. It is designed to extend a Python runtime, not to replace it. When used to extend CPython, it obviously inherits all advantages of that platform in terms of available code. Cython is usually way slower than PyPy for the kind of object-heavy pure Python code in which PyPy excels, including some kinds of computational code, even if you start optimising the code manually. Compared to CPython, however, Cython-compiled pure Python code usually runs faster and it's easy to make it run much faster.

So, for an existing (mostly) pure Python application, PyPy is generally worth a try. It's usually faster than CPython and often fast enough all by itself. If it's not, well, then it's not and you can go and file a bug report with them. Or just drop it and happily ignore that it exists from that point on. Or just ignore it entirely in the first place, because your application runs fast enough anyway, so why change anything about it?

However, for most other, non-trivial applications, the simplistic question "which platform is faster" is much less important in real life. If an application has (existing or anticipated) non-trivial external dependencies that are not available or do not work reliably on a given platform, then the choice is obvious. And if you want to (or have to) optimise and tune the code yourself (where it makes sense to do that), the combination of CPython and Cython is often more rewarding, but requires more manual work than a quick test run in PyPy. For cases where most of the heavy lifting is being done in some C, C++, Fortran or other low-level library, either platform will do, often with a "there's already a binding for it" advantage for CPython and otherwise a usability and tunability advantage for Cython when the binding needs to be written from scratch. Apples, oranges and tomatoes, if you only ask which is faster.

Another thing to consider is that CPython and PyPy can happily communicate with each other from separate processes. So, there are ways to let applications benefit from both platforms at the same time when the need arises. Even heterogeneous MPI setups might be possible.
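As a rough illustration of the separate-process idea, here is a minimal sketch in which one interpreter hands a list of numbers to another one over a pipe and gets a JSON reply back. Everything named here (`WORKER_SOURCE`, `run_in_other_interpreter()`, the JSON message layout) is made up for the example. The `interpreter` argument defaults to `sys.executable` so that the code runs anywhere; in a real mixed setup you would pass the path of the other runtime instead, e.g. a PyPy executable, which is an assumption about your installation.

```python
import json
import subprocess
import sys

# Source of the worker program, run via "-c" in the other interpreter.
# It reads a JSON list from stdin and writes a JSON object to stdout.
WORKER_SOURCE = """
import json, platform, sys
numbers = json.load(sys.stdin)
result = {"impl": platform.python_implementation(),
          "sum_of_squares": sum(x * x for x in numbers)}
json.dump(result, sys.stdout)
"""


def run_in_other_interpreter(numbers, interpreter=sys.executable):
    """Send `numbers` to a worker process and return its decoded JSON reply."""
    proc = subprocess.run(
        [interpreter, "-c", WORKER_SOURCE],
        input=json.dumps(numbers),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the worker fails
    )
    return json.loads(proc.stdout)


if __name__ == "__main__":
    reply = run_in_other_interpreter([1, 2, 3])
    print(reply["impl"], reply["sum_of_squares"])
```

Starting a fresh process per call is the simplest thing that works, but it pays the interpreter startup cost every time. For a long-running PyPy worker that is supposed to benefit from its JIT, you would rather keep the process alive and exchange several messages with it.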

There is also work going on to improve the new integration of Cython with PyPy, which allows compiling and running Cython code on the PyPy platform. The performance of that interface currently suffers from the lack of optimisation in PyPy's cpyext emulation layer, but that should get better over time. The main point for now is that the integration lifts the platform lock-in for both sides, which makes more native code available for both platforms.