What's new in Cython 0.29?

I'm happy to announce the release of Cython 0.29. In case you didn't hear about Cython before, it's the most widely used statically optimising Python compiler out there. It translates Python (2/3) code to C, and makes it as easy as Python itself to tune the code all the way down into fast native code. This time, we added several new features that help with speeding up and parallelising regular Python code to escape from the limitations of the GIL.

So, what exactly makes this another great Cython release?

The contributors

First of all, our contributors. A substantial part of the changes in this release was written by users and non-core developers and contributed via pull requests. A big "Thank You!" to all of our contributors and bug reporters! You really made this a great release.

Above all, Gabriel de Marmiesse has invested a remarkable amount of time into restructuring and rewriting the documentation. It now has a lot less historic smell, and much better, tested (!) code examples. And he obviously found more than one problematic piece of code in the docs that we were able to fix along the way.

Cython 3.0

And this will be the last 0.x release of Cython. The Cython compiler has been in production critical use for years, all over the world, and there is really no good reason for it to have an 0.x version scheme. In fact, the 0.x release series can easily be counted as 1.x, which is one of the reasons why we now decided to skip the 1.x series all together. And, while we're at it, why not the 2.x prefix as well. Shift the decimals of 0.29 a bit to the left, and then the next release will be 3.0. The main reason for that is that we want 3.0 to do two things: a) switch the default language compatibility level from Python 2.x to 3.x and b) break with some backwards compatibility issues that get more in the way than they help. We have started collecting a list of things to rethink and change in our bug tracker.

Turning the language level switch is a tiny code change for us, but a larger change for our users and the millions of source lines in their code bases. In order to avoid any resemblance with the years of effort that went into the Py2/3 switch, we took measures that allow users to choose how much effort they want to invest, from "almost none at all" to "as much as they want".

Cython has a long tradition of helping users adapt their code for both Python 2 and Python 3, ever since we ported it to Python 3.0. We used to joke back in 2008 that Cython was the easiest way to migrate an existing Py2 code base to Python 3, and it was never really meant as a joke. Many annoying details are handled internally in the compiler, such as the range versus xrange renaming, or dict iteration. Cython has supported dict and set comprehensions before they were backported to Py2.7, and has long provided three string types (or four, if you want) instead of two. It distinguishes between bytes, str and unicode (and it knows basestring), where str is the type that changes between Py2's bytes str and Py3's Unicode str. This distinction helps users to be explicit, even at the C level, what kind of character or byte sequence they want, and how it should behave across the Py2/3 boundary.

For Cython 3.0, we plan to switch only the default language level, which users can always change via a command line option or the compiler directive language_level. To be clear, Cython will continue to support the existing language semantics. They will just no longer be the default, and users have to select them explicitly by setting language_level=2. That's the "almost none at all" case. In order to prepare the switch to Python 3 language semantics by default, Cython now issues a warning when no language level is explicitly requested, and thus pushes users into being explicit about what semantics their code requires. We obviously hope that many of our users will take the opportunity and migrate their code to the nicer Python 3 semantics, which Cython has long supported as language_level=3.

But we added something even better, so let's see what the current release has to offer.

A new language-level

Cython 0.29 supports a new setting for the language_level directive, language_level=3str, which will become the new default language level in Cython 3.0. We already added it now, so that users can opt in and benefit from it right away, and already prepare their code for the coming change. It's an "in between" kind of setting, which enables all the nice Python 3 goodies that are not syntax compatible with Python 2.x, but without requiring all unprefixed string literals to become Unicode strings when the compiled code runs in Python 2.x. This was one of the biggest problems in the general Py3 migration. And in the context of Cython's integration with C code, it got in the way of our users even a bit more than it would in Python code. Our goals are to make it easy for new users who come from Python 3 to compile their code with Cython and to allow existing (Cython/Python 2) code bases to make use of the benefits before they can make a 100% switch.

Module initialisation like Python does

One great change under the hood is that we managed to enable the PEP-489 support (again). It was already mostly available in Cython 0.27, but lead to problems that made us back-pedal at the time. Now we believe that we found a way to bring the saner module initialisation of Python 3.5 to our users, without risking the previous breakage. Most importantly, features like subinterpreter support or module reloading are detected and disabled, so that Cython compiled extension modules cannot be mistreated in such environments. Actual support for these little used features will probably come at some point, but will certainly require an opt-in of the users, since it is expected to reduce the overall performance of Python operations quite visibly. The more important features like a correct __file__ path being available at import time, and in fact, extension modules looking and behaving exactly like Python modules during the import, are much more helpful to most users.

Compiling plain Python code with OpenMP and memory views

Another PEP is worth mentioning next, actually two PEPs: 484 and 526, vulgo type annotations. Cython has supported type declarations in Python code for years, has switched to PEP-484/526 compatible typing with release 0.27 (more than one year ago), and has now gained several new features that make static typing in Python code much more widely usable. Users can now declare their statically typed Python functions as not requiring the GIL, and thus call them from a parallel OpenMP loops and parallel Python threads, all without leaving Python code compatibility. Even exceptions can now be raised directly from thread-parallel code, without first having to acquire the GIL explicitly.

And memory views are available in Python typing notation:

import cython
from cython.parallel import prange

@cython.cfunc
@cython.nogil
def compute_one_row(row: cython.double[:]) -> cython.int:
    ...

def process_2d_array(data: cython.double[:,:]):
    i: cython.Py_ssize_t

    for i in prange(data.shape[0], num_threads=16, nogil=True):
        compute_one_row(data[i])

This code will work with NumPy arrays when run in Python, and with any data provider that supports the Python buffer interface when compiled with Cython. As a compiled extension module, it will execute at full C speed, in parallel, with 16 OpenMP threads, as requested by the prange() loop. As a normal Python module, it will support all the great Python tools for code analysis, test coverage reporting, debugging, and what not. Although Cython also has direct support for a couple of those by now. Profiling (with cProfile) and coverage analysis (with coverage.py) have been around for several releases, for example. But debugging a Python module in the interpreter is obviously still much easier than debugging a native extension module, with all the edit-compile-run cycle overhead.

Cython's support for compiling pure Python code combines the best of both worlds: native C speed, and easy Python code development, with full support for all the great Python 3.7 language features, even if you still need your (compiled) code to run in Python 2.7.

More speed

Several improvements make use of the dict versioning that was introduced in CPython 3.6. It allows module global names to be looked up much faster, close to the speed of static C globals. Also, the attribute lookup for calls to cpdef methods (C methods with Python wrappers) can benefit a lot, it can become up to 4x faster.

Constant tuples and slices are now deduplicated and only created once at module init time. Especially with common slices like [1:] or [::-1], this can reduce the amount of one-time initialiation code in the generated extension modules.

The changelog lists several other optimisations and improvements.

Many important bug fixes

We've had a hard time following a change in CPython 3.7 that "broke the world", as Mark Shannon put it. It was meant as a mostly internal change on their side that improved the handling of exceptions inside of generators, but it turned out to break all extension modules out there that were built with Cython, and then some. A minimal fix was already released in Cython 0.28.4, but 0.29 brings complete support for the new generator exception stack in CPython 3.7, which allows exceptions raised or handled by Cython implemented generators to interact correctly with CPython's own generators. Upgrading is therefore warmly recommended for better CPython 3.7 support. As usual with Cython, translating your existing code with the new release will make it benefit from the new features, improvements and fixes.

Stackless Python has not been a big focus for Cython development so far, but the developers noticed a problem with Cython modules earlier this year. Normally, they try to keep Stackless binary compatible with CPython, but there are corner cases where this is not possible (specifically frames), and one of these broke the compatibility with Cython compiled modules. Cython 0.29 now contains a fix that makes it play nicely with Stackless 3.x.

A funny bug that is worth noting is a mysteriously disappearing string multiplier in earlier Cython versions. A constant expression like "x" * 5 results in the string "xxxxx", but "x" * 5 + "y" becomes "xy". Apparently not a common code construct, since no user ever complained about it.

Long-time users of Cython and NumPy will be happy to hear that Cython's memory views are now API-1.7 clean, which means that they can get rid of the annoying "Using deprecated NumPy API" warnings in the C compiler output. Simply append the C macro definition ("NPY_NO_DEPRECATED_API", "NPY_1_7_API_VERSION") to the macro setup of your distutils extensions in setup.py to make them disappear. Note that this does not apply to the old low-level ndarray[...] syntax, which exposes several deprecated internals of the NumPy C-API that are not easy to replace. Memory views are a fast high-level abstraction that does not rely specifically on NumPy and therefore does not suffer from these API constraints.

Less compilation :)

And finally, as if to make a point that static compilation is a great tool but not always a good idea, we decided to reduce the number of modules that Cython compiles of itself from 13 down to 8, thus keeping 5 more modules normally interpreted by Python. This makes the compiler runs about 5-7% slower, but reduces the packaged size and the installed binary size by about half, thus reducing download times in CI builds and virtualenv creations. Python is a very efficient language when it comes to functionality per line of code, and its byte code is similarly high-level and efficient. Compiled native code is a lot larger and more verbose in comparison, and this can easily make a difference of megabytes of shared libraries versus kilobytes of Python modules.

We therefore repeat our recommendation to focus Cython's usage on the major pain points in your application, on the critical code sections that a profiler has pointed you at. The ability to compile those, and to tune them at the C level, is what makes Cython such a great and versatile tool.