Cython 0.20 will make everything better :)

Stefan Behnel

2013-12-21 12:40

It`s been a while since the last major Cython release, so here`s a little update on what`s going to be new in 0.20. We`ve managed to improve the compiler in several important areas.

I`ve been working on making Cython a serious string processing tool for a couple of releases now, and the next release will add another piece to the puzzle. Cython has four string types now: bytes, str, unicode and, newly added, basestring. The special types str and basestring help with writing portable code for both Python 2 and 3. The str type expands to bytes in Py2 and to the Unicode string type in Py3, i.e. to whatever the platform calls "str" natively. This is really nice, because it allows you to get the native string type by simply saying "str" or by writing an unprefixed string literal. However, the problem was that when you type a variable or function argument as str, you couldn't assign a Unicode string to it in Py2, thus unnecessarily rejecting some text input. So I added the basestring type, which accepts both str and unicode in Py2, but only Unicode strings in Py3. Anything else will raise a TypeError on assignment.

Another corner where string processing was improved is the bytearray type. Cython recognises it as a known type now and allows it to coerce from and to C's char* values for much faster interaction with C/C++ code.

Cython`s own internal types, such as the function type or generators, are shared between modules now. Basically, the first imported Cython module in a running interpreter registers a shared (versioned) module that all later Cython modules use for initialisation. This allows multiple installed modules to recognise each others types, which allows for much faster interoperability.

Type inference has also gained new capabilities. It`s based on control flow analysis now (instead of function-global type analysis), which allows for more fine-grained typing. It`s also used in some cases to remember assignments to local constants, which allows Cython to see the current value of that variable and to optimise code based on it. For example, assignments of bound methods (like "append = some_list.append") can be traced back to the underlying object now, which improves the performance of some untyped Python code out there.

As for usability, the compiler has gained a new frontend script "cythonize" that mimics the capabilities of the cythonize() function used in setup.py scripts. To compile a module, or even an entire package down to extension modules, you can simply say "cythonize -i some/package", where "-i" requests to compile and build the extensions "in place", i.e. next to their sources. Oh, and while we`re at it, compiling packages (i.e. "__init__.py" files) actually works in Python 3.3 now.

Extension types behave a bit more smartly at garbage collection time. When an attribute is declared with a known builtin type that cannot produce reference cycles (e.g. strings), then it is automatically excluded from the garbage collection visitor. This reduces the traversal overhead when searching for unreachable objects. It can even exclude a type completely from garbage collection traversal if none of its object attributes requires it. And for the previously tricky case that object attributes had to be accessed at finalisation time, there is a new type decorator "@cython.no_gc_clear" that prevents them from being cleared ahead of time when they participate in reference cycles. Certainly way better than a crash.

And finally, the try-finally statement has been optimised to handle each exit case separately, which allows the C compiler to generate much faster code for the "normal" cases.

Since 0.20 isn`t out yet, this list might still grow a bit, but that`s what is there now. Not bad, if you ask me.