Cython for async networking

EuroPython 2016 seems to have three major topics this year, two of which make heavy use of Cython. The first, and probably most wonderful, topic is beginners. The conference started with a workshop day on Sunday that was split between Django Girls and (other) Python beginners. The effect on the conference is clearly visible: lots of new people walking around, noticeably more Python beginners, and a much better ratio of women to men.

The other two big topics are async networking and machine learning. Machine learning fills several talks and tutorials, and is obviously backed by Cython-implemented tools in many corners.

For async networking, however, it might seem more surprising that Cython has such a strong standing. But there are good reasons for it: even mostly I/O-bound applications can benefit hugely from processing speed at the different layers, as Anton and I showed in our talk on Monday (see below). The deeper you step down into the machinery, however, the more important that speed becomes. And Yury Selivanov is giving an excellent example of that with his reimplementation of the asyncio event loop in Cython, named uvloop. Here is a blog post announcing uvloop.

Since the final talk recordings are not online yet, I have to refer to the live stream dumps for now.

The talk by Anton Caceres and me (we're both working at Skoobe) on Fast Async Code with Cython and AsyncIO starts at 2:20 (hours:minutes) in the video. We provide examples and give motivations for compiling async code to speed up the processing and cut down the overall response latency. I'm also giving a very quick "Cython in 10 Minutes" intro to the language about halfway through the talk.
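To illustrate why processing speed matters even in mostly I/O-bound code, here is a minimal plain-Python sketch. It is not taken from the talk, and the names checksum() and handle() are made up for this example. Whatever runs between two await points blocks the event loop, so its run time adds to the latency of every request that is waiting.

```python
import asyncio


def checksum(payload: bytes) -> int:
    # CPU-bound step (hypothetical): runs synchronously and blocks
    # the event loop until it returns.
    total = 0
    for byte in payload:
        total = (total * 31 + byte) % 65521
    return total


async def handle(payload: bytes) -> int:
    # Simulated I/O wait: other coroutines can run during this await.
    await asyncio.sleep(0)
    # No other coroutine runs while checksum() is busy, so its speed
    # directly affects the latency of all pending requests.
    return checksum(payload)


async def main():
    payloads = [b"spam", b"eggs", b"ham"]
    return await asyncio.gather(*(handle(p) for p in payloads))


results = asyncio.run(main())
```

A synchronous helper like checksum() is the kind of code that would be a candidate for compilation, since it sits on the critical path of every coroutine that calls it.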

Yury's talk on High Performance Networking in Python starts at minute 10. He gives a couple of great testimonials for Cython along the way, describing how the async/await support in Cython and the ease of talking to C libraries have enabled him to write a tool that beats the performance of well-known async libraries in Go and JavaScript.

What's new in Cython 0.24

Cython 0.24 has been released after about half a year of development. Time for a writeup of the most interesting features and improvements that already anticipate some of the new language features scheduled for CPython 3.6, at the end of this year.

My personal favourite feature is PEP 498 f-strings, e.g.

>>> value = 4 * 20
>>> f'The value is {value}.'
'The value is 80.'
>>> f'Four times the value is {4*value}.'
'Four times the value is 320.'

I was initially opposed to making them a part of the Python language, but then got convinced by others that they do bring an actual improvement by making the .format() string formatting a) the One Way To Do It compared to the other, older ways, and b) actually avoidable by providing a simpler language syntax for it. Especially the indirection of mapping values to format string names is now gone. Compare:

>>> 'The value is {value}.'.format(value=value)
'The value is 80.'

Sadly, we cannot use them in Cython itself (and we do a lot of string formatting in Cython), since the Python code of the compiler still has to run on Python 2.6. But at least code written in Cython can now make use of them, even if the compiled modules need to run in Python 2.6. The only required underlying Python feature is the __format__() protocol, which is available in all Python versions that Cython supports. Now that there is a dedicated syntax for string formatting that the compiler can see and explore, work has started for optimising it further at compile time.
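As a small illustration of the __format__() protocol mentioned above, here is a sketch with a made-up Celsius class (not part of Cython or the standard library). It shows that f-strings and str.format() both end up calling the object's __format__() method with the format spec.

```python
class Celsius:
    """Hypothetical value type, used only for this illustration."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        # Delegate the format spec to the underlying float
        # (defaulting to one decimal place) and append a unit.
        return format(self.degrees, spec or ".1f") + " C"


temp = Celsius(21.456)
plain = f"{temp}"               # calls temp.__format__("")
precise = f"{temp:.2f}"         # calls temp.__format__(".2f")
same = "{:.2f}".format(temp)    # str.format() uses the same protocol
```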

EDIT (2016-04-12): f-string formatting is currently about 2-4 times as fast in Cython as in CPython 3.6.

Another new feature (also from Python 3.6) is the underscore separator in number literals, as defined by PEP 515. Since Cython is used a lot for developing numeric, scientific and algorithmic code, it's certainly helpful to be able to write

n = 1_000_000
x = 3.14159_26535_89

rather than the much less readable "count my digits" forms

n = 1000000
x = 3.141592653589
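A quick check, as a sketch for Python 3.6 or later, that the underscores are purely visual and do not change the value. The hexadecimal mask and the int() call are additional illustrations of what PEP 515 allows.

```python
n = 1_000_000
x = 3.14159_26535_89
mask = 0xFFFF_FFFF            # underscores also work in hex literals
parsed = int("1_000_000")     # and when parsing strings with int()

same_int = (n == 1000000)
same_float = (x == 3.141592653589)
```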

IDEs like PyCharm and PyDev will still have to catch up with these PEPs, but that will come. It seems that the PyCharm developers have already started working on this.

Some people complained about the size of the C files that Cython generates, so there is now a compiler option to disable the injection of C comments that show the original source code line right before the C code that came out of it. Just start your source file with

# cython: emit_code_comments=False

or, rather, pass the option from your build script. The size reduction can be quite large. There is, however, a drawback: this makes source-level debugging more difficult and also prevents coverage reporting. But since these are developer features, omitting the code comments can still be a good choice for release builds.

Pure Python mode (i.e. the optimised compilation of .py files) has also seen some improvements. C-tuples are now supported, i.e. you can write code as in the following (contrived) example, which provides efficient access to C values of a tuple:

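import cython

# Declare t as a C tuple of (long, double) for the compiler.
@cython.locals(t=(cython.long, cython.double))
def func(t):
    x = 1.0
    for i in range(t[0]):
        x *= t[1]
    return x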

Type inference will also automatically turn Python tuples into C tuples in some cases, but that already happened in previous releases.

Cython enums were adapted to PEP 435 enums and thus use the standard Python Enum type if available.

And the @property decorator is finally supported for methods in cdef classes. This removes the need for the old special "property name:" block syntax that Cython inherited from Pyrex. That syntax is now officially deprecated.
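For readers who have not used it, here is the decorator form as a minimal plain-Python sketch with a made-up Rectangle class. In a .pyx file, the class would be declared as a cdef class with typed attributes instead, as noted in the comments.

```python
class Rectangle:  # in a .pyx file: "cdef class Rectangle:"
    # A cdef class would also declare its attributes here,
    # e.g. "cdef double width, height".

    def __init__(self, width, height):
        self.width = width
        self.height = height

    @property
    def area(self):
        # Computed on attribute access, without call parentheses.
        # The deprecated Pyrex-style form was a "property area:" block
        # containing a __get__(self) method.
        return self.width * self.height


rect = Rectangle(3.0, 4.0)
area = rect.area
```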

Apart from these, there is the usual list of optimisations, bug fixes and general improvements that went into this release. I am happy to notice that the number of people contributing to the code base has been steadily growing over the last couple of years. Thanks to everyone who helps make Cython better with every release!

And for those interested in learning Cython from a pro, there are still places left for the extensive Cython training that I'll be giving in Leipzig (DE) this coming June, on the 16th and 17th. It's part of a larger series of courses throughout that week on High-Performance Computing with Python.