Cython for async networking

EuroPython 2016 seems to have three major topics this year, two of which make heavy use of Cython. The first, and probably most wonderful, topic is beginners. The conference started with a workshop day on Sunday that was split between Django Girls and (other) Python beginners. The effect on the conference is clearly visible: lots of new people walking around, noticeably more Python beginners, and a clearly better ratio of women to men.

The other two big topics are async networking and machine learning. Machine learning fills several talks and tutorials, and is obviously backed by Cython-implemented tools in many corners.

For async networking, however, it might seem more surprising that Cython has such a strong standing. But there are good reasons for it: even mostly I/O bound applications can benefit hugely from processing speed at the different layers, as Anton and I showed in our talk on Monday (see below). The deeper you step down into the machinery, however, the more important that speed becomes. And Yury Selivanov gives an excellent example of that with his reimplementation of the asyncio event loop in Cython, named uvloop. Here is a blog post announcing uvloop.
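To illustrate why a faster event loop can help without touching application code, here is a minimal sketch. It assumes that the third-party uvloop package may or may not be installed and falls back to the stock asyncio loop otherwise. The helper names (fetch_length, run_on_fast_loop) are made up for this example and are not taken from either talk.

```python
import asyncio

try:
    import uvloop  # optional third-party package, not part of the stdlib
except ImportError:
    uvloop = None


async def fetch_length(text):
    # Stand-in for real I/O: yield to the event loop once, then "process".
    await asyncio.sleep(0)
    return len(text)


async def main():
    # Run several coroutines concurrently on whatever loop is driving us.
    return await asyncio.gather(*(fetch_length(t) for t in ("a", "bb", "ccc")))


def run_on_fast_loop(coro):
    # Use uvloop's event loop if it is available, else the default asyncio loop.
    loop = uvloop.new_event_loop() if uvloop is not None else asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()


lengths = run_on_fast_loop(main())
print(lengths)
```

The coroutines themselves do not know which loop runs them, so the loop can be swapped without changing them.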

Since the final talk recordings are not online yet, I have to refer to the live stream dumps for now.

The talk by Anton Caceres and me (we're both working at Skoobe) on Fast Async Code with Cython and AsyncIO starts at 2:20 (hours:minutes) in the video. We provide examples and give motivations for compiling async code to speed up the processing and cut down the overall response latency. I'm also giving a very quick "Cython in 10 Minutes" intro to the language about halfway through the talk.

Yury's talk on High Performance Networking in Python starts at minute 10. He gives a couple of great testimonials for Cython along the way, describing how the async/await support in Cython and the ease of talking to C libraries have enabled him to write a tool that beats the performance of well-known async libraries in Go and JavaScript.

What's new in Cython 0.24

Cython 0.24 has been released after about half a year of development. Time for a writeup of the most interesting features and improvements that already anticipate some of the new language features scheduled for CPython 3.6, at the end of this year.

My personal feature favourite is PEP-498 f-strings, e.g.

>>> value = 4 * 20
>>> f'The value is {value}.'
'The value is 80.'
>>> f'Four times the value is {4*value}.'
'Four times the value is 320.'

I was initially opposed to making them a part of the Python language, but then got convinced by others that they do bring an actual improvement by making the .format() string formatting a) the One Way To Do It compared to the other, older ways, and b) actually avoidable by providing a simpler language syntax for it. Especially the indirection of mapping values to format string names is now gone. Compare:

>>> 'The value is {value}.'.format(value=value)
'The value is 80.'

Sadly, we cannot use them in Cython itself (and we do a lot of string formatting in Cython), since the Python code of the compiler still has to run on Python 2.6. But at least code written in Cython can now make use of them, even if the compiled modules need to run in Python 2.6. The only required underlying Python feature is the __format__() protocol, which is available in all Python versions that Cython supports. Now that there is a dedicated syntax for string formatting that the compiler can see and explore, work has started for optimising it further at compile time.
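As a small sketch of the __format__() protocol mentioned above, the following plain Python 3.6+ code shows that f-strings and .format() end up calling the same hook. The Temperature class is a hypothetical example type, not something from Cython.

```python
class Temperature:
    """Hypothetical example type that customises its own formatting."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        # Called by format(), str.format() and f-strings alike.
        # An empty spec falls back to one decimal place.
        return format(self.degrees, spec or ".1f") + " C"


temp = Temperature(21.456)

via_fstring = f"{temp:.2f}"
via_format = "{:.2f}".format(temp)
default_spec = f"{temp}"

print(via_fstring, via_format, default_spec, sep=" | ")
```

Both spellings produce the same string. The f-string just removes the indirection of passing the value separately.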

EDIT (2016-04-12): f-string formatting is currently about 2-4 times as fast in Cython as in CPython 3.6.

Another new feature (also from Python 3.6) is the underscore separator in number literals, as defined by PEP 515. Since Cython is used a lot for developing numeric, scientific and algorithmic code, it's certainly helpful to be able to write

n = 1_000_000
x = 3.14159_26535_89

rather than the much less readable "count my digits" forms

n = 1000000
x = 3.141592653589
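A quick check in plain Python (3.6 or later assumed) shows that both spellings denote the same values. It also shows that PEP 515 extends the same grouping to int() parsing and to the "_" format option:

```python
n = 1_000_000
x = 3.14159_26535_89

# The underscores are purely visual; the values are identical.
assert n == 1000000
assert x == 3.141592653589

# int() accepts the same grouping when parsing strings.
parsed = int("1_000_000")

# The "_" format option writes the separators back out
# (every three digits for decimal, every four for hex).
grouped = f"{n:_}"
hex_grouped = f"{0xFFFF_FFFF:_x}"

print(parsed, grouped, hex_grouped)
```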

IDEs like PyCharm and PyDev will still have to catch up with these PEPs, but that will come. It seems that the PyCharm developers have already started working on this.

Some people complained about the size of the C files that Cython generates, so there is now a compiler option to disable the injection of C comments that show the original source code line right before the C code that came out of it. Just start your source file with

# cython: emit_code_comments=False

or, rather, pass the option from your setup.py script. The size reduction can be quite large. There is, however, a drawback: this makes source level debugging more difficult and also prevents coverage reporting. But since these are developer features, omitting the code comments can still be a good choice for release builds.

Pure Python mode (i.e. the optimised compilation of .py files) has also seen some improvements. C-tuples are now supported, i.e. you can write code as in the following (contrived) example, which provides efficient access to C values of a tuple:

import cython

@cython.locals(t=(cython.int,cython.double), i=cython.int)
def func(t):
    x = 1.0
    for i in range(t[0]):
        x *= t[1]
    return x

Type inference will also automatically turn Python tuples into C tuples in some cases, but that already happened in previous releases.

Cython enums were adapted to PEP 435 enums and thus use the standard Python Enum type if available.

And the @property decorator is finally supported for methods in cdef classes. This removes the need for the old special property name: syntax block that Cython inherited from Pyrex. This syntax is now officially deprecated.

Apart from these, there is the usual list of optimisations, bug fixes and general improvements that went into this release. I am happy to notice that the number of people contributing to the code base has been steadily growing over the last couple of years. Thanks to everyone who helps make Cython better with every release!

And for those interested in learning Cython from a pro, there are still places left for the extensive Cython training that I'll be giving in Leipzig (DE) on June 16-17. It's part of a larger series of courses throughout that week on High-Performance Computing with Python.

EuroPython 2016

The list of accepted talks for EuroPython was published with lots of great talks and topics. I got my tutorial on Cython with NumPy accepted, as well as my talk on Fast Async Code with Cython and AsyncIO that I'm giving together with a colleague from Skoobe. I guess both proposals collided too heavily with the current list of hype buzzwords to get rejected. ;-)

Thanks to everyone who voted for them! Hope to see you there!

And in case you're interested in learning Cython from a pro, there are still places left for the extensive Cython training that I'll be giving in Leipzig (DE) on June 16-17. It's part of a larger series of courses throughout that week on High-Performance Computing with Python.

What's new in Cython 0.23

It was really nice to hear Cython being mentioned in pretty much every second talk at this year's EuroSciPy in Cambridge, either directly or by promoting tools that were themselves written in Cython. It seems to have become a cornerstone of Python's ecosystem.

For me, this has been an intense conference summer overall, so I only just noticed that I hadn't written up anything about the long list of changes in Cython 0.23 yet. Well, apart from the changelog :). So here it goes.

Cython 0.23 is a major feature release. It implements the new language features of Python 3.5, with the usual backports to make them available in Python 2.6 and later. My personal feature favourites are PEP 492 (async/await) and inlined generator expressions, but also the new support for Cython code coverage analysis with the coverage.py tool, which I blogged about already. Note that some Py3.5 features, such as the PEP 465 matrix multiplication operator, were already released early this year in Cython 0.22.

One of the greatest changes in Python 3.5, from a Cython perspective, is the shift from generators and coroutines as builtin types to making them ABCs, i.e. abstract base classes in the collections.abc module. This enables full interoperability of Cython's own implementations with Python's builtin types. To make this available in older Python versions, I created the backports_abc package for applications to use. Basically, you would install it and then say

import backports_abc
backports_abc.patch()

to get the Coroutine and Generator classes added to the collections.abc module if they are missing (i.e. in CPython versions before 3.5). This allows Cython to register its internal types with them and from that point on, any code using isinstance(x, Generator) or isinstance(x, Coroutine) will be able to support Cython's fast builtin types. If you find software packages that do not support this yet because they do type checks instead of ABC instance checks, please file a bug report with their projects to get them adapted. Twisted comes to mind, for example, or Tornado. [Update 2015-09-15]: The next Tornado release will have support for Cython coroutines.
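The difference between a type check and an ABC instance check can be sketched in plain Python (3.5 or later assumed, so that collections.abc.Generator exists without the backport). The Countdown class is a hypothetical stand-in for any generator-like type that is not CPython's builtin generator, such as a Cython-implemented one.

```python
import types
from collections.abc import Generator


class Countdown(Generator):
    """Hypothetical generator-like type that is not a builtin generator."""

    def __init__(self, start):
        self._remaining = start

    def send(self, value):
        # Produce the next item; the ABC's __next__() calls send(None).
        if self._remaining <= 0:
            raise StopIteration
        self._remaining -= 1
        return self._remaining + 1

    def throw(self, typ, val=None, tb=None):
        # Reuse the ABC's default behaviour, which simply raises the exception.
        return super().throw(typ, val, tb)


def is_generator_by_type(obj):
    # Exact type check: only accepts CPython's builtin generator objects.
    return isinstance(obj, types.GeneratorType)


def is_generator_by_abc(obj):
    # ABC check: accepts anything derived from or registered with Generator.
    return isinstance(obj, Generator)


countdown = Countdown(3)
print(is_generator_by_type(countdown))
print(is_generator_by_abc(countdown))
print(list(countdown))
```

Code that uses the second kind of check keeps working when a different generator implementation is passed in. Code that uses the first kind rejects it.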

CPython 3.5 itself has already been fixed up, thanks to Yury Selivanov, to natively support Cython implemented generators and coroutines. This allows asynchronous code (currently, but not limited to, asyncio) to take advantage of Cython's speed, which can easily be several times faster than Python coroutines in practice.

What else is there in Cython 0.23? Well, inlined generator expressions are back. This means that when you call builtin functions like any() or sorted() on a generator expression, Cython will inline its own (partial) implementation of that function into the generator implementation and avoid calling back and forth between the generator and its consumer. Instead, it will calculate the final result inside of the generator and only return it in one step at the very end. This can speed up the overall execution considerably.
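Conceptually, the transformation looks roughly like the following plain Python sketch. This is only an illustration of the idea, not the code that Cython actually generates, and both function names are made up for the example.

```python
def any_positive_generic(values):
    # Generic form: a generator object is created and any() pulls
    # items out of it one at a time, switching back and forth.
    return any(v > 0 for v in values)


def any_positive_inlined(values):
    # Roughly what inlining amounts to: a single loop with the logic
    # of any() folded into it, returning the result in one step.
    for v in values:
        if v > 0:
            return True
    return False


samples = ([], [-1, -2], [-1, 0, 5], [3])
for sample in samples:
    print(sample, any_positive_generic(sample), any_positive_inlined(sample))
```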

Another feature that I invested some time into is faster numeric operations on Python's long (Py3 int) and float types. Arithmetic or shifting operations that involve integer or float constants are now faster because they avoid double unpacking of the operands and inline their constant parts.

On the C++ integration front, several minor issues have seen improvements and new standard declarations were added, which makes Cython 0.23 an even nicer and rounder tool for wrapping C++ code bases, or for making use of STL containers in Cython code. More C++ improvements are planned for the upcoming 0.24 release.

As usual, the complete list of features and fixes is much longer for such a large release. Hope you like it.

Line coverage analysis for Cython modules

The coverage analysis tool for Python has recently gained a plugin API that allows external tools to provide source code information. Cython has had line tracing support since release 0.19, but the new API allows it to support coverage.py for line coverage reporting.

To test this, you need the latest developer versions of both coverage.py (pre-4.0) and Cython (post-0.22.0):

https://bitbucket.org/ned/coveragepy

https://github.com/cython/cython

Then, enable the Cython plugin in the .coveragerc config file of your project:

[run]
plugins = Cython.Coverage

And compile your Cython modules with line tracing support. This can be done by putting the following two comment lines at the top of the modules that you want to trace:

# cython: linetrace=True
# distutils: define_macros=CYTHON_TRACE=1

That's a double opt-in. The first line instructs Cython to generate verbose line tracing code (and thus increase the size of the resulting C/C++ file), and the second line enables this code at C compile time, which will most likely slow down your program. You can also configure both settings globally in your setup.py script, see the Cython documentation.

Then make sure you build your project in place (python setup.py build_ext --inplace) so that the generated C/C++ code files can be found right next to the binary modules and their sources.

That's it. Now your Cython modules should show up in your coverage reports. Any questions, bug reports or suggestions regarding this new feature can be discussed on the Cython-users mailing list.

Update: Coverage reporting is now also available for code that frees the GIL if the C macro CYTHON_TRACE_NOGIL=1 is set, i.e.:

# cython: linetrace=True
# distutils: define_macros=CYTHON_TRACE_NOGIL=1

All reporting formats of coverage.py are supported: plain text, XML and annotated HTML sources.

What's new in Cython 0.22

Cython 0.22 is nearing completion and it's a major feature release, so here's a list of major improvements to Cython's type system.

C arrays have become first class citizens in Cython's type system. Previously, they behaved mostly just like pointers, as they do in C. The new release allows them to coerce to Python lists (by default) and tuples (when requested by the context). They can also be assigned by value, without the need for an explicit copy loop.

cdef int[3] x = [0, 1, 2]
cdef int[3] y = x
y[2] = 5
print(y)          # -> [0, 1, 5]
print(<tuple>x)   # -> (0, 1, 2)

cdef int[2][3] z = [x, y]
print(z)          # -> [[0, 1, 2], [0, 1, 5]]

Typed tuples were added. They behave like Python tuples and coerce from and to them at need, but support any C types for their arguments.

cdef int x = 1
cdef double y = 2

cdef (int*, double*) xy = (&x, &y)
ptr_a, ptr_b = xy
print(ptr_a[0], ptr_b[0])    # ->  1 2.0

cdef (int, double) ab = (ptr_a[0], ptr_b[0])
print(ab)                    # ->  (1, 2.0)

C/C++ functions have learned to auto-coerce to callable Python objects when used in an object context. Similarly, in order to provide a flat wrapper to an external C function, it is now sufficient to declare it as an external cpdef function in the module that should export it (or in the corresponding .pxd file).

cdef extern from "someheader.h":
    cpdef int cpfunc(int x, double y)
    cdef int cfunc(int x)

def use_cpfunc_from_cython():
    return cpfunc(1, 3.0)

def pass_cfunc_to_python():
    return cfunc

def use_from_python():
    """
    >>> cpfunc(1, 3.0)
    4
    >>> cfunc = pass_cfunc_to_python()
    >>> cfunc(4)
    123
    """

lxml christmas funding

My bicycle was recently stolen and since I now have to get a new one, here's a proposal.

From today on until December 24th, I will divert all donations that I receive for my work on lxml to help in restoring my local mobility.

If you do not like this 'misuse', do not donate in this time frame. I do hope, however, that some of you like the idea that the money they give for something they value is used for something that is of value to the receiver.

All the best -- Stefan

Faster Fractions

I spent some time optimising Python's "fractions" module. Fractions (i.e. rational numbers) are great for all sorts of exact computations, especially money calculations. You never have to care about loss of precision, you can freely mix very large and very small numbers any way you like in your computations - the result is always exact as it's all done in integers internally.
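A short sketch with the standard library's fractions module shows the difference from floating point arithmetic. The values are picked arbitrarily for illustration.

```python
from fractions import Fraction

# Floats accumulate rounding errors, fractions do not.
float_sum = 0.1 + 0.2
exact_sum = Fraction(1, 10) + Fraction(2, 10)

print(float_sum == 0.3)                # False
print(exact_sum == Fraction(3, 10))    # True

# Mixing very large and very small numbers stays exact as well.
big = Fraction(10**30)
tiny = Fraction(1, 10**30)
exact_diff = (big + tiny) - big

# The same computation in floats silently drops the small operand.
float_diff = (1e30 + 1e-30) - 1e30

print(exact_diff == tiny)              # True
print(float_diff == 1e-30)             # False
```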

But the performance used to suck. Totally. The main problem was the type instantiation, which is really expensive. For example, simply changing this code

f = n * Fraction(x, y)

to this

f = Fraction(n * x, y)

(which avoids intermediate Fraction operations) could make it several times faster. I provided some patches that streamline common cases (numerator and denominator will usually be Python ints), and this made the implementation in CPython 3.5 twice as fast as before. It actually starts being usable. :)
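The two spellings can be compared with a small sketch like the following. The sample values and the repetition count are arbitrary, and the measured ratio will depend on the Python version, so no particular timing is claimed here.

```python
import timeit
from fractions import Fraction

n, x, y = 3, 7, 12

# Two instantiations: Fraction(7, 12) first, then another one for the product.
two_steps = n * Fraction(x, y)

# One instantiation: the multiplication happens on plain ints beforehand.
one_step = Fraction(n * x, y)

# Both forms give the same exact value.
assert two_steps == one_step == Fraction(7, 4)

t_two_steps = timeit.timeit(lambda: n * Fraction(x, y), number=10000)
t_one_step = timeit.timeit(lambda: Fraction(n * x, y), number=10000)
print("n * Fraction(x, y): %.4f s" % t_two_steps)
print("Fraction(n * x, y): %.4f s" % t_one_step)
```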

For those who can't wait for Python 3.5 to come out (in about a year's time), and also for those who want even better performance (like me), I dropped the implementation into Cython and optimised it further at the C level. That gave me another factor of 5, so the result is currently about 10x faster than what's in the standard library.

Compared to the Decimal type in Python 2.7, it's about 15x faster. The hugely improved C reimplementation of the "decimal" module in Python 3.3 is still about 5-6x faster - or less, if you often need to rescale your values along the way. Plus, with decimal, you always have to take care of using the right precision scale for your code to prevent rounding errors, and playing it safe will slow it down.
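The precision issue with decimal can be sketched like this. The precision of 6 digits is deliberately tiny and chosen only to make the rounding visible.

```python
from decimal import Decimal, localcontext
from fractions import Fraction

with localcontext() as ctx:
    ctx.prec = 6  # artificially low precision for the demonstration
    third = Decimal(1) / Decimal(3)   # rounded to 0.333333
    decimal_result = third * 3        # 0.999999, not 1

# The fraction keeps the value of 1/3 exactly.
fraction_result = Fraction(1, 3) * 3

print(decimal_result)
print(fraction_result)
```

A higher precision makes the error smaller, but it does not go away. That is why the right scale has to be chosen up front.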

I released the module to PyPI, it's called quicktions. Hope you like it.
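One possible way to use it as an optional accelerator is sketched below. It assumes that quicktions may not be installed and falls back to the standard library in that case. The price calculation is a made-up example.

```python
try:
    # Optional third-party package from PyPI.
    from quicktions import Fraction
except ImportError:
    # Standard library fallback, slower but always there.
    from fractions import Fraction

# Exact money calculation: 3 items at 19.99 each.
price = Fraction(1999, 100)
total = 3 * price

print(total)
```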

Cython courses ahead in Leipzig (Germany)

I'm giving an in-depth learn-from-a-core-dev Cython training at the Python Academy in Leipzig (Germany) next month, October 16-17. In two days, the course will cover everything from a Cython intro up to the point where you bring Python code to C speed and use C/C++ libraries and data types in your code.

Read more on it here:

http://www.python-academy.com/courses/specialtopics/python_course_cython.html

What is Cython?

Cython is an optimising static compiler for Python that makes writing C extensions as easy as Python itself. It greatly extends the limits of the Python language and thus has found a large user base in the Scientific Computing community. It also powers various well-known extension modules in the Python package index. Cython is a great way to extend the CPython runtime with native code when performance matters.

Who am I?

I'm Stefan Behnel, a long-time Cython core developer specialising in High-Performance Python training and fast code. See my website.