Line coverage analysis for Cython modules

The coverage analysis tool for Python, coverage.py, has recently gained a plugin API that allows external tools to provide source code information. Cython has had line tracing support since release 0.19, and the new API now allows it to support coverage.py for line coverage reporting.

To test this, you need the latest developer versions of both coverage.py (pre-4.0) and Cython (post-0.22.0):

https://bitbucket.org/ned/coveragepy

https://github.com/cython/cython

Then, enable the Cython plugin in the .coveragerc config file of your project:

[run]
plugins = Cython.Coverage

Then compile your Cython modules with line tracing support. This can be done by putting the following two comment lines at the top of each module that you want to trace:

# cython: linetrace=True
# distutils: define_macros=CYTHON_TRACE=1

That's a double opt-in. The first line instructs Cython to generate verbose line tracing code (and thus increase the size of the resulting C/C++ file), and the second line enables this code at C compile time, which will most likely slow down your program. You can also configure both settings globally in your setup.py script; see the Cython documentation.

Then make sure you build your project in place (python setup.py build_ext --inplace) so that the generated C/C++ code files can be found right next to the binary modules and their sources.

That's it. Now your Cython modules should show up in your coverage reports. Any questions, bug reports or suggestions regarding this new feature can be discussed on the Cython-users mailing list.

Update: Coverage reporting is now also available for code that frees the GIL if the C macro CYTHON_TRACE_NOGIL=1 is set, i.e.:

# cython: linetrace=True
# distutils: define_macros=CYTHON_TRACE_NOGIL=1

All reporting formats of coverage.py are supported: plain text, XML and annotated HTML sources.

What's new in Cython 0.22

Cython 0.22 is nearing completion and it's a major feature release, so here's a list of major improvements to Cython's type system.

C arrays have become first class citizens in Cython's type system. Previously, they behaved mostly just like pointers, as they do in C. The new release allows them to coerce to Python lists (by default) and tuples (when requested by the context). They can also be assigned by value, without the need for an explicit copy loop.

cdef int[3] x = [0, 1, 2]
cdef int[3] y = x
y[2] = 5
print(y)          # -> [0, 1, 5]
print(<tuple>x)   # -> (0, 1, 2)

cdef int[2][3] z = [x, y]
print(z)          # -> [[0, 1, 2], [0, 1, 5]]

Typed tuples were added. They behave like Python tuples and coerce from and to them as needed, but can hold items of any C type.

cdef int x = 1
cdef double y = 2

cdef (int*, double*) xy = (&x, &y)
ptr_a, ptr_b = xy
print(ptr_a[0], ptr_b[0])    # ->  1 2.0

cdef (int, double) ab = (ptr_a[0], ptr_b[0])
print(ab)                    # ->  (1, 2.0)

C/C++ functions have learned to auto-coerce to callable Python objects when used in an object context. Similarly, in order to provide a flat wrapper to an external C function, it is now sufficient to declare it as an external cpdef function in the module that should export it (or in the corresponding .pxd file).

cdef extern from "someheader.h":
    cpdef int cpfunc(int x, double y)
    cdef int cfunc(int x)

def use_cpfunc_from_cython():
    return cpfunc(1, 3.0)

def pass_cfunc_to_python():
    return cfunc

def use_from_python():
    """
    >>> cpfunc(1, 3.0)
    4
    >>> cfunc = pass_cfunc_to_python()
    >>> cfunc(4)
    123
    """

lxml christmas funding

My bicycle was recently stolen and since I now have to get a new one, here's a proposal.

From today on until December 24th, I will divert all donations that I receive for my work on lxml to help in restoring my local mobility.

If you do not like this 'misuse', do not donate in this time frame. I do hope, however, that some of you like the idea that the money they give for something they value is used for something that is of value to the receiver.

All the best -- Stefan

Faster Fractions

I spent some time optimising Python's "fractions" module. Fractions (i.e. rational numbers) are great for all sorts of exact computations, especially money calculations. You never have to care about loss of precision, and you can freely mix very large and very small numbers any way you like in your computations: the result is always exact, as everything is done in integers internally.
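To make the exactness claim concrete, here is a small sketch using the standard library's fractions module; the amounts (0.10, 19.99, 10**30) are made-up example values, not taken from any real computation.

```python
from fractions import Fraction

# Ten payments of 0.10: binary floats accumulate rounding error,
# while Fractions stay exact because they store an integer numerator/denominator.
float_total = sum([0.1] * 10)
fraction_total = sum([Fraction(1, 10)] * 10)
print(float_total == 1.0)     # False
print(fraction_total == 1)    # True

# Decimal strings are parsed exactly, which suits money amounts.
price = Fraction("19.99")
print(price * 3)              # 5997/100

# Mixing very large and very small magnitudes loses nothing.
tiny, huge = Fraction(1, 10**30), Fraction(10**30)
print(tiny + huge - huge == tiny)   # True
print(1e-30 + 1e30 - 1e30)          # 0.0 with floats
```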

But the performance used to suck. Totally. The main problem was the type instantiation, which is really expensive. For example, simply changing this code

f = n * Fraction(x, y)

to this

f = Fraction(n * x, y)

(which avoids intermediate Fraction operations) could make it several times faster. I provided some patches that streamline common cases (numerator and denominator will usually be Python ints), and this made the implementation in CPython 3.5 twice as fast as before. It actually starts being usable. :)
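A rough way to see the difference between the two spellings yourself is a timeit comparison like the sketch below. The values of n, x and y are arbitrary, and no timings are claimed here: they depend heavily on the machine and the Python version. The point is only that both forms give the same result while the second one creates fewer Fraction objects.

```python
import timeit
from fractions import Fraction

n, x, y = 7, 3, 11

# Creates at least two Fraction objects: the operand and the product.
slow = n * Fraction(x, y)
# Does the integer arithmetic first, then creates a single Fraction.
fast = Fraction(n * x, y)
print(slow == fast)   # True

# Rough timing only; absolute numbers vary between machines and versions.
env = {"Fraction": Fraction, "n": n, "x": x, "y": y}
t_slow = timeit.timeit("n * Fraction(x, y)", globals=env, number=20_000)
t_fast = timeit.timeit("Fraction(n * x, y)", globals=env, number=20_000)
print(f"n * Fraction(x, y): {t_slow:.4f}s")
print(f"Fraction(n * x, y): {t_fast:.4f}s")
```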

For those who can't wait for Python 3.5 to come out (in about a year's time), and also for those who want even better performance (like me), I dropped the implementation into Cython and optimised it further at the C level. That gave me another factor of 5, so the result is currently about 10x faster than what's in the standard library.

Compared to the Decimal type in Python 2.7, it's about 15x faster. The hugely improved C reimplementation of the "decimal" module in Python 3.3 is still about 5-6x faster than the Cython version, or less than that if you often need to rescale your values along the way. Plus, with decimal, you always have to take care of using the right precision scale for your code to prevent rounding errors, and playing it safe will slow it down.
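The precision issue with decimal can be illustrated with the standard library alone. This sketch assumes the default decimal context (28 significant digits) and only contrasts rounding behaviour; it says nothing about speed.

```python
from decimal import Decimal, localcontext
from fractions import Fraction

# Decimal rounds every result to the context precision (28 digits by default),
# so 1/3 is not exact and the error survives the multiplication.
third = Decimal(1) / Decimal(3)
print(third * 3)           # 0.999...9 (28 nines)
print(third * 3 == 1)      # False

# The precision is a property of the active context that your code has to choose.
with localcontext() as ctx:
    ctx.prec = 6
    print(Decimal(1) / Decimal(3))   # 0.333333

# A Fraction keeps the exact value 1/3.
print(Fraction(1, 3) * 3 == 1)       # True
```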

I released the module on PyPI; it's called quicktions. Hope you like it.
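One way to use it is as an optional drop-in replacement. The sketch below assumes that quicktions exposes a Fraction class compatible with fractions.Fraction, and falls back to the standard library when the package is not installed, so the rest of the code does not need to care which one it got.

```python
# Prefer the Cython-compiled quicktions package if it is installed
# (assumed to provide a fractions-compatible Fraction class),
# otherwise fall back to the pure-Python standard library module.
try:
    from quicktions import Fraction
except ImportError:
    from fractions import Fraction

# The calling code is identical for both implementations.
total = sum(Fraction(k, 7) for k in range(1, 8))
print(total)   # 4
```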

Cython courses ahead in Leipzig (Germany)

I'm giving an in-depth learn-from-a-core-dev Cython training at the Python Academy in Leipzig (Germany) next month, October 16-17. In two days, the course will cover everything from a Cython intro up to the point where you bring Python code to C speed and use C/C++ libraries and data types in your code.

Read more on it here:

http://www.python-academy.com/courses/specialtopics/python_course_cython.html

What is Cython?

Cython is an optimising static compiler for Python that makes writing C extensions as easy as Python itself. It greatly extends the limits of the Python language and thus has found a large user base in the Scientific Computing community. It also powers various well-known extension modules in the Python package index. Cython is a great way to extend the CPython runtime with native code when performance matters.

Who am I?

I'm Stefan Behnel, a long-time Cython core developer specialising in High-Performance Python trainings and fast code. See my website.

Bound methods in Python

I keep seeing a lot of code around that does this:

import re

_MATCH_DIGITS_RE = re.compile('[0-9]+')

and then uses this regular expression only in one place, e.g. like this:

def func(text):
    numbers = _MATCH_DIGITS_RE.findall(text)
    ...

Python's re module actually uses expression caching internally, so it's very unlikely that this is any faster in practice than just writing this:
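A quick check that both spellings find exactly the same matches; the sample text is made up. Note that the internal cache of compiled patterns is a CPython implementation detail of the re module, so this sketch only compares results, not speed.

```python
import re

text = "order 66 shipped 3 items to 221B"

_MATCH_DIGITS_RE = re.compile('[0-9]+')

# Precompiled module-level pattern ...
precompiled = _MATCH_DIGITS_RE.findall(text)
# ... versus a pattern string that re compiles (or fetches from its cache) on each call.
inline = re.findall('[0-9]+', text)

print(precompiled)             # ['66', '3', '221']
print(precompiled == inline)   # True
```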

def func(text):
    numbers = re.findall('[0-9]+', text)
    ...

This is shorter and a much more straightforward way to write what's going on. Now, for longer and more complex regular expressions, this can actually get out of hand, and it does help to give them a readable name. However, all-upper-case constant names tend to be pretty far from readable. So, I always wonder why people don't just write this using a bound method:

_find_numbers = re.compile('[0-9]+').findall

def func(text):
    numbers = _find_numbers(text)
    ...

I find this much clearer to read. And it nicely abstracts the code that uses the function-like callable _find_numbers() from the underlying implementation, which (in case you really want to know) happens to be a method of a compiled regular expression object.

Faster Python calls in Cython 0.21

I spent some time during the last two weeks reducing the call overhead for Python functions and methods in Cython. Calls from Cython code were already about 30-40% faster than in CPython before, but profiling then made me stumble over the fact that method calls in CPython really just do one thing: they repack the argument tuple and prepend the 'self' object to it. However, that is done right after Cython has carefully packed up exactly that argument tuple in the first place, so by simply inlining what PyMethodObject does, we can avoid packing tuples twice.

Avoiding the creation of a PyMethodObject altogether may also appear to be an interesting goal, but doing that is far from easy (it happens during attribute lookup) and it's also most likely not worth it, as method objects are created from a freelist, which makes their instantiation very fast. Method objects also hold actual state that the caller must receive: the underlying function and the self object. So getting rid of them would severely complicate things without much gain to expect.
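The two pieces of state that a method object carries are visible from Python code as well. This sketch uses a made-up Greeter class and only shows the Python-level view of what the C-level PyMethodObject does: store the function and self, and prepend self when called.

```python
class Greeter:  # hypothetical example class
    def greet(self, name):
        return f"Hello, {name}"

g = Greeter()
m = g.greet   # the attribute lookup creates a bound method object

# The method object holds the underlying function and the self object ...
print(m.__func__ is Greeter.greet)   # True
print(m.__self__ is g)               # True

# ... and calling it is equivalent to prepending 'self' to the arguments.
print(m("Cython") == m.__func__(m.__self__, "Cython"))   # True

# Each attribute lookup creates a new, distinct method object.
print(g.greet is g.greet)            # False
```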

Another obvious optimisation arises because Python code calls into C-implemented functions quite often, and if those are implemented as specialised functions that take exactly one or no argument (METH_O/METH_NOARGS), then the tuple packing and unpacking can be avoided completely. Together with the method call optimisation, this means that Cython can now call very simple methods without creating an argument tuple at all, and less simple ones without redundantly creating a second argument tuple.

I implemented these optimisations and they immediately boosted the method call micro benchmarks in Python's benchmark suite from about 1/3 faster to 2-3 times faster than CPython 3.5 (pre-release). Those are only simple micro benchmarks, so real world code will benefit substantially less overall. However, it turned out that a couple of benchmarks in the suite that are based on real production code ended up losing 5-15% of their total runtime. That's quite remarkable, given that the code they call actually does something much more heavyweight than the call overhead itself. I'm still tuning it a bit, but so far I am really happy with this result.