Optimising Cython code
There was a request on the Cython mailing list on how to optimise Cython code. Here's how I do it.
We have some Cython code that we want to benchmark:
x = 1 for i from 0Ok, obviously this is stupid code, as this can be done much easier without a loop. But let's say for the sake of argument that this is the best algorithm that we can come up with, and that we have a suspicion that it might not run as fast as we think it should.
First thing to do is to make that suspicion evidence by benchmarking. So I copy the code over to a Cython module and wrap it in a Python function:
# file: bench.pyx def run(max): x = 1 for i from 0Now we compile the file:
# cython bench.pyx # gcc -shared $CFLAGS -I/usr/include/python2.5 -o bench.so bench.cAnd run it through Python's great timeit module:
# python -m timeit -s 'from bench import run' 'run(100)' 1000000 loops, best of 3: 5.93 usec per loopThis looks exceedingly long-running to me ;)
Since I have no idea what to do better, I first look trough the generated C code. That's not as hard as it sounds, as Cython copies the original Cython code into comments and marks the line that it generates code for. The loop code gets translated into this:
/* ".../TEST/bench.pyx":3 * def run(max): * x = 1 * for i from 0The code I stripped (
/* ... */
) is error handling code. It's emitted in one long line so that it's easy to ignore - which is the best thing to do with it.What you can see here is that Cython is smart enough to optimise the loop into a C loop with a C run variable (type long), but then the unsuspiciously looking operator '+' uses Python API calls, so this is not what I had in mind when I wrote the code. I wanted it to be as fast and straight forward as it looks in Cython. However, Cython cannot know my intention here, as my code might as well depend on the semantics of Python's '+' operator (which is different from the '+' operator in C).
Cython's way of dealing with Python/C type ambiguity is explicit static type declarations through cdefs. By default, all variables are defined as if I had written
cdef object variable
, but in this case, I want them to be plain C integers. So here is the straight forward way to tell Cython that I want the variablesx
andi
to have C semantics rather than Python semantics:# file: bench.pyx def run(max): cdef int i,x x = 1 for i from 0And the resulting C code shows me that Cython understood what I wanted:
/* ".../TEST/bench.pyx":4 * cdef int i,x * x = 1 * for i from 0Now timeit gives me something like this:
# python -m timeit -s 'from bench import run' 'run(100)' 1000000 loops, best of 3: 0.284 usec per loopThat's about a factor of 20 compared to the original example. And that's definitely enough for today.