Archive for May, 2010

Inline generator expressions with Cython

Thursday, May 27th, 2010

Besides any() and all(), Cython also has support for inline generator expressions in sum() now - and it’s fast!

# CPython 2.6.5:
timeit "sum(i for i in xrange(10000))"
1000 loops, best of 3: 584 usec per loop
# Cython 0.13-pre:
def range_sum_typed(int N):
    cdef int result = sum(i for i in range(N))
    return result

timeit "range_sum_typed(10000)"
100000 loops, best of 3: 2 usec per loop

Obviously a really stupid benchmark, but still - that’s what I call a difference!
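To make it a bit clearer why the inlined version can be so much faster, here is a rough pure-Python sketch of the idea: the generator expression inside sum() behaves like a plain accumulation loop. The function names range_sum_generator and range_sum_loop are made up for illustration, and this only shows the conceptual shape of the transformation, not the C code that Cython actually generates.

```python
def range_sum_generator(n):
    # The form used in the benchmark: sum() consumes a generator expression.
    return sum(i for i in range(n))


def range_sum_loop(n):
    # Conceptually equivalent loop: no generator object, just an accumulator.
    # (Assumption: this mirrors the idea of inlining, not Cython's real output.)
    result = 0
    for i in range(n):
        result += i
    return result


if __name__ == "__main__":
    # Both forms compute the same value.
    print(range_sum_generator(10000) == range_sum_loop(10000))
```

With a C-typed loop variable and result, a loop of this shape needs no Python objects at all, which is presumably where most of the difference comes from.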

Here’s another benchmark for looping over a fixed list of Python integers instead:

# CPython 2.6.5:
timeit -s "r=range(10000)"  "sum((i*i for i in r), 100)"
1000 loops, best of 3: 868 usec per loop
# Cython 0.13-pre:
def typed_sum_squares(seq, int start):
    cdef long i, result
    result = sum((i*i for i in seq), start)
    return result

timeit -s "r=range(10000)"  "typed_sum_squares(r, 100)"
10000 loops, best of 3: 49.9 usec per loop

Not exactly what I’d call “closer”. ;)
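For anyone who wants to reproduce the pure-Python side of this comparison from a script instead of the timeit command line, here is a minimal sketch using the standard timeit module. The helper names (sum_squares_genexpr, sum_squares_loop, best_usec_per_call) are hypothetical, the code targets Python 3 rather than the 2.6.5 used above, and the timings will of course differ from machine to machine.

```python
import timeit


def sum_squares_genexpr(seq, start):
    # Same expression as in the benchmark above.
    return sum((i * i for i in seq), start)


def sum_squares_loop(seq, start):
    # Hand-written loop computing the same value, for comparison.
    result = start
    for i in seq:
        result += i * i
    return result


def best_usec_per_call(func, seq, start, number=100, repeat=3):
    # Run `number` calls per round, `repeat` rounds, and report the
    # best round as microseconds per call (like "best of 3" in timeit).
    timings = timeit.repeat(lambda: func(seq, start),
                            number=number, repeat=repeat)
    return min(timings) / number * 1e6


if __name__ == "__main__":
    r = list(range(10000))
    for func in (sum_squares_genexpr, sum_squares_loop):
        usec = best_usec_per_call(func, r, 100)
        print("%s: %.1f usec per call" % (func.__name__, usec))
```

The Cython function would be timed the same way after compiling and importing it; that part is left out here because it depends on the build setup.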

Unicode support in Cython 0.13

Monday, May 17th, 2010

I’m really happy about all the new Unicode and type inference features in the upcoming Cython 0.13. It finally has support for CPython’s Unicode code point type, Py_UNICODE, and can transform various operations on Unicode characters into plain C code. For example, this will run as a plain C integer for loop:

def count_lower_case_characters(unicode ustring):
    cdef Py_ssize_t count = 0
    for uchar in ustring:
        if uchar.islower():
            count += 1
    return count

Or, if you only want to know if there are any lower case characters in a string at all, here’s another plain C solution:

def any_lower_case_characters(unicode ustring):
    return any(uchar.islower() for uchar in ustring)

The latter is actually somewhat of a fake, as Cython does not generally support generator expressions yet. However, it still shows where the language is going. In the examples above, Cython can infer that the loop variable can only ever hold a single Unicode character, so it can safely map the entire loop into C space. IMHO, this is a beautiful example of how the integration of Python objects and C types is getting so tight that even totally innocent-looking code starts running at the speed of light.
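As a rough illustration of what "mapping the loop into C space" means for the any() example, this is the short-circuiting loop it corresponds to, written in plain Python. The name any_lower_case_characters_loop is made up for this sketch, and the real generated code is C operating on Py_UNICODE values, not Python.

```python
def any_lower_case_characters_loop(ustring):
    # Equivalent of: any(uchar.islower() for uchar in ustring)
    # Stops at the first lower case character, just like any() does.
    for uchar in ustring:
        if uchar.islower():
            return True
    return False


if __name__ == "__main__":
    print(any_lower_case_characters_loop(u"ABCd"))
    print(any_lower_case_characters_loop(u"ABC"))
```

Since each uchar is known to be a single character, both the iteration and the islower() check can be done on plain C values.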