Unicode support in Cython 0.13

Stefan Behnel

2010-05-17 12:56

I'm really happy about all the new Unicode and type inference features in the upcoming Cython 0.13. It finally has support for CPython's Unicode code point type, Py_UNICODE, and can transform various operations on Unicode characters into plain C code. For example, this will run as a plain C integer for loop:

def count_lower_case_characters(unicode ustring):
    cdef Py_ssize_t count = 0
    for uchar in ustring:
        if uchar.islower():
            count += 1
    return count

Or, if you only want to know if there are any lower case characters in a string at all, here's another plain C solution:


def any_lower_case_characters(unicode ustring):
    return any(uchar.islower() for uchar in ustring)

The latter is somewhat of a fake, as Cython does not generally support generator expressions yet. However, it still shows where the language is going. In the examples above, Cython can infer that the loop variable can only ever hold a single Unicode character, so it can safely map the entire loop into C space. IMHO, it's a beautiful example of how the integration of Python objects and C types is getting so tight that even totally innocent-looking code starts running at the speed of light.
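
For comparison, the any() call above behaves like the explicit loop below. This is only a sketch, written as plain Python so it runs without compiling, and the function name is made up for illustration. In Cython, the argument would be declared as unicode, as in the examples above, which should let the compiler apply the same inference to the loop variable.

```python
def any_lower_case_characters_loop(ustring):
    # Hypothetical helper, not part of Cython: an explicit-loop
    # equivalent of any(uchar.islower() for uchar in ustring).
    for uchar in ustring:
        # Each iteration yields a single character of the string.
        if uchar.islower():
            # Stop at the first lower case character, like any() does.
            return True
    # No lower case character found (also the result for an empty string).
    return False
```

Like any(), the loop returns as soon as it finds a match, so it only scans the whole string when there is no lower case character in it.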