A really fast Python web server with Cython

Shortly after I wrote about speeding up Python web frameworks with Cython, Nexedi posted an article about their attempt to build a fast multicore web server for Python that can compete with the performance of compiled coroutines in the Go language.

Their goal is to use Cython to build a web framework around a fast native web server, and to use Cython's concurrency and coroutine support to gain native performance also in the application code, without sacrificing the readability that Python provides.

Their experiments look very promising so far. They managed to process 10K requests per second concurrently, which actually do real processing work. That is worth noting, because many web server benchmarks out there content themselves with the blank response time for a "hello world", thus ignoring any concurrency overhead etc. For that simple static "Hello world!", they even got 400K requests per second, which shows that this is not a very realistic benchmark. Under load, their system seems to scale pretty linearly with the number of threads, also not a given among web frameworks.

I might personally get involved in further improving Cython for this kind of concurrent, async applications. Stay tuned.

Cython for web frameworks

I'm excited to see the Python web community pick up Cython more and more to speed up their web frameworks.

uvloop as a fast drop-in replacement for asyncio has been around for a while now, and it's mostly written in Cython as a wrapper around libuv. The Falcon web framework optionally compiles itself with Cython, while keeping up support for PyPy as a plain Python package. New projects like Vibora show that it pays off to design a framework for both (Flask-like) simplicity and (native) speed from the ground up to leverage Cython for the critical parts. Quote of the day:

"300.000 req/sec is a number comparable to Go's built-in web server (I'm saying this based on a rough test I made some years ago). Given that Go is designed to do exactly that, this is really impressive. My kudos to your choice to use Cython." – Reddit user 'beertown'.

Alex Orlov gave a talk at the PyCon-US in 2017 about using Cython for more efficient code, in which he mentioned the possibility to speed up the Django URL dispatcher by 3x, simply by compiling the module as it is.

Especially in async frameworks, minimising the time spent in processing (i.e. outside of the I/O-Loop) is critical for the overall responsiveness and performance. Anton Caceres and I presented fast async code with Cython at EuroPython 2016, showing how to speed up async coroutines by compiling and optimising them.

In order to minimise the processing time on the server, many template engines use native accelerators in one way or another, and writing those in Cython (instead of C/C++) is a huge boost in terms of maintenance (and probably also speed). But several engines also generate Python code from a templating language, and those templates tend to be way more static than not (they are rarely runtime generated themselves). Therefore, compiling the generated template code, or even better, directly targeting Cython with the code generation instead of just plain Python has the potential to speed up the template processing a lot. For example, Cython has very fast support for PEP-498 f-strings and even transforms some '%'-formatting patterns into them to speed them up (also in code that requires backwards compatibility with older Python versions). That can easily make a difference, but also the faster function and method calls or looping code that it generates.

I'm sure there's way more to come and I'm happily looking forward to all those cool developments in the web area that we are only just starting to see appear.

Update 2018-07-15: Nexedi posted an article about their attempts to build a fast web server using Cython, both for the framework layer and the processing at the application layer. Worth keeping an eye on.


Lasst es mich einmal ganz klar sagen – München hat ein Zuwanderungsproblem.

Immer mehr Wirtschaftsflüchtlinge aus den umliegenden Gemeinden und Dörfern, aus den sozial und wirtschaftlich abgehängten Regionen Bayerns, ziehen nach München.

Aber: diese Menschen sind nicht Teil unserer Gesellschaft. Sie teilen nicht unsere traditionellen, städtischen Werte!

Sie leben ihren perversen Autofetischismus aus, indem sie mit ihren überdimensionierten Panzerfahrzeugen einen Kriegsstauplatz nach dem anderen begründen, anstatt wie wir rechtschaffenen Bürger die öffentlichen Verkehrsmittel zu nutzen.

Sie terrorisieren mit ihren burschenschaftlich übermotorisierten Zweirädern die friedlich schlafende, natürliche Bevölkerung.

Sie akzeptieren in ihrer primitiven Naivität völlig undemokratische Einstiegsmieten und forcieren dadurch horrende Mietsteigerungen, die dann zur Vertreibung von einheimischen Bestandsmietern genutzt werden.

Sie heben die tyrannische Macht der völkischen Einheitspartei über die heilige Entfaltung unserer multikulturellen Traditionen.

Glaubt mir, wenn ihr sie lasst, werden sie heimtückisch hinter eurem Rücken die CSU wählen!

Städter! Wehrt euch! Gebt den Kampf nicht geschlagen!

Befreit euch von der Unterwanderung der Provinziellen! Stoppt die Zuwanderung der Klerikalpastoralen!

Jetzt! Für immer!

What's new in Cython 0.28?

The freshly released Cython 0.28 is another major step in the development of the Cython compiler. It comes with several big new features, some of them long awaited, as well as various little optimisations. It also improves the integration with recent Python features (including the upcoming Python 3.7) and the Pythran compiler for NumPy expressions. The Changelog has the long list of relevant changes. As always, recompiling with the latest version will make your code adapt automatically to new Python releases and take advantage of the new optimisations.

The most long requested change in this release, however, is the support for read-only memory views. You can now simply declare a memory view as const, e.g. const double[:,:], and Cython will request a read-only buffer for it. This allows interaction with bytes objects and other non-writable buffer providers. Note that this makes the item type of the memory view const, i.e. non-writable, and not just the view itself. If the item type is not just a simple numeric type, this might require minor changes to the data types used in the code reading from the view. This feature was an open issue essentially ever since memory views were first introduced into Cython, back in 2009, but during all that time, no-one stepped forward to implement it. Alongside with this improvement, users can now write view[i][j] instead of view[i,j] if they want to, without the previous slow-down due to sub-view creation.

The second very long requested feature is the support for copying C code verbatimly into the generated files. The background is that some constructs, especially C hacks and macros, but also some adaptations to C specifics used by external libraries, can really only be done in plain C in order to use them from Cython. Previously, users had to create an external C header file to implement these things and then use a cdef extern from ... block to include the file from Cython code. Cython 0.28 now allows docstrings on these cdef extern blocks (with or without a specific header file name) that can contain arbitrary C/C++ code, for example:

cdef extern from *:
    #define add1(i) ((i) + 1)
    cdef int add1(int x) nogil


This is definitely considered an expert feature. Since the code is copied verbatimly into the generated C code file, Cython has no way to apply any validation or safety checks. Use at your own risk.

Another big new feature is the support for multiple inheritance from Python classes by extension types (a.k.a. cdef classes). Previously, extension types could only inherit from other natively implemented types, inlcuding builtins. While cdef classes still cannot inherit only from Python classes, and also cannot inherit from multiple cdef classes, it is now possible to use normal Python classes as additional base classes, following an extension type as primary base. This enables use cases like Python mixins, while still keeping up the efficient memory layout and the fast C-level access to attributes and cdef methods that cdef classes provide.

A bit of work has been done to start reducing the shared library size of Cython generated extension modules. In general, Cython aims to optimise its operations (especially Python operations) for speed and extensively uses C function inlining, optimistic code branches and type specialisations for that. However, the code in the module init function is really only executed once and rarely contains any loops, certainly not time critical ones. Therefore, Cython has now started to avoid certain code intensive optimisations inside of the module init code and also uses GCC pragmas to make the C compiler optimise this specific function for smaller size instead of speed. Without making the import visibly slower, this results in a certain reduction of the overall library size, but probably still leaves some space for future improvements.

Several new optimisations for Python builtins were implemented and often contributed by users. This includes faster operations and iteration for sets and bytearrays, from which existing code can benefit through simple recompilation. We are always happy to receive these contributions, and several tickets in the bug tracker are now marked as beginner friendly "first issues".

Cython has long supported f-strings, and the new release brings another set of little performance improvements for them. More interestingly, however, several common cases of unicode string %-formatting are now mapped to the f-string builder, as long as the argument side is a literal tuple. If the template string uses no unsupported formats, Cython applies this transformation automatically, which leads to visibly faster string formatting and avoids the intermediate creation of Python number objects and the value tuple. Existing code that makes use of %-formatting, including code in compiled Python .py files that needs to stay compatible with Python 2.x, can therefore benefit directly without rewriting all the template strings. Further coverage of formatting features for this transformation is certainly possible, and contributions are welcome.

Finally, a last minute change improves the handling of string literals that are being passed into C++ functions as std::string& references. Previously, the generated code always unpacked a Python byte string and made a fresh copy of it, whereas now Cython detects const arguments and passes the string literal directly. Also in the non-const case, Cython does not follow C++ in outright rejecting the literal argument at compile time, but instead just creates a writable copy and passes it into the function. This avoids special casing in user code and leads to working code by default, as expected in Python, and Cython.

What's new in Cython 0.27?

Cython 0.27 is freshly released and comes with several great improvements. It finally implements all major features that were added in CPython 3.6, such as async generators and variable annotations. The long list of new features and resolved issues can be found in the changelog, or in the list of resolved tickets.

Probably the biggest new feature is the support for asynchronous generators and asynchronous comprehensions, as specified in PEP 525 and PEP 530 respectively. They allow using yield inside of async coroutines and await inside of comprehensions, so that the following becomes possible:

async def generate_results(source):
    async for i in source:
        yield i ** 2
    d = {i: await result for i, result in enumerate(async_results)}
    l = [s for c in more_async_results
         for s in await c]

As usual, this feature is available in Cython compiled modules across all supported Python versions, starting with Python 2.6. However, using async cleanup in generators, e.g. in a finally-block, requires CPython 3.6 in order to remember which I/O-loop the generator must use. Async comprehensions do not suffer from this.

The next big and long awaited feature is support for PEP 484 compatible typing. Both signature annotations (PEP 484) and variable annotations (PEP 526) are now parsed for Python types and cython.* types like list or cython.int. Complex types like List[int] are not currently evaluated as the semantics are less clear in the context of static compilation. This will be added in a future release.

One special twist here is exception handling, which tries to mimic Python more closely than the defaults in Cython code. Thus, it is no longer necessary to explicitly declare an exception return value in code like this:

def add_1(x: cython.int) -> cython.int:
    if x < 0:
        raise ValueError("...")
    return x + 1

Cython will automatically return -1 (the default exception value for C integer types) when an exception is raised and check for exceptions after calling it. This is identical to the Cython signature declaration except? -1.

In cases where annotations are not meant as static type declarations during compilation, the extraction can be disabled with the compiler directive annotation_typing=False.

The new release brings another long awaited feature: automatic ``__richcmp__()`` generation. Previously, extension types required a major difference to Python classes with respect to the special methods for comparison, __eq__, __lt__ etc. Users had to implement their own special __richcmp__() method which implemented all comparisons at once. Now, Cython can automatically generate an efficient __richcmp__() method from the normal comparison methods, including inherited base type implementations. This brings Python classes and extension types another huge step closer.

To bring extension modules also closer to Python modules, Cython now implements the new extension module initialisation process of PEP 489 in CPython 3.5 and later. This makes the special global names like __file__ and __path__ correctly available to module level code and improves the support for module-level relative imports. As with most internal features, existing Cython code will benefit from this by simple recompilation.

As a last feature worth mentioning, the IPython/Jupyter magic integration gained a new option %%cython --pgo for easy profile guided optimisation. This allows the C compiler to take better decisions during its optimisation phase based on a (hopefully) realistic runtime profile. The option compiles the cell with PGO settings for the C compiler, executes it to generate the runtime profile, and then compiles it again using that profile for C compiler optimisation. This is currently only tested with gcc. Support for other compilers can easily be added to the IPythonMagic.py module and pull requests are welcome, as usual.

By design, the Jupyter cell itself is responsible for generating a suitable profile. This can be done by implementing the functions that should be optimised via PGO, and then calling them directly in the same cell on some realistic training data like this:

%%cython --pgo
def critical_function(data):
    for item in data:

# execute function several times to build profile
from somewhere import some_typical_data
for _ in range(100):

Together with the improved module initialisation in Python 3.5 and later, you can now also distinguish between the profile and non-profile runs as follows:

if "_pgo_" in __name__:
    ...  # execute critical code here

Nichts falsch machen

»Das Netz ist voll mit mal mehr, mal weniger ernst gemeinten Aufzählungen der konkreten Vor- und Nachteile einzelner Programmiersprachen. Es ist zwar verlockend, diesen Listen eine weitere hinzuzufügen und dabei unsubtil die eigenen Vorlieben und Abneigungen einfließen zu lassen. Aber die Details solcher konkreten Empfehlungen ändern sich alle paar Jahre, und Sie lernen nicht viel dazu, wenn wir Ihnen einfach raten, »Nehmen Sie einfach Python, damit machen Sie nichts falsch!« (Obwohl Sie damit sicher wirklich nichts falsch machen.)« – Kathrin Passig und Johannes Jander, Weniger schlecht programmieren.

What's new in Cython 0.26?

Cython 0.26 has finally been released and it comes with some big and several smaller new features, contributed by quite a number of non-core developers.

Probably the biggest addition, definitely codewise, is support for Pythran as a backend for NumPy array expressions, contributed by Adrien Guinet. Pythran understands many usage patterns for NumPy, including array expressions and some methods, and can directly be used from within Cython to compile NumPy using code by setting the directive np_pythran=True. Thus, if Pythran is available at compile time, users can avoid writing manual loops and instead often just use NumPy in the same way as they would from Python. Note that this does not currently work generically with Cython's memoryviews, so you need to declare the specific numpy.ndarray[] types in order to benefit from the translation. Also, this requires the C++ mode in Cython as Pythran generates C++ code.

The other major new feature in this release, and likely with an even wider impact, is pickling support for cdef classes (a.k.a. extension types). This is enabled by default for all classes with Python compatible attribute types, which explicitly excludes pointers and unions. Classes with struct type attributes are also excluded for practical reasons (such as a high code overhead), but can be enabled to support pickling with the class decorator @cython.auto_pickle(True). This was a long-standing feature for which users previously had to implement the pickle protocol themselves. Since Cython has all information about the extension type and its attributes, however, there was no technical reason why it can't also generate the support for pickling them, and it now does.

As always, there are several new optimisations including speed-ups for abs(complex), comparing strings and dispatching to specialised function implementations based on fused types arguments. Particularly interesting might be the faster GIL re-entry with the directive fast_gil=True. It tries to remember the current GIL lock state in the fast thread local storage and avoids costly calls into the thread and GIL handling APIs if possible, even when calling across multiple Cython modules.

A slightly controversial change now hides the C code lines from tracebacks by default. Cython exceptions used to show the failing line number of the generated C code in addition to the Cython module code line, now only the latter is shown, like in Python code. On the one hand, this removes clutter that is irrelevant for most users. On the other hand, it hides information that could help developers debug failures from bug reports. For debugging purposes, the re-inclusion of C code lines can now be enabled with a runtime setting as follows:

import cython_runtime

What's new in Cython 0.25?

Cython 0.25 has been released in October 2016, so here's a quick writeup of the most relevant new features in this release.

My personal favourites are the call optimisations. Victor Stinner has done a great job throughout the year to optimise and benchmark different parts of CPython, and one of the things he came up with was a faster way to process function calls internally. For this, he added a new calling convention, METH_FASTCALL, which avoids tuple creation by passing positional arguments as a C array. I've added support for this to Cython and also ported a part of the CPython implementation to speed up calls to Python functions and PyCFunction functions from Cython code.

Further optimisations speed up f-string formatting, cython.inline() and some Python long integer operations (now also in Py2.7 and not only Py3).

The next big feature, one that has been on our list essentially forever, was finally added in 0.25. If you declare a special attribute __dict__ on a cdef class (a.k.a. extension type), if will have an instance dict that allows setting arbitrary attributes on the objects. Otherwise, extension types are limited to the declared attributes and trying to access undeclared ones would result in an AttributeError.

The C++ integration has received several minor improvements, including support for calling subclass methods from C++ classes implemented in Cython code, iterating over std::string with Cython's for-loop, typedef members in class declarations, or the typeid operator.

And a final little goodie is redundant declarations for pi and e in libc.math, which makes from libc cimport math pretty much a drop-in replacement for Python's import math.

Einflussreiche Bücher bei Skoobe

Die Webseite Highly Reco präsentiert hunderte Empfehlungen von Büchern, die berühmten Persönlichkeiten aus Wirtschaft und Gesellschaft etwas bedeutet haben. Seien es Bill Gates, Jessica Livingston, Warren Buffet oder Jeff Bezos, sie alle haben aus Büchern gelernt und schreiben dort über ihre Einflüsse und Ideengeber.

Bei Skoobe finden sich viele dieser Bücher im Abo-Katalog, hier eine kleine Auswahl einflussreicher Bücher, für alle, die ihnen nacheifern möchten.

Python-Bücher bei Skoobe

Neben einer riesigen Auswahl an Belletristik hat das eBook-Abo Skoobe auch zahlreiche gute Sachbücher im Angebot. Hier ist eine kleine Auswahl empfehlenswerter Python-Bücher aus der langen Liste bei Skoobe: