Cython is 20!

Stefan Behnel

2022-04-04 08:52

Today, Cython celebrates its 20th anniversary!

On April 4th, 2002, Greg Ewing published the first release of Pyrex, version 0.1.

Even then, it was designed as a compiler that extended the Python language with C data types to build extension modules for CPython. That design has survived the last 20 years and has made Pyrex, and then Cython, a major cornerstone of the Python data ecosystem. And way beyond that.

Now, on April 4th, 2022, its heir Cython is still very much alive and easily serves hundreds of thousands of developers worldwide, day to day.

I'm very grateful for Greg's ingenious invention at the time. Let's look back at how we got to where we are today.

I came to use Python around the time when Pyrex was first written. Python had already been around for a dozen years, and version 2.2 was the latest. It had brought us new-style classes, which represented a major redesign of the type system, as well as the iterator protocol and even generators, which you had to enable with from __future__ import generators because of potential conflicts with the new keyword yield. There weren't many times in Python's history when that was necessary. Python 2.2 was one of the greatest Python releases of all time.
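For readers who never saw those features arrive, here is a minimal sketch of a generator and the iterator protocol. It is written for a current Python, where the __future__ import is no longer needed and next() is a builtin (in 2.2 you would have called the iterator's .next() method). The countdown function is just an illustrative name.

```python
# In Python 2.2, "from __future__ import generators" had to appear at the top
# of the module before "yield" was accepted. Later versions enable it always.

def countdown(n):
    """Lazily yield n, n-1, ..., 1, one value per iteration step."""
    while n > 0:
        yield n
        n -= 1

# The iterator protocol: iter() returns an iterator, and next() advances it
# until the iterator signals exhaustion by raising StopIteration.
it = iter(countdown(3))
first = next(it)   # consumes the first value
rest = list(it)    # consumes whatever is left
```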

Python 2.3 was in the making, about to bring us another bunch of cool new features like sum(), enumerate() and sets as a data type in the standard library.
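As a small reminder of what those additions look like in use, here is a sketch in current Python. Note the assumption: it uses the builtin set type, which only arrived in Python 2.4, whereas 2.3 shipped sets as a standard library module. The sample data is made up for illustration.

```python
names = ["Pyrex", "Cython", "lxml"]

# sum() adds up any iterable of numbers.
total_length = sum([len(name) for name in names])

# enumerate() pairs each item with its index.
indexed = list(enumerate(names))

# Sets support operations such as intersection.
shared_letters = set("cython") & set("python")
```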

CPython already had its reputation of providing a large standard library, and a whole set of third-party packages that added another heap of functionality – although Perl's CPAN easily overshadowed all of it. There was no pip install anything at the time (not even easy_install), no virtualenvs, no wheels (and no eggs). There was some form of binary packages, but you'd mostly python setup.py build your own software, especially on Linux. Those were the days.

Many of the binary third-party packages at the time were hand-written using the bare C-API. There was SWIG for generating C wrappers for lots of languages, including Python. It worked, and it was actually great to be able to generate multiple wrappers from a single source. But they all looked mostly the same, and making them work well and feel natural in each of the language environments was hard to impossible. And few people really needed wrappers for more than the one language they used. So, lots of people used the C-API, and CPython had a reputation of being easily extensible in C – assuming you knew C well enough. And more and more Python users didn't.

In came Greg's idea of writing extension modules in Python. Or in something that looked a lot like Python, with a few C-ish extensions.

I don't know how he came up with the name Pyrex, which is a brand name for heat-resistant glass (originally invented here in Germany in 1893). But Pyrex clearly hit a nerve at the time and grew very quickly. Within weeks and months, there was support not only for functions but for extension types, and for a growing number of Python language features.

By the time I came to use Pyrex, it was already in a very attractive and feature-rich state. From the start, its unique selling point was to allow Python developers to write C code, without having to write C code. And that had made it the basis for large projects like Sage, for which it provided critical software development infrastructure, being the glue code between heaps of C/C++ math libraries and Python.

In 2004, Martijn Faassen took on a project (and he's good at taking on projects) of making XML in Python actually usable. There was support before, there was minidom, there was PyXML with an extended feature set. But many XML features of the time were missing, the tools were memory hungry and slow, and the DOM-API was far from anything Python users would fall in love with.

There was also a Python interface for libxml2, a C library that covered a large part of the important XML technologies at the time. The caveat was that it mostly mapped the C-API to Python 1:1, and thus felt excessively C-ish and exotic to Python users, while also making it easy to trigger hard crashes.

There was another alternative, though: ElementTree, designed by the recently deceased Fredrik Lundh (thanks for all the fish, Fredrik). It was not in the standard library at the time; it was only adopted there in Python 2.5 (together with SQLite), one and a half years later. It was an external package based on the pyexpat parser, and it provided a lovely pythonic API for XML processing. But with even fewer features than minidom.
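To give a feeling for that difference in API style, here is a small sketch that reads the same document once with ElementTree and once with minidom, using the versions that ship in today's standard library. The XML snippet and the variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

XML = (
    "<library>"
    "<book year='2002'>Pyrex</book>"
    "<book year='2007'>Cython</book>"
    "</library>"
)

# ElementTree: elements behave like lists of their children,
# with the text content and attributes directly at hand.
root = ET.fromstring(XML)
et_titles = [book.text for book in root.findall("book")]
et_years = [book.get("year") for book in root]

# minidom: the same titles through the W3C DOM API, where the text
# lives in a separate child node of each element.
doc = minidom.parseString(XML)
dom_titles = [
    node.firstChild.data for node in doc.getElementsByTagName("book")
]
```

Both approaches produce the same list of titles, but the ElementTree version reads much more like ordinary Python code.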

So, Martijn decided to bring it all together: the bunch of XML features from libxml2, exposed in the pythonic, and already well established, interface of ElementTree. And being a Python developer, wanting to design the interface from the Python point of view, he chose Pyrex to implement that wrapper, and called it lxml.

I found the lxml project almost a year later, while looking for something that I could use as an extensible XML API. I implemented some features for lxml to turn it into that, and along the way, also made enhancements to Pyrex. Over the Pyrex mailing list, I got in touch with other developers who had their own more or less enhanced versions of Pyrex, including Robert Bradshaw, one of the developers in the Sage project. Eventually, in 2007, we decided to follow the example of the Apache web server in bringing together the scattered bunch of existing Pyrex patches into a new project. William Stein from the Sage project came up with a good name, and the infrastructure to maintain it – github.com had not launched yet, and we used the Mercurial DVCS. Thus, the Cython project was born.

It was the beginning of a second, long success story.

In the early years, William Stein was able to provide funding to the Cython project from Sage's resources, given how important Cython was for the development of the Sage Math package. Cython was an integral part of the Sage development sprints called Sage Days. We participated in Google's Summer of Code events, which brought us in touch with Dag Sverre Seljebotn and Mark Florisson, both of whom moved Cython's integration with NumPy and data processing forward in large steps. And the Sage project also sponsored a workshop in Munich (where I was living at the time) so that we were able to sit together in person, for the first time, discussing, designing and building many great features in Cython, as well as major advances in the coverage of (then) more recent Python language features, in which Vitja Makarov played an important role.

Over time, the list of contributors to Cython grew longer and longer, from large feature additions to small bug fixes and helpful documentation improvements. In 2008, Lisandro Dalcín and I implemented support for Python 3.0 before it was even released, just like Pyrex and Cython have followed CPython's development ever since they existed, allowing users to easily adapt their extension modules to various C-API changes across (C)Python releases. And in the other direction, some of the optimisations that CPython's own internal code generation tool, Argument Clinic, employs for fast Python function argument parsing were adopted from Cython.

I remember discussions and cooperation with the CPython developers Yury Selivanov on async features and with Victor Stinner, Petr Viktorin or Nick Coghlan on Python C-API topics. Several PyPy developers, including Ronan Lamy and Matti Picus, have helped in word and code to improve the integration and stability between both tools. The exchange with people from large and impactful projects like NumPy, Pandas and the scikit-* family of tools has always helped move Cython in a user-centric direction, while giving me a warm feeling that it truly enables its users to get their work done. And the emergence of complementary tools like pybind11 or Numba has helped to diversify the choices throughout the ecosystem in which Cython resides, broadening the field without reducing the impact that the language and compiler have for their users.

Today, after 20 years of development, Cython is a modern programming language, embedded in the Python language rather than the other way round, but still extending it with C/C++ superpowers.

We helped our users help their users through many exciting endeavours along the way: taking pictures of black holes, sending robots to Mars, scaling up Django websites to a billion users, building climate models, and analysing, processing and applying machine learning to human text, real-world images, and other data from countless areas, be it scientific, financial, economic, ecological or probably any other type of data, from small to large scale.

I'm proud and happy to see how far Cython has come from its early beginnings. And I'm excited to continue seeing where it will go from here.

So, from New Zealand, from Europe and the Americas, from Asia, Australia and Africa, to anywhere on Earth, and maybe Mars…

Happy anniversary, Cython!