lxml Christmas funding    Posted:

My bicycle was recently stolen and since I now have to get a new one, here's a proposal.

From today on until December 24th, I will divert all donations that I receive for my work on lxml to help in restoring my local mobility.

If you do not like this 'misuse', please do not donate during this time frame. I do hope, however, that some of you like the idea that the money you give for something you value is used for something that is of value to the recipient.

All the best -- Stefan

Faster Fractions    Posted:

I spent some time optimising Python's "fractions" module. Fractions (i.e. rational numbers) are great for all sorts of exact computations, especially money calculations. You never have to worry about loss of precision, and you can freely mix very large and very small numbers any way you like in your computations - the result is always exact, as everything is done in integers internally.
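A tiny sketch of that exactness, using the standard library's Fraction:

```python
from fractions import Fraction

# Binary floats accumulate rounding errors:
assert 0.1 + 0.1 + 0.1 != 0.3

# Fractions compute exactly:
f = Fraction(1, 10) + Fraction(1, 10) + Fraction(1, 10)
assert f == Fraction(3, 10)

# Mixing very large and very small numbers loses nothing either:
g = Fraction(10**20) + Fraction(1, 10**20)
assert g - Fraction(10**20) == Fraction(1, 10**20)
```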

But the performance used to suck. Totally. The main problem was the type instantiation, which is really expensive. For example, simply changing this code

f = n * Fraction(x, y)

to this

f = Fraction(n * x, y)

(which avoids the intermediate Fraction operations) could speed it up several-fold. I provided some patches that streamline common cases (numerator and denominator will usually be Python ints), and this made the implementation in CPython 3.5 twice as fast as before. It actually starts being usable. :)
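The two spellings produce the same exact value; the rewrite just avoids creating and normalising an intermediate Fraction object (illustrated here with arbitrary numbers):

```python
from fractions import Fraction

n, x, y = 7, 3, 4
slow = n * Fraction(x, y)    # creates Fraction(3, 4) first, then multiplies
fast = Fraction(n * x, y)    # one instantiation, one normalisation
assert slow == fast == Fraction(21, 4)
```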

For those who can't wait for Python 3.5 to come out (in about a year's time), and also for those who want even better performance (like me), I dropped the implementation into Cython and optimised it further at the C level. That gave me another factor of 5, so the result is currently about 10x faster than what's in the standard library.

Compared to the Decimal type in Python 2.7, it's about 15x faster. The hugely improved C reimplementation of the "decimal" module in Python 3.3 is still about 5-6x faster - or less, if you often need to rescale your values along the way. Plus, with decimal, you always have to take care of using the right precision scale for your code to prevent rounding errors, and playing it safe will slow it down.
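To illustrate that last point: with decimal you have to pick a context precision, and a computation that exceeds it gets rounded silently, whereas fractions need no such configuration. A sketch using the standard library modules:

```python
from decimal import Decimal, localcontext
from fractions import Fraction

with localcontext() as ctx:
    ctx.prec = 4                       # too little precision for this computation
    d = Decimal(1) / Decimal(3) * 3    # rounds to 0.3333, then to 0.9999
assert d == Decimal('0.9999')          # not 1 - precision was silently lost

f = Fraction(1, 3) * 3
assert f == 1                          # exact, no context to configure
```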

I released the module on PyPI; it's called quicktions. Hope you like it.
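Since quicktions is meant to mirror the stdlib Fraction API, switching can be as simple as a guarded import that falls back to the standard library when quicktions is not installed (a sketch, assuming the drop-in compatibility holds for your use case):

```python
try:
    from quicktions import Fraction   # compiled with Cython, faster
except ImportError:
    from fractions import Fraction    # stdlib fallback

f = Fraction(3, 4) + Fraction(1, 4)
assert f == 1
```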

Cython courses ahead in Leipzig (Germany)    Posted:

I'm giving an in-depth learn-from-a-core-dev Cython training at the Python Academy in Leipzig (Germany) next month, October 16-17. In two days, the course will cover everything from a Cython intro up to the point where you bring Python code to C speed and use C/C++ libraries and data types in your code.

Read more on it here:


What is Cython?

Cython is an optimising static compiler for Python that makes writing C extensions as easy as Python itself. It greatly extends the limits of the Python language and thus has found a large user base in the Scientific Computing community. It also powers various well-known extension modules in the Python package index. Cython is a great way to extend the CPython runtime with native code when performance matters.

Who am I?

I'm Stefan Behnel, a long-time Cython core developer specialising in high-performance Python training and fast code. See my website.

Bound methods in Python    Posted:

I keep seeing a lot of code around that does this:

import re

_MATCH_DIGITS_RE = re.compile('[0-9]+')

and then uses this regular expression only in one place, e.g. like this:

def func():
    numbers = _MATCH_DIGITS_RE.findall(input)

Python's re module actually uses expression caching internally, so it's very unlikely that this is any faster in practice than just writing this:

def func():
    numbers = re.findall('[0-9]+', input)

This is a shorter and much more straightforward way to express what's going on. Admittedly, for longer and more complex regular expressions, inlining them can get out of hand, and it does help to give them a readable name. However, all-upper-case constant names tend to be pretty far from readable. So I always wonder why people don't just write this using a bound method:

_find_numbers = re.compile('[0-9]+').findall

def func():
    numbers = _find_numbers(input)

I find this much clearer to read. And it nicely abstracts the code that uses the function-like callable _find_numbers() from the underlying implementation, which (in case you really want to know) happens to be a method of a compiled regular expression object.
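The same trick applies to any bound method that deserves a descriptive local name, not just compiled regular expressions. For example, binding list.append once also avoids repeating the attribute lookup in a loop:

```python
squares = []
store = squares.append    # bound method of this specific list

for i in range(5):
    store(i * i)          # same as squares.append(i * i)

assert squares == [0, 1, 4, 9, 16]
```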

Faster Python calls in Cython 0.21    Posted:

I spent some time during the last two weeks reducing the call overhead for Python functions and methods in Cython. It was already quite low compared to CPython before, about 30-40% faster, but profiling then made me stumble over the fact that method calls in CPython really just do one thing: they repack the argument tuple and prepend the 'self' object to it. However, that is done right after Cython has carefully packed up exactly that argument tuple in the first place, so by simply inlining what PyMethodObject does, we can avoid packing tuples twice.

Avoiding the creation of a PyMethodObject altogether may also seem like an interesting goal, but doing so is not at all easy (it happens during attribute lookup), and it's most likely not worth it, as method objects are allocated from a freelist, which makes their instantiation very fast. Method objects also hold actual state that the caller needs: the underlying function and the self object. So getting rid of them would severely complicate things without a major gain to expect.
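That state is easy to inspect from Python: a bound method object carries both the underlying function and the instance that will be prepended as 'self' (a small illustration, not Cython-specific):

```python
class C:
    def meth(self):
        return 42

c = C()
bound = c.meth                    # attribute lookup creates the method object
assert bound.__func__ is C.meth   # the underlying function
assert bound.__self__ is c        # the instance to prepend as 'self'
assert bound() == 42
```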

Another obvious optimisation: Python code quite often calls into C-implemented functions, and if those are implemented as specialised functions that take exactly one or no argument (METH_O/METH_NOARGS), the tuple packing and unpacking can be avoided completely. Together with the method call optimisation, this means that Cython can now call very simple methods without creating an argument tuple, and less simple ones without redundantly creating a second argument tuple.

I implemented these optimisations, and they immediately pushed the method call micro-benchmarks in Python's benchmark suite from about 1/3 faster to 2-3 times faster than CPython 3.5 (pre). Those are only simple micro-benchmarks, so any real-world code will benefit substantially less overall. However, it turned out that a couple of benchmarks in the suite that are based on real production code ended up losing 5-15% of their total runtime. That's quite remarkable, given that the code they call actually does something (much) more heavyweight than the call overhead itself. I'm still tuning it a bit, but so far I'm really happy with this result.

Running Coverity Scan on lxml    Posted:

Hearing a talk about static analysis at EuroPython 2014 and meeting Christian Heimes there (CPython core dev and member of the security response team) got us talking about running Coverity Scan on Cython generated code. They provide a free service for Open Source projects, most likely because there is a clear benefit in terms of marketing visibility and distributed filtering work on a large amount of code.

The problem with a source code generator is that you can only run the analyser on the generated code, so you need a real-world project that uses the generator. The obvious choice for us was lxml, as it has a rather large code base with more than 230,000 lines of C code, generated from some 20,000 lines of Cython code. The first run against the latest lxml release gave us about 1200 findings, but a quick glance showed that the bulk of them were false positives caused by the way Cython generates code for some Python constructs. There was also a large set of "dead code" findings that I had already addressed in Cython a couple of months ago - it now generates substantially less dead code. So I gave it another run against the current developer versions of both lxml and Cython.

The net result is that the number of findings went down to 428. A large subset of those relates to constant macros in conditions, which is what I use in lxml to avoid the need for C level #ifdefs. The C compiler is happy to discard this code, so Coverity's dead code finding is correct but not relevant. Other large sets of "dead code" findings are due to Cython generating generic error handling code in cases where an underlying C macro actually cannot fail, e.g. when converting a C boolean value to Python's constant True/False objects. So that's ok, too.

It's a bit funny that the tool complains about a single "{" being dead code, although it's followed immediately by a (used) label. That's not really an amount of code that I'd consider relevant for reporting.

On the upside, the tool found another couple of cases in the try-except implementation where Cython was generating dead code, so I was able to eliminate them. The advantage here is that eliminating a goto statement may leave its target label unused, which in turn allows suppressing further code under that label that would otherwise be generated later. And generating less code is generally a good thing anyway.

Overall, the results make a really convincing case for Cython. Nothing of importance was found, and the few minor issues where Cython still generated more code than necessary could easily be eliminated, so that all projects that use the next version will simply benefit. Compare that to manually written C extension code, where reference counting is a large source of errors and the verbose C-API of CPython makes the code substantially harder to get right and to maintain than the straightforward Python syntax and semantics of Cython. When run against the CPython code base for the first time, Coverity Scan found several actual bugs and even security issues. This also nicely matches the findings of David Malcolm and his GCC based analysis tool, who ended up using Cython generated code for eliminating false positives, rather than finding actual bugs in it.

A Throwaway Society    Posted:

Breakfast at the hotel. All around me, portly middle-aged men proudly dragging the ham they hunted down themselves at the buffet, on a sea of scrambled eggs, all the long way back to their seats. Morning exercise just can't be strenuous enough, and others are welcome to see it.

Next to me, two women, perhaps South American, perhaps Arab. No longer distinguishable under the centimetre-thick layer of make-up. Probably Arab after all, the language doesn't sound like Spanish or Portuguese, not that that necessarily means anything. I had already run into both of them at the bread table. One looks contemptuously at the breads and rolls on display, then reaches for the plastic-wrapped goods and takes two shrink-wrapped packets of everything: Brandt rusks, pumpernickel, crispbread. Two slices per packet. The accompanying plate is filled with eggs, quark and cucumber.

Then, at the table, a taste of the scrambled eggs, an obvious lack of culinary pleasure. With plenty of quark on top, a few bites of rusk are forced down with a small spoonful of scrambled egg, then the whole plate goes into the bin via the waiting staff. Quickly topped off with an intact boiled egg from the second woman's plate. That had apparently only landed there by accident, because now she walks off and comes back with a ready-peeled and sliced egg. Of course, it would have been far too much effort to open it herself; after all, that's what staff are for.

I am done with this breakfast.

The Other Day at Aldi    Posted:

I have to admit it, I sometimes go to Aldi. Reluctantly, but I do it. For now.

The other day, I find half a kilo of tomatoes on the vegetable shelves. The packaging is torn, but the tomatoes are fine. Finally at the checkout, a trainee is sitting there. She pulls my purchases across the scanner, then sees the tomatoes' packaging, gives me a pitying look and says: "Would you like to go get a new pack?" As always, I wasn't quick-witted enough and just said "No". Afterwards: annoyed. Of course.

People, I don't want your rubbish. I want the tomatoes. If you can transport them better because there's plastic around them, that may be quite nice for you, but I don't want it. And I most certainly don't want you turning valuable food into rubbish just because the actual rubbish around it no longer matches your idea of pretty rubbish. How sick can you actually be?

Finding elements in XML    Posted:

With more than half a million PyPI downloads every month, the lxml toolkit is the most widely used external XML library for Python. It's even one of the most downloaded packages overall, with apparently millions of users worldwide. Add to that the fact that it's mostly compatible with the standard library's ElementTree package, and it becomes clear that its API is the predominant one in the Python world for anything related to XML processing.

However, I keep seeing people ask about the best way to find stuff in their XML documents. I guess that's because there are a couple of ways to do that: by (recursively) traversing the tree, by iterating over the tree, or by using the find() methods, XPath and even CSS selectors (which now rely on the external cssselect package). There are many ways to do this, simply because there are so many use cases. XPath is obviously the does-it-all tool here, and many people would quickly recommend an XPath expression when you give them a search problem, but it's also a lot like regular expressions for XML: now you have two problems.

Often, tree iteration is actually a much better way to do this than XPath. It's simpler and also faster in most cases. When looking for a specific tag, you can simply say:

for paragraph in tree.iter('p'):
    ...  # do something with the <p> element

Note that this iterates over the tree and processes elements as they are found, whereas the XPath engine always evaluates the expression exhaustively and only then returns with the complete list of results that it found. Expressing in XPath that only one element is wanted, i.e. making it short-circuit, can be quite a challenge.
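Because iteration is lazy, the short-circuit comes for free: just pull the first match and stop. A sketch using the standard library's ElementTree, whose iter() API lxml shares:

```python
from xml.etree import ElementTree as ET

root = ET.fromstring('<doc><p>one</p><div/><p>two</p></doc>')

# next() stops the iteration as soon as the first <p> is found:
first_p = next(root.iter('p'), None)
assert first_p is not None
assert first_p.text == 'one'
```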

Even when looking for a set of different tags, tree iteration will work just beautifully:

for block in tree.iter('p', 'div', 'pre'):
    if block.tag == 'p': ...
    elif block.tag == 'pre': ...
    else: ...

For more involved tasks, however, such as finding an element in a specific subtree below some parent, or selecting elements by attributes or text, you will quickly end up wanting a path language anyway. Here, lxml comes with two implementations: ElementPath (implemented in Python) and XPath (implemented in C by libxml2). At this point, you might think that XPath, being implemented in C, should generally be the better choice, but that's actually not true. In fact, the ElementPath implementation in lxml makes heavy use of lxml's optimised tree iterators and tends to be faster than the generic XPath engine. Especially when only looking for one specific element, something like this is usually much faster than the same expression in XPath:

target = root.find('.//parent/child//target')
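A self-contained sketch of such a find() call, using the standard library's compatible ElementTree API and invented element names:

```python
from xml.etree import ElementTree as ET

root = ET.fromstring(
    '<doc><parent><child><wrapper><target/></wrapper></child></parent></doc>')

# '//' matches at any depth below 'child', so the extra wrapper level is fine:
target = root.find('.//parent/child//target')
assert target is not None
assert target.tag == 'target'
```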

One really nice feature of ElementPath is that it uses fully qualified tag names rather than resorting to externally defined prefixes. This means that a single expression in a self-contained string is enough to find what you are looking for:

target = root.find('.//{http://some/namespace}parent/*/{http://other/namespace}target')

On the downside, the predicate support in ElementPath is substantially more limited than in real XPath. Functions, for example, do not exist at all. For simple things like testing attribute values, however, you don't have to sacrifice ElementPath's performance:

for link in root.iterfind('.//a[@href]'):
    print(link.get('href'))

This example only tests for the existence of the attribute, which may still have an empty value. But you can also look for a specific value:

for link in root.iterfind('.//a[@href = "http://lxml.de/"]'):
    print("link to lxml's homepage found")

So, while XPath may appear like the one thing that helps with all your search needs, it's often better to use a simpler tool for the job.