<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>

<channel>
	<title>Stefans Welt</title>
	<atom:link href="http://blog.behnel.de/index.php?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://blog.behnel.de</link>
	<description>Let me put it this way ...</description>
	<pubDate>Tue, 03 Apr 2012 12:19:13 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>top of the list</title>
		<link>http://blog.behnel.de/index.php?p=217</link>
		<comments>http://blog.behnel.de/index.php?p=217#comments</comments>
		<pubDate>Tue, 03 Apr 2012 12:15:54 +0000</pubDate>
		<dc:creator>Stefan Behnel</dc:creator>
		
		<category><![CDATA[Python]]></category>

		<category><![CDATA[Software]]></category>

		<category><![CDATA[english]]></category>

		<category><![CDATA[lxml]]></category>

		<guid isPermaLink="false">http://blog.behnel.de/index.php?p=217</guid>
		<description><![CDATA[lxml is top of the list for Python 3 downloads from PyPI!

]]></description>
			<content:encoded><![CDATA[<p>lxml is <a href="https://python3wos.appspot.com/">top of the list</a> for Python 3 downloads from PyPI!</p>
<p><a href="https://python3wos.appspot.com/"><img src="http://5156595.de.strato-hosting.eu/cgi-data/weblog_basic/uploads/2012/04/py3wos-lxml.png" alt="lxml is top of the list!" title="Python 3 Wall of Shame" class="aligncenter size-full wp-image-218" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.behnel.de/index.php?feed=rss2&amp;p=217</wfw:commentRss>
		</item>
		<item>
		<title>XML parser performance in CPython 3.3 and PyPy 1.7</title>
		<link>http://blog.behnel.de/index.php?p=210</link>
		<comments>http://blog.behnel.de/index.php?p=210#comments</comments>
		<pubDate>Fri, 16 Dec 2011 15:40:35 +0000</pubDate>
		<dc:creator>Stefan Behnel</dc:creator>
		
		<category><![CDATA[Python]]></category>

		<category><![CDATA[english]]></category>

		<category><![CDATA[lxml]]></category>

		<guid isPermaLink="false">http://blog.behnel.de/index.php?p=210</guid>
		<description><![CDATA[In a recent article, I compared the performance of MiniDOM and the three ElementTree implementations ElementTree, cElementTree and lxml.etree for parsing XML in CPython 3.3. Given the utterly poor performance of the pure Python library MiniDOM in this competition, I decided to give it another chance and tried the same in PyPy 1.7. Because lxml.etree [...]]]></description>
			<content:encoded><![CDATA[<p>In a <a href="http://blog.behnel.de/index.php?p=197">recent article</a>, I compared the performance of MiniDOM and the three ElementTree implementations ElementTree, cElementTree and lxml.etree for parsing XML in CPython 3.3. Given the utterly poor performance of the pure Python library MiniDOM in this competition, I decided to give it another chance and tried the same in <a href="http://pypy.org/">PyPy</a> 1.7. Because lxml.etree and cElementTree are not available on this platform, I only ran the tests with plain ElementTree and MiniDOM. I also report the original benchmark results for CPython below for comparison.</p>
<p><img src="http://5156595.de.strato-hosting.eu/cgi-data/weblog_basic/uploads/2011/12/py33-pypy17-xml-bench-time.png" alt="Parser performance of XML libraries in CPython 3.3 and PyPy 1.7" title="Parser performance of XML libraries in CPython 3.3 and PyPy 1.7" width="587" height="443" class="aligncenter size-full wp-image-211" /></p>
<p>While I also provide numbers regarding the memory usage of each library in this comparison, they are not directly comparable between PyPy and CPython, because the two platforms manage memory differently and because PyPy uses much more memory than CPython right from the start. So the relative increase in memory may or may not accurately reflect what each runtime does with the memory. However, it appears that PyPy at least eliminates the severe memory problems of MiniDOM, as the total amount of memory used for the larger files is several times smaller than that used by CPython.</p>
<p><img src="http://5156595.de.strato-hosting.eu/cgi-data/weblog_basic/uploads/2011/12/py33-pypy17-xml-bench-memory.png" alt="Memory usage of XML trees in CPython 3.3 and PyPy 1.7" title="Memory usage of XML trees in CPython 3.3 and PyPy 1.7" width="587" height="462" class="aligncenter size-full wp-image-212" /></p>
<p>So, what do I take from this benchmark? If you have legacy MiniDOM code lying around, you want PyPy to run it: it performs several times better in terms of both memory and runtime. PyPy also runs the plain Python ElementTree substantially faster than CPython does.</p>
<p>However, for fast XML processing in general, PyPy&#8217;s better performance for the plain Python ElementTree is not all that interesting, because it is still several times slower than cElementTree or lxml.etree in CPython. That means that you will often be able to process multiple files in CPython in the time that you need for just one in PyPy, even if the application code that does the actual processing gets a substantial JIT speed-up in PyPy. Even worse, the GIL in PyPy will keep your code from getting the parallel speed-up that multi-threaded processing usually achieves with lxml in CPython (lxml frees the GIL while parsing), e.g. in a web server setting.</p>
<p>So, as always, the decision depends on what your actual application does and which library it uses. Do your own benchmarks.</p>
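<p>Taking that advice literally, here is a minimal sketch of a timing harness that runs unchanged on CPython and PyPy. It assumes you only compare the parsers that exist on both platforms, and the function names are made up for this example. Note that on current CPython versions, <code>xml.etree.ElementTree</code> uses its C accelerator automatically, so there the numbers correspond to cElementTree rather than to the pure Python ElementTree measured above.</p>

```python
import time
import xml.dom.minidom
import xml.etree.ElementTree as ET


def best_time(parse, path, repeat=5):
    """Return the fastest of `repeat` runs of parse(path), in seconds."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        parse(path)
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare_parsers(path, repeat=5):
    """Time the stdlib parsers that exist on both CPython and PyPy."""
    parsers = {
        "xml.etree.ElementTree.parse": ET.parse,
        "xml.dom.minidom.parse": xml.dom.minidom.parse,
    }
    return {name: best_time(parse, path, repeat) for name, parse in parsers.items()}
```

<p>Calling e.g. <code>compare_parsers("hamlet.xml")</code> from the same script under both interpreters gives directly comparable timings; taking the best of several runs reduces noise from caching and, on PyPy, from JIT warm-up.</p>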
]]></content:encoded>
			<wfw:commentRss>http://blog.behnel.de/index.php?feed=rss2&amp;p=210</wfw:commentRss>
		</item>
		<item>
		<title>XML parser performance in Python 3.3</title>
		<link>http://blog.behnel.de/index.php?p=197</link>
		<comments>http://blog.behnel.de/index.php?p=197#comments</comments>
		<pubDate>Fri, 09 Dec 2011 11:56:52 +0000</pubDate>
		<dc:creator>Stefan Behnel</dc:creator>
		
		<category><![CDATA[Python]]></category>

		<category><![CDATA[english]]></category>

		<category><![CDATA[lxml]]></category>

		<guid isPermaLink="false">http://blog.behnel.de/index.php?p=197</guid>
		<description><![CDATA[For a recent bug ticket about MiniDOM, I collected some performance numbers that compare it to ElementTree, cElementTree and lxml.etree under a recent CPython 3.3 developer build, all properly compiled and optimised for Linux 64bit, using os.fork() and the resource module to get a clean measure of the memory usage for the in-memory tree. Here [...]]]></description>
			<content:encoded><![CDATA[<p>For a recent <a href="http://bugs.python.org/issue11379">bug ticket</a> about <a href="http://docs.python.org/library/xml.dom.minidom.html">MiniDOM</a>, I collected some performance numbers that compare it to <a href="http://docs.python.org/library/xml.etree.elementtree.html">ElementTree, cElementTree</a> and <a href="http://lxml.de/">lxml.etree</a> under a recent CPython 3.3 developer build, all properly compiled and optimised for Linux 64bit, using os.fork() and the <a href="http://docs.python.org/library/resource.html">resource</a> module to get a clean measure of the memory usage for the in-memory tree. Here are the numbers (memory usage in KB):</p>
<p>Parsing hamlet.xml in English, 274KB:</p>
<pre>
Memory usage: 7284
xml.etree.ElementTree.parse done in 0.104 seconds
Memory usage: 14240 (+6956)
xml.etree.cElementTree.parse done in 0.022 seconds
Memory usage: 9736 (+2452)
lxml.etree.parse done in 0.014 seconds
Memory usage: 11028 (+3744)
minidom tree read in 0.152 seconds
Memory usage: 30360 (+23076)
</pre>
<p>Parsing the old testament in English (ot.xml, 3.4MB) into memory:</p>
<pre>
Memory usage: 20444
xml.etree.ElementTree.parse done in 0.385 seconds
Memory usage: 46088 (+25644)
xml.etree.cElementTree.parse done in 0.056 seconds
Memory usage: 32628 (+12184)
lxml.etree.parse done in 0.041 seconds
Memory usage: 37500 (+17056)
minidom tree read in 0.672 seconds
Memory usage: 110428 (+89984)
</pre>
<p>A 25MB XML file with Slavic Unicode text content:</p>
<pre>
Memory usage: 57368
xml.etree.ElementTree.parse done in 3.274 seconds
Memory usage: 223720 (+166352)
xml.etree.cElementTree.parse done in 0.459 seconds
Memory usage: 154012 (+96644)
lxml.etree.parse done in 0.454 seconds
Memory usage: 135720 (+78352)
minidom tree read in 6.193 seconds
Memory usage: 604860 (+547492)
</pre>
<p>And a contrived 4.5MB XML file with a lot more structure than data and no whitespace at all:</p>
<pre>
Memory usage: 11600
xml.etree.ElementTree.parse done in 3.374 seconds
Memory usage: 203420 (+191820)
xml.etree.cElementTree.parse done in 0.192 seconds
Memory usage: 36444 (+24844)
lxml.etree.parse done in 0.131 seconds
Memory usage: 62648 (+51048)
minidom tree read in 5.935 seconds
Memory usage: 527684 (+516084)
</pre>
<p>I also took the last file and pretty printed it, thus adding lots of indentation whitespace that increased the file size to 6.2MB. Here are the numbers for that:</p>
<pre>
Memory usage: 13308
xml.etree.ElementTree.parse done in 4.178 seconds
Memory usage: 222088 (+208780)
xml.etree.cElementTree.parse done in 0.478 seconds
Memory usage: 103056 (+89748)
lxml.etree.parse done in 0.199 seconds
Memory usage: 101860 (+88552)
minidom tree read in 8.705 seconds
Memory usage: 810964 (+797656)
</pre>
<p>Yes, roughly 2MB of extra whitespace accounts for almost 300MB of additional memory in MiniDOM.</p>
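<p>A likely reason for this blow-up is that MiniDOM turns every run of indentation whitespace between tags into a separate Text node object, whereas ElementTree only stores it as a string in the <code>text</code> or <code>tail</code> attribute of an existing element. The following sketch makes this visible with the standard library only; the tiny document with three empty items is made up for the example.</p>

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET


def build_compact_xml(n_items=3):
    """Serialise a small tree without any whitespace between the tags."""
    root = ET.Element("root")
    for _ in range(n_items):
        ET.SubElement(root, "item")
    return ET.tostring(root, encoding="unicode")


def count_text_nodes(node):
    """Recursively count DOM Text nodes, including whitespace-only ones."""
    count = 1 if node.nodeType == node.TEXT_NODE else 0
    for child in node.childNodes:
        count += count_text_nodes(child)
    return count


compact = build_compact_xml()
# toprettyxml() puts a newline plus indentation around every element
pretty = xml.dom.minidom.parseString(compact).toprettyxml(indent="  ")

compact_text_nodes = count_text_nodes(xml.dom.minidom.parseString(compact))
pretty_text_nodes = count_text_nodes(xml.dom.minidom.parseString(pretty))
# ElementTree keeps the same whitespace as .text/.tail strings on the
# elements, so the number of tree objects stays the same
pretty_elements = sum(1 for _ in ET.fromstring(pretty).iter())
```

<p>The compact version yields no Text nodes at all, while the pretty printed one adds one whitespace node per indentation run; for a large, deeply structured file that means millions of extra Python objects.</p>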
<p>Here are the graphs:</p>
<p><img src="http://5156595.de.strato-hosting.eu/cgi-data/weblog_basic/uploads/2011/12/py33-xml-bench-memory1.png" alt="XML tree memory usage in Python 3.3 for lxml, ElementTree, cElementTree and MiniDOM" title="XML tree memory usage in Python 3.3 for lxml, ElementTree, cElementTree and MiniDOM" width="577" height="441" class="aligncenter size-full wp-image-204" /></p>
<p><img src="http://5156595.de.strato-hosting.eu/cgi-data/weblog_basic/uploads/2011/12/py33-xml-bench-time1.png" alt="XML parser performance in Python 3.3 for lxml, ElementTree, cElementTree and MiniDOM" title="XML parser performance in Python 3.3 for lxml, ElementTree, cElementTree and MiniDOM" width="588" height="440" class="aligncenter size-full wp-image-205" /></p>
<p>I think it is pretty clear that MiniDOM is basically off the scale, whereas cElementTree and lxml.etree are pretty close to each other. lxml tends to be a tad faster, and cElementTree tends to use a little less memory.</p>
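<p>The measurement setup mentioned at the top (a forked child process per parser plus the resource module) could look roughly like the sketch below. It is not the script behind the numbers above, only a minimal Unix-only approximation that assumes <code>ru_maxrss</code> reports the peak resident set size, which Linux gives in KB. The function names are invented for this example.</p>

```python
import os
import resource
import time


def peak_rss():
    """Peak resident set size of this process (KB on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def measure_in_child(parse, path):
    """Parse `path` in a forked child; return (seconds, peak RSS increase).

    Each measurement starts from a fresh copy of the parent process, so
    trees built by earlier measurements cannot distort the numbers.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: measure, report through the pipe, never return to the caller.
        try:
            os.close(read_fd)
            before = peak_rss()
            start = time.perf_counter()
            tree = parse(path)  # the finished tree is still referenced here
            elapsed = time.perf_counter() - start
            increase = peak_rss() - before
            os.write(write_fd, ("%.6f %d" % (elapsed, increase)).encode("ascii"))
        finally:
            os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        report = pipe.read().decode("ascii")
    os.waitpid(pid, 0)
    if not report:
        raise RuntimeError("parsing failed in the child process")
    elapsed, increase = report.split()
    return float(elapsed), int(increase)
```

<p>For example, <code>measure_in_child(xml.dom.minidom.parse, "hamlet.xml")</code> would report the parse time together with the growth in peak memory that the MiniDOM tree caused, similar in spirit to the "(+N)" values above.</p>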
]]></content:encoded>
			<wfw:commentRss>http://blog.behnel.de/index.php?feed=rss2&amp;p=197</wfw:commentRss>
		</item>
		<item>
		<title>PyCon-DE 2011</title>
		<link>http://blog.behnel.de/index.php?p=190</link>
		<comments>http://blog.behnel.de/index.php?p=190#comments</comments>
		<pubDate>Wed, 12 Oct 2011 19:44:50 +0000</pubDate>
		<dc:creator>Stefan Behnel</dc:creator>
		
		<category><![CDATA[Python]]></category>

		<category><![CDATA[deutsch]]></category>

		<guid isPermaLink="false">http://blog.behnel.de/index.php?p=190</guid>
		<description><![CDATA[The very first PyCon-DE has come to an end. It was a huge success, both for me personally and judging by everything I heard from the other attendees. A good number of interesting talks from the most diverse areas, lots of people whom I was either happy to meet again, had always wanted to meet, or with whom I had never [...]]]></description>
			<content:encoded><![CDATA[<p>The very first <a href="http://de.pycon.org/2011/home/">PyCon-DE</a> has come to an end. It was a huge success, both for me personally and judging by everything I heard from the other attendees. A good number of interesting talks from the most diverse areas, lots of people whom I was either happy to meet again, had always wanted to meet, or had never dealt with before but now had some interesting discussions with. The whole organisation ran like clockwork, and even the food was as good as it was varied.</p>
<p>One of the most important outcomes of the conference was the founding of the <a href="http://python-verband.de/">Python Software Verband e.V.</a> as the successor to the formerly Zope-specific DZUG. The new direction will make it much easier to bring the German-speaking Python community under one common roof and to improve Python lobbying in Germany, Austria and Switzerland.</p>
<p>I gave two talks on Cython and lxml myself, as well as a tutorial on Cython. All of them met with great interest (although I am still waiting for the concrete feedback on the tutorial) and gave rise to some interesting discussions. Cython and lxml continue to be two best-of-breed tools and big topics in the Python community. lxml in particular earned me quite a bit of backslapping for having turned it into the one great XML tool for Python over the last few years. Paul Everitt, who gave a <a href="http://de.pycon.org/2011/keynotes/">keynote</a> and whom I had always wanted to meet (done now), even put up a huge slide in the middle of his talk showing only two names - Martijn Faassen (who started lxml) and me. So I am getting famous in my old age after all &#8230;</p>
<p>I spent some time discussing with Kay Hayen, who is writing a static Python compiler called <a href="http://www.nuitka.net/blog/nuitka-a-python-compiler/what-is-nuitka/">Nuitka</a>. Unsurprisingly, he ran into many of the problems that we also ran into with Cython. <a href="http://www.nuitka.net/blog/2011/10/pycon-de-2011-my-report/">He is right</a> that I am not entirely happy that he started a separate project instead of helping us with Cython, but that is how OpenSource works. Everyone has the right to reinvent as many wheels as they enjoy. As far as I understand it, Kay is aiming Nuitka at a subset of what we are making of Cython, but he is approaching it from the other side. Cython used to be only an extension language and is now additionally evolving into a full-fledged Python compiler, whereas Nuitka is meant to fill solely the niche of a Python compiler. But Kay has already shown quite a bit of chutzpah and stamina so far, so there may still be a few surprises in store&#8230;</p>
<p>It was interesting to see some talks on topics that my employer also struggles with - just with Python instead of Java. For example, an internal department at SAP is working on a web-based client infrastructure for SAP systems in Python, including an object-to-SAP mapper (similar to an ORM), offline caching mechanisms and so on. From the talk, it looked very much as if this could turn out to be generally interesting for SAP clients, quite independently of web applications. And it might become OpenSource soon&#8230;</p>
<p>Another talk in which I immediately felt at home was about PyTAF, a graphical framework for application integration. It is being developed in-house at LBBW in Stuttgart and achieves more or less what we do in Java. On top of that, it has a GUI with which data integration processes can be put together graphically, and it is written in Python, which is a serious advantage for this kind of software. By the way, it uses <a href="http://lxml.de/objectify.html">lxml.objectify</a> internally for data processing - a very good choice <img src='/strato-data/Weblog25//wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Next year&#8217;s PyCon-DE could well take place at the same venue again. Given how well this year&#8217;s worked, there is really no reason to change. Although somewhere like Berlin is of course always worth a trip &#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.behnel.de/index.php?feed=rss2&amp;p=190</wfw:commentRss>
		</item>
		<item>
		<title>PyCon-DE 2011 (en)</title>
		<link>http://blog.behnel.de/index.php?p=188</link>
		<comments>http://blog.behnel.de/index.php?p=188#comments</comments>
		<pubDate>Wed, 12 Oct 2011 19:42:19 +0000</pubDate>
		<dc:creator>Stefan Behnel</dc:creator>
		
		<category><![CDATA[Python]]></category>

		<category><![CDATA[english]]></category>

		<guid isPermaLink="false">http://blog.behnel.de/index.php?p=188</guid>
		<description><![CDATA[The first PyCon-DE ever is over. It was a huge success, both from my own POV and from what I heard from others. Quite a number of interesting talks from a very broad spectrum, loads of people that I either knew already, always wanted to meet, or had never heard of but found interesting to [...]]]></description>
			<content:encoded><![CDATA[<p>The first <a href="http://de.pycon.org/2011/home/">PyCon-DE</a> ever is over. It was a huge success, both from my own point of view and from what I heard from others. There were quite a number of interesting talks from a very broad spectrum, and loads of people whom I either knew already, had always wanted to meet, or had never heard of but found interesting to talk to. The organisation worked out impressively well; even the food was as good as it was diverse.</p>
<p>One of the major outcomes was the formation of the &#8220;Python Software Verband e.V.&#8221; as a successor to the previous Zope-centred &#8220;DZUG e.V.&#8221;. The new direction will make it much easier to gather the German-speaking Python community under a common umbrella, and to strengthen Python lobbying in Germany, Austria and Switzerland.</p>
<p>I gave two talks on Cython and lxml, as well as a tutorial on Cython. All of them were well received (although I&#8217;m still waiting for the final feedback on the tutorial) and led to interesting discussions. Both Cython and lxml continue to be best-of-breed tools and hot topics in the community, and I received a lot of backslapping for making lxml the one great XML tool for Python over the last few years. One of the <a href="http://de.pycon.org/2011/keynotes/">keynote speakers</a>, Paul Everitt, whom I had wanted to meet for a while and finally did, even put up a huge slide right in his talk with only two names on it: that of Martijn Faassen (the original author of lxml) and mine. I&#8217;m finally getting famous. <img src='/strato-data/Weblog25//wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p>I spent some time talking to Kay Hayen, who has written a static Python compiler called <a href="http://www.nuitka.net/blog/nuitka-a-python-compiler/what-is-nuitka/">Nuitka</a>. It was not surprising that he bumped into a lot of the problems that we met with Cython as well. <a href="http://www.nuitka.net/blog/2011/10/pycon-de-2011-my-report/">He&#8217;s right</a> that I&#8217;m not entirely happy about the fact that he started a completely separate project instead of helping with Cython, but that&#8217;s OpenSource. People are free to reinvent as many wheels as they like. From what I understand, Nuitka aims to become a subset of what Cython heads for, just coming from a different side. Cython was originally an extension language and is now additionally evolving into a Python compiler, whereas Nuitka is targeted solely at being a Python compiler. But I wouldn&#8217;t mind being surprised at some point. So far, Kay has certainly shown a remarkable investment and has been pretty successful.</p>
<p>It was nice to see in a couple of presentations that the kind of work my current employer does in Java is being done in Python elsewhere. For example, an internal department at SAP is developing a web-based client infrastructure for SAP systems in Python, including a transparent object-to-SAP mapper (similar to ORMs), offline caching mechanisms, etc. From the presentation, it sounded very much like this could be useful for talking to SAP in general, not only for web clients. And it may become open source at some point.</p>
<p>Another talk that felt very familiar was about PyTAF, a graphical application integration framework for financial applications that is being developed in-house at LBBW in Stuttgart. It aims to do more or less the same as the code we write in Java, but has a graphical frontend for putting together integration flows. And it&#8217;s Python, which is a serious advantage for this kind of software. It even uses lxml.objectify internally for data processing - best choice ever! <img src='/strato-data/Weblog25//wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>It may well be that next year&#8217;s PyCon-DE will take place at the same location. It worked so well that there&#8217;s no reason for a change. Although Berlin would also be a great location&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.behnel.de/index.php?feed=rss2&amp;p=188</wfw:commentRss>
		</item>
		<item>
		<title>Fix URL display in Firefox 7</title>
		<link>http://blog.behnel.de/index.php?p=185</link>
		<comments>http://blog.behnel.de/index.php?p=185#comments</comments>
		<pubDate>Wed, 12 Oct 2011 18:18:40 +0000</pubDate>
		<dc:creator>Stefan Behnel</dc:creator>
		
		<category><![CDATA[Software]]></category>

		<category><![CDATA[english]]></category>

		<guid isPermaLink="false">http://blog.behnel.de/index.php?p=185</guid>
		<description><![CDATA[Firefox 7 comes with a very annoying &#8220;feature&#8221; that breaks copying from the URL bar by stripping away the protocol prefix from the URL. Here is how to fix it. The magic option in &#8220;about:config&#8221; is called &#8220;browser.urlbar.trimURLs&#8221;. Switch it off and Firefox starts working again.
]]></description>
			<content:encoded><![CDATA[<p>Firefox 7 comes with a very annoying &#8220;feature&#8221; that breaks copying from the URL bar by stripping away the protocol prefix from the URL. <a href="http://www.ghacks.net/2011/09/28/firefox-add-http-back-to-address-bar/">Here</a> is how to fix it. The magic option in &#8220;about:config&#8221; is called &#8220;browser.urlbar.trimURLs&#8221;. Switch it off and Firefox starts working again.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.behnel.de/index.php?feed=rss2&amp;p=185</wfw:commentRss>
		</item>
		<item>
		<title>PyCon-DE 2011</title>
		<link>http://blog.behnel.de/index.php?p=180</link>
		<comments>http://blog.behnel.de/index.php?p=180#comments</comments>
		<pubDate>Fri, 01 Jul 2011 06:35:28 +0000</pubDate>
		<dc:creator>Stefan Behnel</dc:creator>
		
		<category><![CDATA[Python]]></category>

		<category><![CDATA[deutsch]]></category>

		<guid isPermaLink="false">http://blog.behnel.de/index.php?p=180</guid>
		<description><![CDATA[PyCon-DE will be the first big Python conference in Germany. Finally. And: talks are still being accepted!
]]></description>
			<content:encoded><![CDATA[<p><a href="http://de.pycon.org/">PyCon-DE</a> will be the first big Python conference in Germany. Finally. And: <a href="http://de.pycon.org/2011/speaker/">talks</a> are still being accepted!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.behnel.de/index.php?feed=rss2&amp;p=180</wfw:commentRss>
		</item>
		<item>
		<title>Cython close to hand-crafted C code for generators</title>
		<link>http://blog.behnel.de/index.php?p=163</link>
		<comments>http://blog.behnel.de/index.php?p=163#comments</comments>
		<pubDate>Wed, 06 Apr 2011 07:05:38 +0000</pubDate>
		<dc:creator>Stefan Behnel</dc:creator>
		
		<category><![CDATA[Cython]]></category>

		<category><![CDATA[english]]></category>

		<guid isPermaLink="false">http://blog.behnel.de/index.php?p=163</guid>
		<description><![CDATA[I did a couple of experiments compiling itertools with the new generator support in Cython. In CPython, the itertools module is actually written in hand-tuned C and does very little computation in its generators, so I knew it would be hard to match with generated code. But Cython does a pretty good job.
Something as [...]]]></description>
			<content:encoded><![CDATA[<p>I did a couple of experiments compiling <a href="http://docs.python.org/library/itertools.html">itertools</a> with the new generator support in <a href="http://cython.org/">Cython</a>. In CPython, the itertools module is actually written in hand-tuned C and does very little computation in its generators, so I knew it would be hard to match with generated code. But Cython does a pretty good job.</p>
<p>Something as trivial as <code>chain()</code> is exactly as fast as in the C implementation, but compared to the more than 60 lines of C code, it is certainly a lot more readable in Cython:</p>
<pre>
def chain(*iterables):
    """Make an iterator that returns elements from the first iterable
    until it is exhausted, then proceeds to the next iterable, until
    all of the iterables are exhausted. Used for treating consecutive
    sequences as a single sequence.
    """
    for it in iterables:
        for element in it:
            yield element
</pre>
<p>Other functions, like <code>islice()</code>, are faster in C, partly because CPython actually takes a couple of shortcuts, e.g. by looking up the iterator slot method only once. You cannot do that in Python code, and I wanted to keep the implementation compatible with regular Python. Specifically, the C speed advantage for <code>islice()</code> is currently about 30-50% in general, although the Cython implementation can also be up to 10% faster in some cases, e.g. when extracting only a couple of items from the middle of a longer sequence. The C implementation is about 90 lines long; here is the Cython implementation:</p>
<pre>
import sys
import cython

# Python 2/3 compatibility
_max_size = cython.declare(cython.Py_ssize_t,
        getattr(sys, "maxsize", getattr(sys, "maxint", None)))

@cython.locals(i=cython.Py_ssize_t, nexti=cython.Py_ssize_t,
               start=cython.Py_ssize_t, stop=cython.Py_ssize_t, step=cython.Py_ssize_t)
def islice(iterable, *args):
    """Make an iterator that returns selected elements from the
    iterable. If start is non-zero, then elements from the iterable
    are skipped until start is reached. Afterward, elements are
    returned consecutively unless step is set higher than one which
    results in items being skipped. If stop is None, then iteration
    continues until the iterator is exhausted, if at all; otherwise,
    it stops at the specified position. Unlike regular slicing,
    islice() does not support negative values for start, stop, or
    step. Can be used to extract related fields from data where the
    internal structure has been flattened (for example, a multi-line
    report may list a name field on every third line).
    """
    s = slice(*args)
    start = s.start or 0
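    # test for None explicitly: stop=0 must yield nothing and step=0 must be rejected
    stop = _max_size if s.stop is None else s.stop
    step = 1 if s.step is None else s.step
    if start < 0 or stop < 0:
        raise ValueError("...")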
    if step < 1:
        raise ValueError("...")
    if start >= stop:
        return
    nexti = start
    for i, element in enumerate(iterable):
        if i == nexti:
            yield element
            nexti += step
            if nexti >= stop or nexti < 0:
                return
</pre>
<p>Here is one that is conceptually quite simple: <code>count()</code>. I had to optimise it quite a bit, because the iteration loop in the C implementation is extremely tight. Even the tuned version below runs about 10% slower than the hand-tuned C version, which is about 230 lines long.</p>
<pre>
@cython.locals(i=cython.Py_ssize_t)
def count(n=0):
    """Make an iterator that returns consecutive integers starting
    with n. If not specified n defaults to zero. Often used as an argument to imap()
    to generate consecutive data points. Also, used with zip() to add
    sequence numbers.
    """
    try:
        i = n
    except OverflowError:
        i = _max_size # skip i-loop
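    else:
        # first value after the i-loop; uncompiled Python never raises the
        # OverflowError above, so keep n if it is already past _max_size
        n = max(n, _max_size)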
    while i < _max_size:
        yield i
        i += 1
    while True:
        yield n
        n += 1
</pre>
<p>Note that all of the above generators execute on the order of microseconds, so even a slow-down of 50% will likely not be measurable in real world code.</p>
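<p>To get comparable numbers on your own machine, you can time a generator against its C counterpart with the standard <code>timeit</code> module. The following is a minimal sketch; <code>compare_chain()</code> and its test data are made up for this example. Run as plain Python, it measures the interpreted generator. To compare the Cython version, you would put <code>chain()</code> into a separate module, compile it with Cython and import it from there instead.</p>

```python
import itertools
import timeit


def chain(*iterables):
    """Plain Python generator version of chain(), as shown above."""
    for it in iterables:
        for element in it:
            yield element


def compare_chain(number=1000):
    """Return (generator_seconds, itertools_seconds) for draining both chains."""
    data = (list(range(100)), list(range(100)), list(range(100)))
    # both implementations must produce the same items
    assert list(chain(*data)) == list(itertools.chain(*data))
    gen_time = timeit.timeit(lambda: sum(chain(*data)), number=number)
    c_time = timeit.timeit(lambda: sum(itertools.chain(*data)), number=number)
    return gen_time, c_time
```

<p>The equality check up front matters as much as the timing: a faster generator is only useful if it yields exactly the same items as the C implementation.</p>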
<p>So far, I did not try any of the more fancy functions in itertools (those that actually <em>do</em> something). The Cython project has announced a <a href="http://wiki.cython.org/GSoC2011">Google Summer of Code project</a> with exactly the intent to rewrite some of the C stdlib modules of CPython in pure Python code with Cython compiler hints. So I leave this exercise to interested readers for now.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.behnel.de/index.php?feed=rss2&amp;p=163</wfw:commentRss>
		</item>
		<item>
		<title>A faster Python difflib</title>
		<link>http://blog.behnel.de/index.php?p=155</link>
		<comments>http://blog.behnel.de/index.php?p=155#comments</comments>
		<pubDate>Mon, 21 Mar 2011 19:00:40 +0000</pubDate>
		<dc:creator>Stefan Behnel</dc:creator>
		
		<category><![CDATA[Cython]]></category>

		<category><![CDATA[Python]]></category>

		<category><![CDATA[english]]></category>

		<guid isPermaLink="false">http://blog.behnel.de/index.php?p=155</guid>
		<description><![CDATA[As part of a push for making it easier to develop faster and more readable core/stdlib code for the CPython runtime, I have written a short patch against the difflib module in the Python standard library to make it a) compile with Cython and b) run faster as a compiled binary module. The net result [...]]]></description>
			<content:encoded><![CDATA[<p>As part of a <a href="http://thread.gmane.org/gmane.comp.python.devel/122273/focus=122716">push for making it easier to develop faster and more readable core/stdlib code</a> for the <a href="http://python.org/">CPython</a> runtime, I have written a short <a href="http://consulting.behnel.de/difflib.cython.patch">patch</a> against the <a href="http://docs.python.org/library/difflib.html">difflib</a> module in the Python standard library to make it a) compile with <a href="http://cython.org/">Cython</a> and b) run faster as a compiled binary module. The net result is that the compiled module runs more than 50% faster, with only the minor code modifications provided in the patch, while the modified source still runs about as fast as before in the normal CPython interpreter.</p>
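<p>One way to check such a speed-up yourself is to time a typical difflib workload once against the plain module and once against the compiled one. Here is a minimal sketch using the public <code>SequenceMatcher</code> class. The input data is invented, and which version gets imported (the <code>.py</code> source or a Cython-compiled extension module of the same name) depends on your <code>sys.path</code>.</p>

```python
import difflib
import timeit


def time_ratio(a, b, number=100):
    """Seconds for `number` runs of SequenceMatcher(None, a, b).ratio()."""
    return timeit.timeit(
        lambda: difflib.SequenceMatcher(None, a, b).ratio(), number=number)


# Made-up input: 500 lines with a block of 50 lines replaced by one new line.
old = ["line %d\n" % i for i in range(500)]
new = old[:200] + ["inserted line\n"] + old[250:]
elapsed = time_ratio(old, new, number=20)
print("%.3f seconds" % elapsed)
```

<p>Running the same script with both module versions and comparing the printed times gives a rough idea of the gain for your own kind of input.</p>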
]]></content:encoded>
			<wfw:commentRss>http://blog.behnel.de/index.php?feed=rss2&amp;p=155</wfw:commentRss>
		</item>
		<item>
		<title>no mod_deflate? use mod_rewrite!</title>
		<link>http://blog.behnel.de/index.php?p=125</link>
		<comments>http://blog.behnel.de/index.php?p=125#comments</comments>
		<pubDate>Tue, 22 Feb 2011 12:17:08 +0000</pubDate>
		<dc:creator>Stefan Behnel</dc:creator>
		
		<category><![CDATA[Software]]></category>

		<category><![CDATA[english]]></category>

		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://behnel.de/cgi-bin/weblog_basic/index.php?p=125</guid>
		<description><![CDATA[Sadly, the shared hosting service for my web site does not support mod_deflate in its Apache installation. There are various resources on the web that deal with this in one way or another, but they all talk about the drawbacks of shutting out certain clients or presenting error pages to some of them. Well, if [...]]]></description>
			<content:encoded><![CDATA[<p>Sadly, the shared hosting service for my web site does not support <a href="http://httpd.apache.org/docs/current/mod/mod_deflate.html">mod_deflate</a> in its Apache installation. There are various resources on the web <a href="http://everything2.com/title/How+to+get+Apache+to+send+compressed+versions+of+static+HTML+files">that deal with this</a> in one way or another, but they all talk about the drawbacks of shutting out certain clients or presenting error pages to some of them. Well, if you cannot get on-the-fly compression, the remaining drawback is that you have to compress all your pages statically beforehand, which leaves a compressed duplicate of each file. But that is actually a good thing: the server responds even faster that way, because it does not have to compress anything while serving the content. So you put in a little more work on site updates and trade space for speed where it matters.</p>
<p>Here is a simple way to configure <a href="http://httpd.apache.org/docs/current/mod/mod_rewrite.html">mod_rewrite</a> to serve the compressed pages even if you do not have mod_deflate available.</p>
<p>First, compress all your HTML pages, e.g. using</p>
<pre>
find . -name "*.html" | while read -r file; do gzip -9c &lt; "$file" &gt; "$file.gz"; done
</pre>
<p>Since I am using <a href="http://www.gnu.org/software/make/">make</a> to handle my web site, here is an extract that helps me keep all compressed files up to date when I upload the pages (also in parallel, when I pass e.g. &#8220;-j 5&#8221;):</p>
<pre>
TEXT_FILES=$(shell find . -name "*.html" -o -name "*.css" -o -name "*.js")

.PHONY: copy

copy: $(addsuffix .gz, $(TEXT_FILES))
	copy_website.sh

%.gz: %
	gzip -9c $&lt; &gt; $@
</pre>
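<p>If you would rather not depend on a shell and make, the same step could be sketched in Python. This is a minimal sketch, not part of my setup: <code>precompress()</code> is a made-up name, and like the make rule above it only recompresses files whose <code>.gz</code> copy is missing or older than the source.</p>

```python
import gzip
import os
import shutil

TEXT_SUFFIXES = (".html", ".css", ".js")


def precompress(root="."):
    """Write FILE.gz next to every text file whose .gz copy is missing or stale.

    Mirrors the shell loop and the `%.gz: %` make rule: compress at level 9
    and only redo the work when the source is newer than its .gz file.
    Returns the list of .gz files that were (re)written.
    """
    written = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(TEXT_SUFFIXES):
                continue
            src = os.path.join(dirpath, name)
            dst = src + ".gz"
            if os.path.exists(dst) and os.path.getmtime(dst) >= os.path.getmtime(src):
                continue  # up to date, the same decision make would take
            with open(src, "rb") as fin, gzip.open(dst, "wb", compresslevel=9) as fout:
                shutil.copyfileobj(fin, fout)
            written.append(dst)
    return written
```

<p>Calling it a second time without touching any source file writes nothing, which is what keeps repeated uploads cheap.</p>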
<p>Now we can configure mod_rewrite to prefer these compressed files for clients that support it. To do this, I put the following into my .htaccess file:</p>
<pre>
RewriteEngine On
RewriteCond %{REQUEST_FILENAME}.gz -f
RewriteCond %{HTTP:Accept-Encoding}   .*gzip.*
RewriteRule ^(.*[.])(html|js|css)$        $1$2.gz      [L]
</pre>
<p>To spell this out:</p>
<ol>
<li>check if there really is a compressed version of the requested file available (&#8220;-f&#8221; tests for the path being a regular file).</li>
<li>check if the client tells us that it accepts &#8220;gzip&#8221; compressed content.</li>
<li>if both conditions hold, internally rewrite the request to the compressed file version, so that the client receives it under the original URL.</li>
</ol>
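<p>The three steps above can also be modelled in a few lines of Python, e.g. to check which file a given request would get. This is only an illustration under simplifying assumptions: <code>pick_file()</code> is a made-up helper, it applies the rule pattern to a file name rather than to the URL path that mod_rewrite actually sees, and the <code>.*gzip.*</code> condition becomes a plain substring test.</p>

```python
import os
import re

# The RewriteRule pattern from the .htaccess snippet above.
COMPRESSIBLE = re.compile(r"^(.*[.])(html|js|css)$")


def pick_file(filename, accept_encoding):
    """Return the file that the rewrite configuration above would serve."""
    if not COMPRESSIBLE.match(filename):
        return filename                    # rule pattern does not match
    if not os.path.isfile(filename + ".gz"):
        return filename                    # the "-f" condition fails
    if "gzip" not in (accept_encoding or ""):
        return filename                    # client did not ask for gzip
    return filename + ".gz"
```

<p>Every branch that fails falls back to the uncompressed file, which is why clients without gzip support keep working unchanged.</p>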
<p>Given that the compressed files are often 5x smaller than the plain HTML versions, this saves my web site lots of bandwidth with very little effort.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.behnel.de/index.php?feed=rss2&amp;p=125</wfw:commentRss>
		</item>
	</channel>
</rss>

