From 2012 to 2017, I wrote a regular Python column for Usenix ;login:. This page contains links to the articles, organized by year.
It’s with some regret that this is the final installment of my regular Python column. I suppose that I should offer some final words of wisdom, a historical retrospective, or maybe even a forward-looking “Python of the future” vision—however, I’m simply not that clever. Truth be told, I’m simply stretched a bit thin these days and think it would be a good time to step aside to make room for a fresh new voice and insights. So, in the spirit of leaving on a useful note, I thought I’d end on a practical trifle of a matter of some importance—the problem of getting a Python program to quit.
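As a small taste of why quitting is trickier than it looks, recall that sys.exit() is really just an exception in disguise (a minimal sketch; the run() wrapper is purely illustrative, not from the article):

```python
import sys

# sys.exit() doesn't kill the interpreter outright: it raises the
# SystemExit exception, which propagates (and can be caught) like
# any other exception.
def run():
    try:
        sys.exit(2)
    except SystemExit as e:
        return e.code

print(run())  # 2
```

This is why a stray try/except around the wrong code can quietly prevent a program from ever quitting.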
A small confession: when writing code, I don’t usually write tests first. There, I’ve said it. Hate me. I suspect I’m not alone among Python developers. Yes, yes, testing is important, and for my major projects, tests still get written. However, for a lot of small things like little scripts, utilities, and personal projects, I just don’t bother because I don’t want to think about all of the extra steps and tooling that’s usually involved. However, a recent conference experience may have changed some of my views. In this installment, I discuss a more lightweight approach to testing along with a brief introduction to some third-party testing libraries, including pytest and Hypothesis.
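In the spirit of that lightweight approach, a pytest-style test is just a plain function of assert statements; no classes or harness required. (The slug() function here is a made-up example, not something from the article.)

```python
# A lightweight pytest-style test: a plain function named test_*
# containing bare assert statements.  No unittest boilerplate needed.

def slug(text):
    """Normalize a string into a lowercase, dash-separated slug."""
    return "-".join(text.lower().split())

def test_slug():
    assert slug("Hello World") == "hello-world"
    assert slug("  extra   spaces ") == "extra-spaces"
```

Running the pytest command in the same directory discovers any test_* functions automatically and reports failing asserts along with the offending values.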
Over the last few years Python has changed substantially, introducing a variety of new language syntax and libraries. While certain features have received more of the limelight (e.g., asynchronous I/O), an easily overlooked aspect of Python is its revamped handling of file names and directories. I introduced some of this when I wrote about the pathlib module in ;login: back in October 2014. Since writing that, however, I’ve been unable to bring myself to use this new feature of the library. It was simply too different, and it didn’t play nicely with others. Apparently, I wasn’t alone in finding it strange: pathlib was almost removed from the standard library before being rescued in Python 3.6. Given that three years have passed, maybe it’s time to revisit the topic of file and directory handling.
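For a taste of what the revisit covers, here is pathlib in a nutshell (a sketch using PurePosixPath so the results are identical on any platform):

```python
from pathlib import PurePosixPath

# pathlib represents paths as objects; the "/" operator joins components.
p = PurePosixPath("/tmp") / "project" / "notes.txt"

print(p.name)         # notes.txt
print(p.suffix)       # .txt
print(str(p.parent))  # /tmp/project
```

Compare that with the equivalent chain of os.path.join, os.path.basename, and os.path.splitext calls.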
Recently someone asked me when I thought that Python 2 and Python 3 might converge. They were a bit dismayed when I replied “never.” If anything, Python 3 is moving farther and farther away from Python 2 at an accelerating pace. As I write this, Python 3.6 is just days from being released. It is filled with all sorts of interesting new features that you might want to use if you dare. Of course, you’ll have to give up compatibility with all prior versions if you do. That said, maybe an upgrade is still worth it for your personal projects. In this article, I look at a few of the more interesting new additions. A full list of changes can be found in the “What’s New in Python 3.6” document.
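One headline addition in 3.6, formatted string literals (f-strings), looks like this:

```python
# Python 3.6+ only: f-strings embed expressions directly in
# string literals.
name = "Guido"
version = 3.6
message = f"{name} says hello from Python {version}"
print(message)  # Guido says hello from Python 3.6
```

Code using this syntax is a SyntaxError on every earlier version, which is exactly the compatibility trade-off described above.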
Much maligned and misunderstood, metaclasses might be one of Python’s most useful features. On the surface, it might not be clear why this would be the case. Just the name “metaclass” alone is enough to conjure up an image of swooping manta rays and stinging bats attacking your coworkers in a code review. I’m sure that there are also some downsides, but metaclasses really are a pretty useful thing to know about for all sorts of problems of practical interest to systems programmers. This includes simplifying the specification of network protocols, parsers, and more. In this installment, we’ll explore the practical side of metaclasses and making Python do some things you never thought possible. Note: This article assumes the use of Python 3.
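To give a flavor of that practical side, here is about the smallest useful metaclass imaginable: one that records the order in which a class body defines its attributes, the kind of trick that underlies declarative protocol and parser specifications. (OrderedMeta and Packet are illustrative names, not from the article; this relies on class namespaces preserving definition order, guaranteed since Python 3.6.)

```python
# A minimal metaclass: capture the attribute-definition order of a
# class body, e.g., the field layout of a network packet.
class OrderedMeta(type):
    def __new__(meta, name, bases, namespace):
        cls = super().__new__(meta, name, bases, namespace)
        # Record non-dunder attributes in the order they were written.
        cls._field_order = [k for k in namespace if not k.startswith("_")]
        return cls

class Packet(metaclass=OrderedMeta):
    src = "H"
    dst = "H"
    length = "I"

print(Packet._field_order)  # ['src', 'dst', 'length']
```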
Network programming has been a part of Python since its earliest days. Not only are there a wide variety of standard library modules ranging from low-level socket programming to various aspects of Web programming, there are a large number of third-party packages that simplify various tasks or provide different kinds of I/O models. As the network evolves and new standards emerge, though, one can’t help but wonder whether the existing set of networking libraries are up to the task. For example, if you start to look at technologies such as websockets, HTTP/2, or coroutines, the whole picture starts to get rather fuzzy. Is there any room for innovation in this space? Or must one be resigned to legacy approaches from the past? In this article, I spend some time exploring some projects that are at the edge of Python networking. This includes my own Curio project as well as some protocol implementation libraries, including hyper-h2 and h11.
Most of the Python code that I write isn’t part of an exotic framework or huge application. Instead, it’s usually related to a mundane data analysis task involving a CSV file. It isn’t glamorous, but Python is an effective tool at getting the job done without too much fuss. When working on such problems, I prefer to not worry too much about low-level details (I just want the final answer). However, if you use Python for manipulating a lot of data, you may find that your scripts use a large amount of memory. In this article, I’m going to peek under the covers of how memory gets used in a Python program and explore options for using it more efficiently. I’ll also look at some techniques for exploring and measuring the memory use of your programs. Disclosure: Python 3 is assumed for all of the examples, but the underlying principles apply equally to Python 2.
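One of the measuring techniques alluded to: the standard library’s tracemalloc module (Python 3.4+) can report how much memory a snippet allocates. A minimal sketch:

```python
import tracemalloc

# Trace memory allocations made while building a list of strings.
tracemalloc.start()
data = [str(n) for n in range(100000)]
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("current: %.1f MB, peak: %.1f MB" % (current / 1e6, peak / 1e6))
```

The reported numbers make it easy to compare, say, a list of strings against a generator or an array-based representation.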
The addition of improved support for asynchronous I/O in Python 3 is one of the most significant changes to the Python language since its inception. However, it also balkanizes the language and libraries into synchronous and asynchronous factions—neither of which particularly like to interact with the other. Needless to say, this presents an interesting challenge for developers writing a program involving I/O. In this article, I explore the problem of working in an environment of competing I/O models and whether or not they can be bridged in some way. As a warning, just about everything in this article is quite possibly a bad idea. Think of it as a thought experiment.
Python 3.5 was released to the world on September 13, 2015. Included in this release was a substantial upgrade to asynchronous I/O support in the asyncio module, including brand new syntax in the Python language itself. In this article, we dive into modern asyncio and take it for a test drive on code involving low-level socket programming, high-level sockets, HTTP clients, and HTTP servers. Prepare yourself for the unexpected—this is not the Python you know. Just to be clear, you’ll need Python 3.5 or newer to try the examples.
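As a tiny preview of the new syntax: coroutines are declared with async def, suspended with await, and driven by an event loop. A minimal sketch (illustrative names only):

```python
import asyncio

# Python 3.5+ syntax: "async def" defines a coroutine, "await"
# suspends it until the awaited operation completes.
async def greet(name, delay):
    await asyncio.sleep(delay)
    return "hello, " + name

async def main():
    # Run two coroutines concurrently; gather preserves order.
    return await asyncio.gather(greet("world", 0.01),
                                greet("asyncio", 0.02))

loop = asyncio.new_event_loop()
results = loop.run_until_complete(main())
loop.close()
print(results)  # ['hello, world', 'hello, asyncio']
```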
As you read this, Python 3.5 should be hitting the streets with a wide assortment of new features and even some new syntax. "New syntax?" you ask. Why yes. Even though Python has been around for more than 25 years now, it continues to evolve and sprout surprising new features from time to time. In this month’s installment, I’m going to look at a seemingly minor part of Python that turns out to be fairly useful—the use of * and ** in function arguments, function argument passing, and data handling.
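The basic behavior in a nutshell (show() is just an illustrative function):

```python
# In a function definition, * collects extra positional arguments
# into a tuple and ** collects extra keyword arguments into a dict.
def show(*args, **kwargs):
    return args, kwargs

args, kwargs = show(1, 2, color="red")
assert args == (1, 2)
assert kwargs == {"color": "red"}

# At the call site, the same symbols unpack sequences and dicts.
point = (3, 4)
opts = {"color": "blue"}
args2, kwargs2 = show(*point, **opts)
assert args2 == (3, 4)
assert kwargs2 == {"color": "blue"}
```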
In the previous installment, we dived into some of the low-level details and problems related to Python threads. As a brief recap, although Python threads are real system threads, there is a global interpreter lock (GIL) that restricts their execution to a single CPU core. Moreover, if your program performs any kind of CPU-intensive processing, the GIL can impose a severe degradation in the responsiveness of other threads that happen to be performing I/O.
Talk to any Python programmer long enough and eventually the topic of concurrent programming will arise—usually followed by some groans, some incoherent mumbling about the dreaded global interpreter lock (GIL), and a request to change the topic. Yet Python continues to be used in a lot of applications that require concurrent operation, whether it is a small Web service or a full-fledged application. To support concurrency, Python provides both support for threads and coroutines. However, there is often a lot of confusion surrounding both topics. So in the next two installments, we’re going to peel back the covers and take a look at the differences and similarities in the two approaches, with an emphasis on their low-level interaction with the system. The goal is simply to better understand how things work in order to make informed decisions about larger libraries and frameworks.
One of my favorite Python topics to talk about is error handling—specifically, how to use and not use exceptions. Error handling is hard and tricky. Error handling can mean the difference between an application that can be debugged and one that can’t. Error handling can blow your business up in the middle of the night if you aren’t careful. So, yes, how you handle errors is important. In this article, I’ll dig into some of the details of exceptions, some surefire techniques for shooting yourself in the foot, and some ways to avoid it.
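One of the surest foot-guns, previewed here: a bare except clause that silences everything, bugs included. (The parse_port_* functions are made-up names for illustration.)

```python
# The classic mistake: a bare "except" catches *everything*,
# including NameError from a typo and even KeyboardInterrupt,
# so real bugs vanish silently.
def parse_port_bad(text):
    try:
        return int(text)
    except:
        return None

# Better: catch only the exception you actually expect.
def parse_port_better(text):
    try:
        return int(text)
    except ValueError:
        return None

print(parse_port_bad("80"), parse_port_better("not a port"))
```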
A common complaint levied against Python (and other similar languages) is the dynamic nature of its type handling. Dynamic typing makes it difficult to optimize performance because code can’t be compiled in the same way that it is in languages like C or Java. The lack of explicitly stated types can also make it difficult to figure out how the parts of a large application might fit together if you’re simply looking at them in isolation. This difficulty also applies to tools that might analyze or try to check your program for correctness.
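This is the gap that optional type annotations (PEP 484) set out to fill: hints that a checker such as mypy can verify, while the interpreter itself ignores them at runtime. A minimal sketch:

```python
from typing import List

# Annotations state intent for tools and readers; Python does not
# enforce them at runtime.
def average(values: List[float]) -> float:
    return sum(values) / len(values)

print(average([1.0, 2.0, 3.0]))  # 2.0
```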
I’m not sure I’ve ever seen “gravitas” used as a kind of software metric. However, if there were such a thing, I think it could probably be measured by the number of predefined constants required to carry out any sort of task. For example, directly programming with sockets and setting socket options has a high degree of gravitas. Simply specifying a port number to a Web framework—not so much. Other examples might include programming OpenGL vs. turtle graphics. Or maybe just about anything involving OpenSSL. Extra bonus points are earned if such constants can get together in an unholy bitmask such as O_RDWR | O_CREAT. Yes, constants. Gravitas.
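For the record, here is what high-gravitas file opening looks like, unholy bitmask and all (a sketch that writes a throwaway file in a temporary directory):

```python
import os
import tempfile

# Open a file the low-level way: os.open() with a bitmask of flag
# constants instead of the friendly built-in open().
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
os.write(fd, b"hello")
os.close(fd)

with open(path) as f:
    content = f.read()
print(content)  # hello
os.remove(path)
```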
If you’re like me, you’ve probably written a Python script or two that had to manipulate pathnames. For that, you’ve probably used the much beloved os.path module—and perhaps the glob module. And let’s not forget some of their friends such as fnmatch, shutil, subprocess, and various bits of functionality in os. Aw, let’s face it, who are we kidding here? Pathname handling in Python is an inexplicable mess, has always been a mess, and will always continue to be a mess. Or will it?
If I look back, an overwhelming number of the Python programs I have written have been simple scripts and tools meant to be executed from the command line. Sure, I’ve created the occasional Web application, but the command line has always been where the real action is. However, I also have a confession—despite my reliance on the command line, I just can’t bring myself to use any of Python’s built-in libraries for processing command line options. Should I use the getopt module? Nope. Not for me. What about optparse or argparse? Bah! Get out!
March 2014 saw the release of Python 3.4. One of its most notable additions is the inclusion of the new asyncio module to the standard library. The asyncio module is the result of about 18 months of effort, largely spearheaded by the creator of Python, Guido van Rossum, who introduced it during his keynote talk at the PyCon 2013 conference. However, ideas concerning asynchronous I/O have been floating around the Python world for much longer than that. In this article, I’ll give a bit of historical perspective as well as some examples of using the new library.
Believe it or not, it’s been more than five years since Python 3 was unleashed on the world. At the time of release, common consensus among Python core developers was that it would probably take about five years for there to be any significant adoption of Python 3. Now that the time has passed, usage of Python 3 still remains low. Does the continued dominance of Python 2 represent a failure on the part of Python 3? Should porting existing code to Python 3 be a priority for anyone? Does the slow adoption of Python 3 reflect a failure on the part of the Python developers or community? Is it something that you should be worried about?
If you’ve ever had the pleasure of installing third-party Python packages, you already know that it can be a bit of a mess. There are a variety of different tools, file formats, and other packaging complications. Frankly, it’s enough to make your head spin.
As Python programmers know, Python doesn’t really have a notion of a main() function like compiled languages such as C or Java. That is, there’s no dedicated function that you define as the entry point to your program. Instead, there is the concept of a “main” program module. The “main” module holds the contents of whatever file you tell Python to execute.
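This is what makes the familiar idiom below work: the guarded code runs only when the file is executed directly (its module name is then "__main__"), not when the file is imported as a module.

```python
# The "main module" idiom: __name__ is "__main__" only when this
# file is the one handed to the interpreter.
def main():
    return "running as the main program"

if __name__ == "__main__":
    print(main())
```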
At the last PyCon conference, Raymond Hettinger gave a keynote talk in which he noted that context managers might be one of Python’s most powerful yet underappreciated features. Seeing how something so minor could be one of the language’s most powerful features as claimed might be a bit of a stretch. In this article, we’ll take a look at some examples involving context managers and see that so much more is possible.
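As a warm-up for the kind of thing that’s possible, contextlib lets you write a context manager as a simple generator; here, one that times a block of code (timed() is an illustrative helper, not something from the talk):

```python
import time
from contextlib import contextmanager

# A generator-based context manager: code before yield runs on
# entry, code after yield (in finally) runs on exit.
@contextmanager
def timed(label, results):
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start

results = {}
with timed("sleep", results):
    time.sleep(0.05)
print(results)
```

The with-statement guarantees the timing code runs even if the block raises an exception.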
One of the software projects that I maintain is the PLY parser generator. In a nutshell, PLY is a Python implementation of the classic lex and yacc tools used for writing parsers, compilers, and other related programs. It’s also not the kind of program that tends to change often—to be sure, I’m not aware of any sort of space-race concerning the implementation of LALR(1) parser generators (although perhaps there’s some startup company Lalrly.com just waiting to strike parsing gold). Trust me, though, the rest of this article is about dictionaries.
As I begin to write this, I’m returning on the plane from PyCon 2013, held March 13-17 in Santa Clara, California. When I started using Python some 16 years ago, the Python conference was an intimate affair involving around a hundred people. This year’s conference featured more than 2,500 attendees and 167 sponsors—bigger than ever for an event that’s still organized by the community (full disclosure, I was also one of the sponsors). If you couldn’t attend, video and slide decks for virtually every talk and tutorial can be found online at http://pyvideo.org and https://speakerdeck.com/pyconslides.
For the past eight months, I’ve been locked away in my office working on a new edition of the Python Cookbook (O’Reilly & Associates). One of the benefits of writing a book is that you’re forced to go look at a lot of stuff, including topics that you think you already know. For me, the Cookbook was certainly no exception. Even though I’ve been using Python for a long time, I encountered a lot of new tricks that I had never seen before. Some of these were obviously brand new things just released, but many were features I had just never noticed even though they’ve been available in Python for many years.
In the August 2012 issue of ;login:, I explored some of the inner workings of Python’s import statement. Much of that article explored the mechanism that’s used to set up the module search path found in sys.path as well as the structure of a typical Python installation. At the end of that article, I promised that there is even more going on with import than meets the eye. So, without further delay, that’s the topic of this month’s article.
In most of my past work, I’ve always had a need to solve various sorts of data analysis problems. Prior to discovering Python, AWK and other assorted UNIX commands were my tools of choice. These days, I’ll mostly just code up a simple Python script (e.g., see the June 2012 ;login: article on using the collections module). Lately though, I’ve been watching the growth of the Pandas library with considerable interest.
One of the most significant additions to Python’s standard library in recent years is the inclusion of the multiprocessing library. First introduced in Python 2.6, multiprocessing is often pitched as an alternative to programming with threads. For example, you can launch separate Python interpreters in a subprocess, interact with them using pipes and queues, and write programs that work around issues such as Python’s Global Interpreter Lock, which limits the execution of Python threads to a single CPU core.
Although multiprocessing has been around for many years, I needed some time to wrap my brain around how to use it effectively. Surprisingly, I have found my own use differs from those often provided in examples and tutorials. In fact, some of my favorite features of this library tend not to be covered at all.
In this column, I decided to dig into some lesser-known aspects of using the multiprocessing module.
I have a small confession: I often think that I don’t fully understand the Python import statement. As a seasoned Python veteran, it might be surprising to admit something like that, but I think I might be in good company—most of the Python programmers I meet are just as mystified by some of the inner workings of it as I am.
As a Python programmer, you know that lists, sets, and dictionaries are useful for collecting data. Using just these three primitives, you can build just about any other data structure in the known universe. However, why would you? In this article, we reach into Python’s collections library and look at some of the tools it provides for manipulating collections of data. If you’re like me, these will quickly become a part of your day-to-day programming.
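Two of those tools, previewed: Counter for tallying and defaultdict for grouping.

```python
from collections import Counter, defaultdict

words = ["spam", "eggs", "spam", "ham", "spam"]

# Counter: tally items without any manual bookkeeping.
counts = Counter(words)
print(counts.most_common(1))   # [('spam', 3)]

# defaultdict: group items without checking for missing keys first.
by_letter = defaultdict(list)
for w in words:
    by_letter[w[0]].append(w)
print(dict(by_letter))
```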
As Python programmers know, there has always been a “batteries included” philosophy when it comes to Python’s standard library. For instance, if you simply download Python and install it, you instantly get access to hundreds of modules ranging from XML parsing to reading and writing WAV files.
The standard library is both a blessing and a curse. Because of it, many programmers find they can simply install Python and have it work well enough for their purposes. At the same time, reliance on the library and concerns about backwards compatibility tend to give it a certain amount of inertia. It is sometimes difficult to push for changes and improvements to existing modules. This is especially true if one tries to challenge the dominance of standard library modules for extremely common tasks such as regular expression parsing or network programming.
In this article, I’m going to take a brief tour through two third-party libraries, requests and regex, that aim to replace long-standing and widely used standard library modules. Both have generated a bit of buzz in the Python world and, coincidentally, both start with the letter “R”.
It’s hard to believe, but last December marked the three-year anniversary of the first release of Python 3. Judging by the glacial rate of adoption, you might hardly notice. Honestly, most Python programmers (including most of Python’s core developers) are still using various versions of Python 2 for “real work.” Nevertheless, Python 3 development continues to move along as the language is improved and more and more libraries start to support it. Although you might look at the adoption rate with some dismay, it has always been known by Python insiders that it could take five years or more for a significant number of users to make the switch. Thus, we’re still in the middle of that transition.
Pssst... you should come to Chicago and take a class. Yes, you.