Dabeaz

Dave Beazley's mondo computer blog. [ homepage ]

Tuesday, January 05, 2010

 

The Python GIL Visualized

In preparation for my upcoming PyCON'2010 talk on "Understanding the Python GIL", I've been working on a variety of new material--including some graphical visualization of the GIL behavior described in my earlier talk. I'm still experimenting, but check it out.

In these graphs, Python interpreter ticks are shown along the X-axis. The two bars indicate two different threads that are executing. White regions indicate times at which a thread is completely idle. Green regions indicate when a thread holds the GIL and is running. Red regions indicate when a thread has been scheduled by the operating system only to awake and find that the GIL is not available (e.g., the infamous "GIL Battle"). For those who don't want to read, here is the legend again in pictures:


Okay, now let's look at some threads. First, here is the behavior of running two CPU-bound threads on a single CPU system. As you will observe, the threads nicely alternate with each other after long periods of computation.

Now, let's go fire up the code on your fancy new dual-core laptop. Yow! Look at all of that GIL contention. Again, all of those red regions indicate times where the operating system has scheduled a Python thread on one of the cores, but it can't run because the thread on the other core is holding it.

Here's an interesting case that involves an I/O bound thread competing with a CPU-bound thread. In this example, the I/O thread merely echoes UDP packets. Here is the code for that thread.

def thread_1(port):
    s = socket(AF_INET,SOCK_DGRAM)
    s.bind(("",port))
    while True:
        msg, addr = s.recvfrom(1024)
        s.sendto(msg,addr)

The other thread (thread 2) is just mindlessly spinning. This graph shows what happens when you send a UDP message to thread 1.

As you would expect, most of the time is spent running the CPU-bound thread. However, when I/O is received, there is a flurry of activity that takes place in the I/O thread. Let's zoom in on that region and see what's happening.

In this graph, you're seeing how difficult it is for the I/O bound to get the GIL in order to perform its small amount of processing. For instance, approximately 17000 interpreter ticks pass between the arrival of the UDP message and successful return of the s.recvfrom() call (and notice all of the GIL contention). More that 34000 ticks pass between the execution of s.sendto() and looping back to the next s.recvfrom() call. Needless to say, this is not the behavior you usually want for I/O bound processing.

Anyways, that is all for now. Come to my PyCON talk to see much more. Also check out Antoine Pitrou's work on a new GIL.

Note: It is not too late to sign up for my Concurrency Workshop next week (Jan 14-15).


Comments:
Man that is cool as hell (your pictures I mean) not the GIL exactly. :)
 
I would be interested to see the same profiling done with the new GIL in 3.2 - or maybe you're saving it for your talk ;)
 
I definitely want to put some new GIL stuff in the talk. Just haven't gotten around to putting it under the microscope yet.
 
Is it easy to create those graphs? I'd love to be able to do that for my own code.
 
So are you just whipping these up by hand or do you have code that draws them?
 
You should indicate which of the graphs are (a) made up, (b) taken from actual code, or (c) taken from real-world code.

In the UDP example, what's the next performance impact? If a thread's execution is delayed, but it executes soon enough to prevent its consumers from stalling, there's zero net impact on the application. A smarter GIL would have no benefit.
 
Nice work, please post everything on your blog after the talk.
 
Now this would make a highly interesting profiling tool. Highly interesting I tell you.
 
Regarding some of the comments.

First, to understand these graphs, you have to go view my GIL Talk from the summer as well as the followup blog post that describes how Python was instrumented to obtain data. It should be stressed that this was all done with a custom modified version of Python 2.6 instrumented in a way to log GIL events.

Due to the very large amount of GIL trace data that gets generated, the graphs were produced by a custom script that I wrote specifically for that purpose. It takes the GIL event sequence for each thread and color codes it directly into a large PNG file that is created with PyPNG. I'm still exploring other options, but this actually worked pretty well.
 
what about multiple threads, all IO bound.
 
Multiple I/O bound threads work great in Python--you can have hundreds or even thousands of them. However, it is critical that they all stay I/O bound. If one of them suddenly wakes up and decides it wants to parse a huge XML file or something, then all bets are off.

In any case, it's important to emphasize that all of this GIL stuff is mostly about the problem of Python needlessly burning useless CPU cycles on other CPU cores when there is GIL contention. I would claim that fixing that is probably a good idea.
 
Which OS is this? The email you linked mentions that OS X handles lock contention more poorly than, e.g., Linux.
 
All of the graphs are for the same datasets I used when making my June, 2009 GIL Talk---which was all done on OS-X Leopard.
 
What are the vertical bars visible about half-way along in the top row of the first chart?
 
I'm in the middle of watching your
"Mindblowing Python GIL" video, and I'm struck that you are using the Mac.

OS-X has some known bugs in the interaction between signals and pthreads (ie OS-X doesn't support pthread_sigmask correctly).

http://lists.apple.com/archives/darwin-userlevel/2001/Dec/msg00008.html

The consequence of this OS-X issue can be *crazy* if you happen to work on software which tries to control which thread will do signal handling. I've spent *way* more time that I'd like to think about debugging this exact problem.
 
A few followup comments.

Ralph, you have a keen eye. There are in fact some vertical bars on the 1-processor graph shown. Those bars actually represent times when the thread was awakened, but it was unsuccessful in acquiring the GIL (just like in the lower graph). This happens much less often with 1 CPU, but it doesn't go away completely. Just to give you an idea of how this can occur, when one of the threads has expired its time-slice (in the operating system), the OS kernel will perform a context switch. However, this context switch invariably occurs while the thread is still holding the GIL. So, the system has to briefly switch back to the other thread in order to run it to a point where the GIL gets released.

TJK,

Although I did my work on OS-X, the phenomena described is not specific to OS-X. I've had participants in my advanced python course run these same thread tests on different machines and you get the same behavior on Linux and Windows to varying degrees of severity.
 
It is very interesting that they see it on Linux and Windows.

Thanks!
 
Post a Comment





<< Home

Archives

08/01/2009 - 09/01/2009   09/01/2009 - 10/01/2009   10/01/2009 - 11/01/2009   11/01/2009 - 12/01/2009   12/01/2009 - 01/01/2010   01/01/2010 - 02/01/2010   02/01/2010 - 03/01/2010   04/01/2010 - 05/01/2010  

This page is powered by Blogger. Isn't yours?

Subscribe to Posts [Atom]