Generator Tricks for Systems Programmers
Copyright (C) 2008
David M. Beazley
Presented at PyCon'08, March 13, 2008, Chicago, Illinois.
The Python Essential Reference, 4th Edition, is now available. If you like
this tutorial, you'll like this edition--there is extended coverage of generators, coroutines, and other advanced Python features.
(5/1/2009) I gave a follow-up tutorial, "A Curious Course on Coroutines and Concurrency", which covers
coroutines and more advanced uses of generators.
(9/28/2008) I gave a slightly revised version of this tutorial at PyCon UK 2008 in Birmingham, UK. Frankly, that
version is a little better and more polished. Go here for the slides and exercises.
This tutorial discusses various techniques for using generator functions and generator expressions
in the context of systems programming. This topic loosely includes files, file systems, text parsing,
network programming, and programming with threads.
Support Data Files
The following file contains the supporting data files used by the various
code samples. Download it to your machine to work through the examples that follow.
The download also includes all of the code samples listed below.
Here are various code samples that are used in the course. You can
cut and paste these to your own machine to try them out. The order in
which these are listed follows the course outline. These examples are
written to run inside the "generators" directory that gets created
when you unzip the above file containing the support data.
Part 2 : Processing Data Files
Part 3 : Fun with Files and Directories
- nongenlog.py. Calculate the number of bytes
transferred in an Apache server log using a simple for-loop. Does not use generators.
- genlog.py. Calculate the number of bytes
transferred in an Apache server log using a series of generator expressions.
- makebig.py. Make a large access-log file for performance testing. This will create a file "big-access-log". For the numbers used in the
presentation, I used python makebig.py 2000.
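The difference between the loop version and the generator-expression version can be sketched as follows. This is only an illustration, not the course code: the sample lines are made up, the function names `total_bytes_loop` and `total_bytes_gen` are hypothetical, and the code is written for Python 3, whereas the 2008 samples targeted Python 2.

```python
# Hypothetical sample lines in Apache common log format. The course samples
# read a log file from the support data instead of an in-memory list.
SAMPLE_LOG = [
    '140.180.132.213 - - [24/Feb/2008:00:08:59 -0600] "GET /ply/ply.html HTTP/1.1" 200 97238',
    '140.180.132.213 - - [24/Feb/2008:00:08:59 -0600] "GET /favicon.ico HTTP/1.1" 404 133',
    '75.54.118.139 - - [24/Feb/2008:00:15:40 -0600] "GET / HTTP/1.1" 304 -',
]


def total_bytes_loop(lines):
    """Sum the last column with a plain for-loop (no generators)."""
    total = 0
    for line in lines:
        bytestr = line.rsplit(None, 1)[1]   # last whitespace-separated field
        if bytestr != '-':                  # '-' means no bytes were sent
            total += int(bytestr)
    return total


def total_bytes_gen(lines):
    """Same calculation as a pipeline of generator expressions."""
    bytecolumn = (line.rsplit(None, 1)[1] for line in lines)
    sizes = (int(x) for x in bytecolumn if x != '-')
    return sum(sizes)                       # sum() pulls data through the pipeline


if __name__ == "__main__":
    print(total_bytes_loop(SAMPLE_LOG), total_bytes_gen(SAMPLE_LOG))
```

Both functions accept any iterable of lines, so an open file object can be passed in place of the list. Nothing in the generator version runs until `sum()` starts asking for values.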
Part 4 : Parsing and Processing Data
- genfind.py. A generator function that yields
filenames matching a given filename pattern.
- genopen.py. A generator function that opens
a sequence of filenames and yields the resulting file objects.
- gencat.py. A generator function that concatenates
a sequence of generators into a single sequence.
- gengrep.py. A generator that greps a series of lines for
those that match a regex pattern.
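The four components listed above are meant to be chained together. Here is a rough Python 3 sketch of how such a chain might look. The function names are assumptions modeled on the file names, and this version only opens plain text files; treat it as an outline rather than the course code.

```python
import fnmatch
import os
import re


def gen_find(filepat, top):
    """Yield paths under directory `top` whose names match the shell pattern."""
    for path, dirnames, filenames in os.walk(top):
        for name in fnmatch.filter(filenames, filepat):
            yield os.path.join(path, name)


def gen_open(filenames):
    """Yield one open file object per filename.

    Each file is closed when the consumer asks for the next one, so the
    consumer should finish reading a file before advancing.
    """
    for name in filenames:
        with open(name) as f:
            yield f


def gen_cat(sources):
    """Concatenate a sequence of iterables into a single stream of items."""
    for src in sources:
        yield from src


def gen_grep(pat, lines):
    """Yield only the lines that match the regular expression `pat`."""
    patc = re.compile(pat)
    return (line for line in lines if patc.search(line))
```

A typical chain would be `gen_grep(pattern, gen_cat(gen_open(gen_find("access-log*", top))))`. Because every stage is lazy, only one file is open at a time and only one line is in flight.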
Part 5 : Processing Infinite Data
- bytesgen.py. Example that finds out how many bytes
were transferred for a specific file in a whole directory of log files.
- retuple.py. Parse a sequence of lines into a
sequence of tuples using regular expressions.
- redict.py. Parse a sequence of lines into a
sequence of dictionaries with named fields.
- fieldmap.py. Remap fields in a sequence of dictionaries.
- linesdir.py. Generate lines from files in a directory.
- apachelog.py. Parse an Apache log file.
- query404.py. Find the set of all documents that
are broken (404).
- largefiles.py. Find all requests that transferred over a megabyte.
- largest.py. Find the largest document.
- hosts.py. Find unique host IP addresses.
- downloads.py. Find number of downloads of a
- robots.py. Find out who has been hitting robots.txt.
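The parsing steps in this list (lines to tuples, tuples to dictionaries, field remapping, then queries) might fit together roughly like this. The regular expression, the column names, the sample lines, and the query helper names are all assumptions made for illustration in Python 3; the course files may differ in detail.

```python
import re

# Assumed pattern for Apache common log format lines.
LOGPAT = re.compile(
    r'(\S+) (\S+) (\S+) \[(.*?)\] "(\S+) (\S+) (\S+)" (\S+) (\S+)'
)
COLNAMES = ('host', 'referrer', 'user', 'datetime',
            'method', 'request', 'proto', 'status', 'bytes')


def field_map(dictseq, name, func):
    """Apply `func` to field `name` of every dictionary in the sequence."""
    for d in dictseq:
        d[name] = func(d[name])
        yield d


def apache_log(lines):
    """Turn raw log lines into dictionaries with typed status and bytes."""
    matches = (LOGPAT.match(line) for line in lines)
    tuples = (m.groups() for m in matches if m)         # skip unparsable lines
    log = (dict(zip(COLNAMES, t)) for t in tuples)
    log = field_map(log, 'status', int)
    log = field_map(log, 'bytes', lambda s: int(s) if s != '-' else 0)
    return log


def broken_documents(log):
    """Set of requested paths that returned a 404."""
    return {r['request'] for r in log if r['status'] == 404}


def unique_hosts(log):
    """Set of client addresses seen in the log."""
    return {r['host'] for r in log}


# Hypothetical sample data for trying the queries.
SAMPLE_LOG = [
    '140.180.132.213 - - [24/Feb/2008:00:08:59 -0600] "GET /ply/ply.html HTTP/1.1" 200 97238',
    '140.180.132.213 - - [24/Feb/2008:00:08:59 -0600] "GET /favicon.ico HTTP/1.1" 404 133',
    '75.54.118.139 - - [24/Feb/2008:00:15:40 -0600] "GET / HTTP/1.1" 304 -',
]
```

Note that `apache_log` returns a one-shot generator, so each query needs a fresh call such as `broken_documents(apache_log(SAMPLE_LOG))`.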
Part 6 : Feeding the Pipeline
Part 7 : Extending the pipeline
- follow.py. Follow a log file in real time, like tail -f in Unix. To run this program, you need a log file to work with. Run the program runservers.py to start a
simulated web server, which writes a series of log lines for you to follow.
- realtime404.py. Print all 404 requests as they
happen in real-time on a log file.
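The idea behind following a file can be sketched in a few lines of Python 3. This is a minimal illustration under stated assumptions: the `poll_interval` parameter and the `only_404` helper are hypothetical additions, and the helper assumes the status code is the second-to-last field on each line.

```python
import os
import time


def follow(thefile, poll_interval=0.1):
    """Yield lines as they are appended to an open file, like tail -f.

    Starts at the current end of the file and never terminates on its own.
    """
    thefile.seek(0, os.SEEK_END)
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(poll_interval)   # nothing new yet; wait and retry
            continue
        yield line


def only_404(lines):
    """Yield lines whose status field (second to last) is 404."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and fields[-2] == '404':
            yield line
```

Since `follow` produces an ordinary stream of lines, it can feed the same kind of processing used on static files, for example `for line in only_404(follow(open("access-log"))): print(line, end="")`, where the file name is whatever log your server writes.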
Part 8 : Various Programming Tricks (And Debugging)
- genpickle.py. Turn sequences of objects into a sequence of pickles.
- sendto.py. Send a sequence of items to a remote
machine via a socket. Uses genpickle above.
- receivefrom.py. Receive a sequence of
items from a socket. Uses genpickle above.
- broadcast.py. Broadcast a sequence of items
to a collection of consumers.
- netsend.py. Send items to another host on the network. Requires a receiver (use receivefrom.py above).
- genmulti.py. Generate items from more than one generator at once (multiplexing).
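To give a feel for the components in this list, here is a small Python 3 sketch of pickling a stream, broadcasting to several consumers, and multiplexing several sources. It is an approximation, not the course code: the `Collector` class and the `multiplex` implementation are assumptions, and an in-memory byte buffer stands in for the socket used by the network samples.

```python
import pickle
import queue
import threading


def gen_pickle(source):
    """Turn a sequence of objects into a sequence of pickled byte strings."""
    for item in source:
        yield pickle.dumps(item)


def gen_unpickle(infile):
    """Read pickled objects from a binary file-like object until EOF."""
    while True:
        try:
            yield pickle.load(infile)
        except EOFError:
            return


class Collector:
    """Hypothetical consumer: anything with a send() method would do."""
    def __init__(self):
        self.items = []

    def send(self, item):
        self.items.append(item)


def broadcast(source, consumers):
    """Deliver every item from `source` to every consumer."""
    for item in source:
        for c in consumers:
            c.send(item)


_DONE = object()   # sentinel marking the end of one source


def multiplex(sources):
    """Yield items from several iterables as they become available.

    Each source is drained in its own thread into a shared queue, so the
    relative order of items from different sources is not guaranteed.
    """
    q = queue.Queue()

    def feed(src):
        for item in src:
            q.put(item)
        q.put(_DONE)

    threads = [threading.Thread(target=feed, args=(s,), daemon=True)
               for s in sources]
    for t in threads:
        t.start()
    remaining = len(threads)
    while remaining:
        item = q.get()
        if item is _DONE:
            remaining -= 1
        else:
            yield item
```

To send items across a network, the pickled byte strings could be written to a socket on one side and read back with `gen_unpickle` on a file object wrapping the socket on the other.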
Part 9 : Co-routines
- gentrace.py. Example of debugging a generator component.
- storelast.py. Store the last value of a generator (for access later in the processing pipeline).
- genshutdown.py. Simple example of shutting down a generator.
- shutdownevt.py. Shutting down a generator with an event.
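The tricks in this list are small enough to sketch together. The following Python 3 code is illustrative only: the names `trace`, `StoreLast`, `counter`, and `until_set` are assumptions rather than the names used in the course files, and the `report` parameter is a hypothetical hook added so the tracer can be tested.

```python
import threading


def trace(source, report=print):
    """Pass items through unchanged, reporting each one as it goes by."""
    for item in source:
        report(item)
        yield item


class StoreLast:
    """Iterator wrapper that remembers the most recent item it produced."""
    def __init__(self, source):
        self.source = iter(source)
        self.last = None

    def __iter__(self):
        return self

    def __next__(self):
        self.last = next(self.source)
        return self.last


def counter(log):
    """Endless generator that records when it is shut down."""
    n = 0
    try:
        while True:
            yield n
            n += 1
    finally:
        # Runs when .close() raises GeneratorExit at the yield.
        log.append("shutdown")


def until_set(source, stop_event):
    """Yield items from `source` until `stop_event` is set."""
    for item in source:
        if stop_event.is_set():
            return
        yield item
```

Calling `.close()` only works from the code that is consuming the generator. An event, by contrast, can be set from another thread, which is why `until_set` takes a `threading.Event`.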