[ PYTHON NETWORKS, CONCURRENCY, AND DISTRIBUTED SYSTEMS ]
A one-of-a-kind course that is focused on using Python to build
distributed systems. Topics include socket programming, internet data handling
(XML, JSON, etc.), simple web programming, WSGI, REST, actors, remote
procedure call (RPC), message passing, map-reduce,
distributed objects, and asynchronous I/O. The course also
includes material on concurrent programming techniques including threads, processes,
and multiprocessing. A major focus of this course is on the underlying
principles that form the foundation of the programming frameworks and applications that you may
be using now--you will walk away with new insight and ideas for improving your code.
This course is also offered on an ongoing basis in Chicago.
Syllabus
- Network Fundamentals and Socket Programming.
An introduction to some basic concepts of network programming. Covers
the essential details of TCP/IP and programming with sockets. Students
will learn how to write both TCP and UDP based clients and servers.
- Client-side programming. A look at high-level library modules
that allow Python to connect to standard Internet and web-related services (e.g.,
HTTP, FTP, and XML-RPC). Special attention
will be given to the urllib2 module that allows Python to interact
with web servers.
- Internet Data Handling.
A brief overview of library modules that are used to process common Internet data
formats such as HTML, XML, and JSON.
- Web Programming. The absolute basics of web programming in Python.
Topics include CGI scripting, the WSGI interface, and implementing custom HTTP servers.
Note: This section is primarily focused on how to put a web-based interface on low-level
services as might be encountered in a distributed computing environment. It does not
cover web frameworks or the problem of using Python to build a website.
- Thread Programming. Everything you wanted to know about
Python threads, but were afraid to ask. Includes the absolute basics
of using the threading module and different techniques for
using threads to carry out work. Includes detailed coverage of using
different synchronization primitives, queues, and thread pools. Also
provides detailed information on the Global Interpreter Lock (GIL),
tuning parameters, and the interaction between threads and C/C++
extension modules.
- Multiprocessing. A tour of features provided by the
multiprocessing library added in Python 2.6. Covers processes,
queues, pipes, process pools, and shared memory regions. Examples
will illustrate how multiprocessing can be used to achieve higher
performance when working on multiple CPU cores.
- Message Passing and Data Serialization. Message
passing is a core component of distributed computation. This
section provides an in-depth look at different interprocess
communication mechanisms, their performance characteristics,
and tuning options. In addition, different approaches for
serializing Python data structures are explored. Topics include
the subprocess module, named pipes, network sockets, memory mapped regions,
pickle, marshal, structure packing, and binary I/O. The section
concludes with information on high-level messaging systems such as
ZeroMQ and AMQP.
- Distributed Programming. An in-depth tour of different
distributed programming techniques. Topics include programming with
actors, client-server computing, REST, remote procedure call,
map-reduce, and distributed objects. Also includes material on
XML-RPC and WSGI.
- Advanced I/O handling. A look at different I/O handling
techniques including blocking, non-blocking, asynchronous, and event-driven
I/O. The primary goal of this section is to better understand the
I/O handling used by different libraries and frameworks such as asyncore and Twisted.
- Generators and Coroutines. An overview of concurrent
programming using generators and coroutines. The major focus of
this section is on using generators to implement user-level task switching
and to better understand libraries based on microthreads, tasklets, green-threads,
and similarly named entities.
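
As a small taste of the user-level task switching named in the last topic, here is a minimal sketch of a round-robin scheduler built on generators. It is an illustration only and is not taken from the course materials. The names countdown and run_round_robin are hypothetical. Each task hands control back to the scheduler whenever it reaches a yield.

```python
from collections import deque


def countdown(name, n, log):
    """A task: record one step, then yield control back to the scheduler."""
    while n > 0:
        log.append((name, n))
        yield  # voluntarily give up control
        n -= 1


def run_round_robin(tasks):
    """Run generator-based tasks in turn until every one has finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)       # resume the task until its next yield
        except StopIteration:
            continue         # task finished; do not reschedule it
        ready.append(task)   # still running; put it at the back of the queue


log = []
run_round_robin([countdown("a", 2, log), countdown("b", 3, log)])
```

After the call, log holds the interleaved steps of both tasks, which shows that two "concurrent" activities ran in a single thread with no operating system involvement.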
Instruction Format
The course is designed to be taught on a 9-5 schedule with a one hour
lunch break. This course consists of both lecture slides and hands-on programming
exercises, with most of the time spent programming. Participants should plan
on spending 4-5 hours each day working on exercises.
Prerequisites
This course assumes a working knowledge of Python programming. Students should already
know how to write and debug programs and be familiar with core language features such as
functions, classes, modules, and the most commonly used modules in the standard library.
Given that the topics in this course are heavily focused on networking and systems programming,
students are well-advised to have some prior background working with processes, threads,
and network programming.
About the Instructor
All courses are taught by David Beazley, author of the Python
Essential Reference and nominated member of the Python Software
Foundation. David has been an active member of the Python
community since 1996 and is the creator of several Python-related
packages including SWIG and PLY. From 1990-1997,
he worked part time at Los Alamos National Laboratory where he helped
pioneer the use of Python on massively parallel supercomputers. From
1998-2005, he was an assistant professor in the department of computer
science at the University of Chicago where he taught courses in
operating systems, networks, and compilers. In addition to his work
with Python, Dave has extensive experience with C, C++, and assembly
language programming. Dave has a Ph.D. in computer science and an
M.S. in mathematics. An academic CV is available upon request.
Logistics
The class is best suited for 10 or fewer students. A larger class size is
possible, but due to the advanced nature of the material it should not
exceed 16 students.
You are responsible for providing the instruction space, a video
projector, and machines where students can work on the programming
exercises. The course can be taught on Windows, Linux, or Mac OS X.
However, all machines must be equipped with the latest version of
Python (currently Python 2.6) and may require a small set of
third-party libraries.
2013 Schedule and Pricing
Classes are normally scheduled at least 8 weeks in
advance. However, classes in the Chicago area can often be scheduled
on shorter notice depending on availability.
The cost of Python Networks, Concurrency, and Distributed Systems
with up to 10 students is $15,000. This is an all-inclusive price that
includes instructor travel expenses. Additional students can be added
for $1200/student.
Contact
For more information, you can contact me by sending email to "dave" at "dabeaz.com".