Python Master Classes - In Chicago

with David Beazley
Author of the "Python Essential Reference"
Dabeaz LLC
5412 N Clark Street #218
Chicago, IL 60640

Target Audience:

This course is for more experienced Python programmers. Attendees are expected to already be familiar with the core Python language and common library modules such as os, sys, etc. Having some basic knowledge of network programming principles is recommended.

Next Course Date:

  • To be announced

Price: $1795

What's Included?

  • A printed copy of the course notes.
  • A copy of the "Python Essential Reference, 4th Ed."
  • Breakfast and lunch at local restaurants
  • Snacks

Python Networking, Concurrency, and Distributed Systems

[4.5 days] Extend your Python knowledge by learning how to write networked and distributed programs. Topics include socket programming, internet data handling (XML, JSON, etc.), simple web programming, WSGI, REST, actors, remote procedure call (RPC), message passing, map-reduce, distributed objects, and asynchronous I/O. The course also includes in-depth material on concurrent programming techniques including threads, processes, and multiprocessing. A major focus of this course is on the underlying principles that form the foundation of the programming frameworks and applications that you may be using now--you will walk away with new insight and ideas for improving your code.

Given the advanced nature of the material, this course will also expand your knowledge of advanced Python programming language features including decorators, context managers, and more.
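
As a rough illustration of how decorators and context managers show up in concurrent code, here is a minimal sketch of a locking decorator. It is written for Python 3 as an assumption (the course material dates from the Python 2 era), and the names `synchronized`, `increment`, and `run_demo` are hypothetical, not taken from the course notes.

```python
import threading
from functools import wraps

def synchronized(lock):
    """Decorator factory: run the wrapped function while holding `lock`."""
    def decorate(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # A Lock is a context manager: it is acquired on entry and
            # released on exit, even if func raises an exception.
            with lock:
                return func(*args, **kwargs)
        return wrapper
    return decorate

counter_lock = threading.Lock()
counter = 0

@synchronized(counter_lock)
def increment(n):
    """Add 1 to the shared counter n times while holding the lock."""
    global counter
    for _ in range(n):
        counter += 1

def run_demo(nthreads=4, n=10000):
    """Run increment() in several threads and return the final count."""
    global counter
    counter = 0
    threads = [threading.Thread(target=increment, args=(n,))
               for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Calling `run_demo()` should return `nthreads * n`, since every update to the shared counter happens with the lock held.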

More About This Course

This is a one-of-a-kind course that you will simply not find anywhere else. In fact, this might be the ultimate computer systems programming course that you wish you had taken in college. For instance, one minute you might be talking about an advanced Python feature, and the next you might be discussing how it interacts with the operating system kernel. Past participants have described the course as being "intense."

Initially, this course started as the infamous "Python Concurrency Workshop" from 2009--the same workshop that blew the covers off of the Python GIL. The course has since expanded to include more information on network programming as well as an ever-expanding look at distributed computing techniques. If you have been looking for a Python course where you get to spend an entire week immersed in diabolical systems hacking, then you won't be disappointed.

Detailed Course Outline

Day 1

  1. Network Fundamentals and Socket Programming. An introduction to some basic concepts of network programming. Covers the essential details of TCP/IP and programming with sockets. Students will learn how to write both TCP- and UDP-based clients and servers.
  2. Client-side programming. A look at high-level library modules that allow Python to connect to standard Internet and web-related services (e.g., HTTP, FTP, XML-RPC, etc.). Special attention will be given to the urllib2 module that allows Python to interact with web servers.
  3. Internet Data Handling. A brief overview of library modules that are used to process common Internet data formats such as HTML, XML, and JSON.
  4. Web Programming. The absolute basics of web programming in Python. Topics include CGI scripting, the WSGI interface, and implementing custom HTTP servers. Note: This section is primarily focused on how to put a web-based interface on low-level services as might be encountered in a distributed computing environment. It does not cover web frameworks or the problem of using Python to build a website.
  5. Server Design. An introduction to common server design techniques with a focus on handling concurrent clients. Topics include process forking, threading, and event-handling. Much of this material serves as an introduction to more advanced topics that follow.
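
To give a flavor of the Day 1 material, the following is a minimal sketch of a TCP echo server that handles each client in its own thread, together with a matching client. It assumes Python 3 and the loopback interface; the function names (`handle_client`, `serve`, `start_server`, `echo`) are hypothetical and are not taken from the course exercises.

```python
import socket
import threading

def handle_client(conn):
    """Echo bytes back until the client closes its side."""
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:            # an empty read signals end of data
                break
            conn.sendall(data)      # sendall() retries after partial writes

def serve(listener):
    """Accept connections in a loop, starting one thread per client."""
    while True:
        try:
            conn, _addr = listener.accept()
        except OSError:             # raised once the listener is closed
            break
        threading.Thread(target=handle_client, args=(conn,),
                         daemon=True).start()

def start_server(host="127.0.0.1", port=0):
    """Bind a listening socket (port 0 = any free port) and serve in the background."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(5)
    threading.Thread(target=serve, args=(listener,), daemon=True).start()
    return listener

def echo(address, message):
    """Client: send message, then read back the same number of bytes."""
    with socket.create_connection(address) as sock:
        sock.sendall(message)
        chunks = []
        remaining = len(message)
        while remaining > 0:        # a single recv() may return only part
            chunk = sock.recv(remaining)
            if not chunk:
                break
            chunks.append(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)
```

A quick way to try it is `listener = start_server()` followed by `echo(listener.getsockname(), b"hello")`. The thread-per-client design is only one of the options named above; forking and event-driven servers trade simplicity for different scaling behavior.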

Day 2

  1. Thread Programming. Everything you wanted to know about Python threads, but were afraid to ask. Includes the absolute basics of using the threading module and different techniques for using threads to carry out work. Includes detailed coverage of using different synchronization primitives, queues, and thread pools. Also provides detailed information on the Global Interpreter Lock (GIL), tuning parameters, and the interaction between threads and C/C++ extension modules.
  2. Multiprocessing. A tour of features provided by the multiprocessing library added in Python 2.6. Covers processes, queues, pipes, process pools, and shared memory regions. Examples will illustrate how multiprocessing can be used to achieve higher performance when working on multiple CPU cores.
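
The thread pool idea mentioned above can be sketched roughly as follows: worker threads pull tasks from a queue and store results under a lock. This assumes Python 3 (the module is named `Queue` in Python 2), and `worker` and `run_pool` are hypothetical names chosen for illustration.

```python
import queue
import threading

def worker(tasks, results, results_lock):
    """Pull (func, arg) pairs off the queue until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:            # sentinel: this worker should exit
            tasks.task_done()
            break
        func, arg = item
        value = func(arg)
        with results_lock:          # protect the shared results dict
            results[arg] = value
        tasks.task_done()

def run_pool(func, args, nworkers=4):
    """Apply func to each arg using a small pool of worker threads."""
    tasks = queue.Queue()
    results = {}
    results_lock = threading.Lock()
    threads = [threading.Thread(target=worker,
                                args=(tasks, results, results_lock))
               for _ in range(nworkers)]
    for t in threads:
        t.start()
    for arg in args:
        tasks.put((func, arg))
    for _ in threads:               # one sentinel per worker
        tasks.put(None)
    tasks.join()                    # wait until every item is processed
    for t in threads:
        t.join()
    return results
```

Because of the GIL, a pool like this mainly helps with I/O-bound work in CPython; for CPU-bound work, the multiprocessing library covered in the same day is usually the better fit.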

Day 3

  1. Message Passing and Data Serialization. Message passing is a core component of distributed computation. This section provides an in-depth look at different interprocess communication mechanisms, their performance characteristics, and tuning options. In addition, different approaches for serializing Python data structures are explored. Topics include the subprocess module, named pipes, network sockets, memory mapped regions, pickle, marshal, structure packing, and binary I/O. The section concludes with information on high-level messaging systems such as ZeroMQ and AMQP.
  2. Distributed Programming. An in-depth tour of different distributed programming techniques. Topics include programming with actors, client-server computing, REST, remote procedure call, map-reduce, and distributed objects. Also includes material on XML-RPC and WSGI.
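
One common way to combine the serialization and transport topics above is to send pickled objects with a fixed-size length header, so the receiver knows where each message ends. The sketch below assumes Python 3 and a 4-byte header; the helper names (`send_message`, `recv_exactly`, `recv_message`) are hypothetical and this is not presented as the approach used in the course.

```python
import pickle
import socket
import struct

HEADER = struct.Struct("!I")    # 4-byte unsigned length, network byte order

def send_message(sock, obj):
    """Pickle obj and send it prefixed with its length."""
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_exactly(sock, nbytes):
    """Read exactly nbytes, looping because recv() may return less."""
    chunks = []
    while nbytes > 0:
        chunk = sock.recv(nbytes)
        if not chunk:
            raise EOFError("connection closed mid-message")
        chunks.append(chunk)
        nbytes -= len(chunk)
    return b"".join(chunks)

def recv_message(sock):
    """Read one length-prefixed message and unpickle it."""
    (length,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return pickle.loads(recv_exactly(sock, length))
```

For a quick local test, `socket.socketpair()` gives two connected sockets: send on one with `send_message()` and read on the other with `recv_message()`. Note that unpickling data from an untrusted peer is unsafe, so a scheme like this only suits trusted endpoints.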

Day 4

  1. Advanced I/O handling. A look at different I/O handling techniques including blocking, non-blocking, asynchronous, and event-driven I/O. The primary goal of this section is to better understand the I/O handling used by different libraries and frameworks such as asyncore, Twisted, etc.

  2. Generators and Coroutines. An overview of concurrent programming using generators and coroutines. The major focus of this section is on using generators to implement user-level task switching and to better understand libraries based on microthreads, tasklets, green-threads, and similarly named entities.
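
User-level task switching with generators can be sketched in a few lines: each task is a generator that yields whenever it is willing to give up control, and a scheduler cycles through the ready tasks. This is a simplified illustration assuming Python 3, with hypothetical names (`countdown`, `run`); it does not handle I/O waiting.

```python
from collections import deque

def countdown(name, n, log):
    """A task: record each step, then yield control to the scheduler."""
    while n > 0:
        log.append((name, n))
        yield                   # task switch happens here
        n -= 1

def run(tasks):
    """Round-robin scheduler: step each task in turn until all finish."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task up to its next yield
        except StopIteration:
            continue            # task finished; do not reschedule it
        ready.append(task)

log = []
run([countdown("A", 2, log), countdown("B", 3, log)])
# log is now [("A", 2), ("B", 3), ("A", 1), ("B", 2), ("B", 1)]
```

The interleaved log shows the two tasks taking turns even though only one thread is running, which is the basic mechanism behind libraries built on microthreads, tasklets, and green threads.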

Day 5

  1. Practicum. The final half day will consist of an advanced programming project designed to reinforce concepts covered in the course.

Course Materials

Students will receive a bound, fully indexed set of lecture notes along with a complete set of class exercises (distributed electronically). All class exercises come with solution code for later study and for use during the class.

Slide Topic Index

The following table, generated automatically from the presentation slides, gives much more detail about the material to be covered. I am always making improvements and additions to the course material, so this outline is subject to change at any time.

1. Network Fundamentals

Network Fundamentals  1-1
The Problem  1-2
Two Main Issues  1-3
Network Addressing  1-4
Standard Ports  1-5
Using netstat  1-6
Client/Server Concept  1-8
Request/Response Cycle  1-9
Using Telnet  1-10
Data Transport  1-11
Socket Basics  1-13
Socket Types  1-14
Using a Socket  1-15
TCP Client  1-16
Server Implementation  1-18
TCP Server  1-19
Advanced Sockets  1-28
Partial Reads/Writes  1-29
Sending All Data  1-31
End of Data  1-32
Data Reassembly  1-33
Non-blocking Sockets  1-35
Socket Options  1-36
Sockets as Files  1-37
Odds and Ends  1-40
UDP : Datagrams  1-41
UDP Server  1-42
UDP Client  1-43
Unix Domain Sockets  1-44
Raw Sockets  1-45
Sockets and Concurrency  1-46
Threaded Server  1-50
Forking Server (Unix)  1-51
Asynchronous Server  1-52
Utility Functions  1-53

2. Client Programming

Client Programming  2-1
urllib Module  2-3
urllib protocols  2-5
HTML Forms  2-6
Web Services  2-8
Parameter Encoding  2-9
Sending Parameters  2-10
Response Data  2-12
Response Headers  2-13
Response Status  2-14
urllib Limitations  2-16
urllib2 Module  2-17
urllib2 Example  2-18
urllib2 Requests  2-19
Requests with Data  2-20
Request Headers  2-21
urllib2 Error Handling  2-22
urllib2 Openers  2-23
urllib2 build_opener()  2-24
Example : Login Cookies  2-25
Upload to a FTP Server  2-30

3. Internet Data Handling

Internet Data Handling  3-1
CSV Files  3-3
Parsing HTML  3-5
Running a Parser  3-7
HTML Example  3-8
XML Parsing with SAX  3-10
Brief XML Refresher  3-11
Brief Review : XML Sample  3-12
SAX Parsing  3-13
XML and ElementTree  3-17
Brief Review : etree Parsing  3-18
Obtaining Elements  3-19
Iterating over Elements  3-20
Element Attributes  3-21
Search Wildcards  3-22
XML Namespaces  3-24
Tree Modification  3-27
Tree Output  3-28
Incremental Parsing  3-29
XML Commentary  3-33
Third Party Modules  3-34
Sample JSON File  3-38
Processing JSON Data  3-39

4. Web Programming

Web Programming Basics  4-1
HTTP Explained  4-5
HTTP Client Requests  4-6
HTTP Responses  4-7
HTTP Protocol  4-8
Content Encoding  4-9
Payload Packaging  4-10
Role of Python  4-12
Typical Python Tasks  4-13
Content Generation  4-14
Example : Page Templates  4-15
HTTP Servers  4-19
A Simple Web Server  4-20
A Web Server with CGI  4-22
CGI Scripting  4-23
CGI Example  4-24
CGI Mechanics  4-27
Classic CGI Interface  4-28
CGI Query Variables  4-29
cgi Module  4-30
CGI Responses  4-31
Note on Status Codes  4-32
CGI Commentary  4-33
WSGI Interface  4-36
WSGI Example  4-37
WSGI Applications  4-38
WSGI Environment  4-39
Processing WSGI Inputs  4-41
WSGI Responses  4-42
WSGI Content  4-44
WSGI Content Encoding  4-45
WSGI Deployment  4-46
WSGI and CGI  4-48
WSGI Deployment  4-49
Customized HTTP  4-51
Web Frameworks  4-55

5. Advanced Networking

Advanced Networking  5-1
Problem with Sockets  5-3
SocketServer Example  5-5
Execution Model  5-11
Design Discussion  5-13
Big Picture  5-14
Concurrent Servers  5-15
Server Mixin Classes  5-16
Server Subclassing  5-17
Distributed Computing  5-19
Simple XML-RPC  5-22
XML-RPC Commentary  5-24
XML-RPC and Binary  5-25
Serializing Python Objects  5-27
pickle Module  5-28
Pickling to Strings  5-29
Pickle and Large Objects  5-32
Miscellaneous Comments  5-33
Connection Use  5-37
What about...  5-42
Network Wrap-up  5-43

6. Concurrency Introduction

My Personal Interest  6-5
Basic Concepts  6-6
Concurrent Programming  6-7
Parallel Processing  6-9
Task Execution  6-10
CPU Bound Tasks  6-11
I/O Bound Tasks  6-12
Shared Memory  6-13
Processes  6-14
Distributed Computing  6-15
Why Python?  6-16
Some Issues  6-17
Why Use Python at All?  6-18
Python as a Framework  6-19
Programming Productivity  6-20
Performance is Irrelevant  6-21
You Can Go Faster  6-22
Special Cases  6-24
Let's Get Started  6-25

7. Thread Programming

Python Multithreading  7-1
Background : Threads  7-3
Usage : Threads  7-4
Concept: Threads  7-5
Thread Basics  7-6
threading Module  7-11
Thread Objects  7-12
Launching Thread Objects  7-13
Thread Execution  7-14
Joining a Thread  7-15
Thread Status  7-16
Interpreter Execution  7-17
Daemonic Threads  7-18
Killing Threads  7-19
Thread Termination  7-20
Returning Results  7-22
The Problem  7-23
Returning Results  7-24
Result Objects  7-25
Returning Results  7-26
Returning Exceptions  7-27
Results w/ Exceptions  7-28
Returning Exceptions  7-29
Threads and Memory  7-31
Shared Memory  7-32
Shared Objects  7-33
Thread Local Data  7-34
Debugging with Threads  7-38
Setting the Thread Name  7-39
Thread Logging  7-40
Logging Information  7-41
Accessing Shared Data  7-44
Race Conditions  7-48
Thread Synchronization  7-49
Synchronization Options  7-50
Mutex Locks  7-52
Use of Mutex Locks  7-54
Using a Mutex Lock  7-55
Locking Perils  7-58
Lock Management  7-59
Locks and Deadlock  7-61
Special Topic:  7-63
with Statement  7-64
Context Management  7-65
Context Mgr: Locking  7-67
Exception Handling  7-68
Where to Put Locks?  7-70
Locking Costs  7-71
Contested Locking  7-72
RLock Example  7-74
Special Topic:  7-76
A Decorator for Locking  7-79
An Example  7-81
Decorators with Arguments  7-82
Semaphore Uses  7-86
Resource Control  7-87
Thread Signaling  7-88
Barrier Synchronization  7-92
Event Waiting  7-93
Condition Variables  7-95
Threads and Queues  7-101
Queue Library Module  7-102
Queue Usage  7-103
Queue Completion  7-104
Queue Programming  7-105
Example: Thread Pools  7-106
An Inconvenient Truth  7-108
A Performance Test  7-109
Bizarre Results  7-110
Threads Explained  7-112
What is a Thread?  7-113
The Infamous GIL  7-114
GIL Behavior  7-115
CPU Bound Processing  7-116
The Check Interval  7-117
The Periodic Check  7-118
What is a "Tick?"  7-119
Tick Execution  7-120
CPU-Bound Threads  7-121
Multicore GIL Contention  7-122
GIL Contention Effect  7-123
Multiple CPU Cores  7-124
Is There A Fix?  7-125
The GIL and C Code  7-126
The GIL and C Extensions  7-127
How to Release the GIL  7-128
The GIL and C Extensions  7-129
Some Lessons Learned  7-133
Using Threads  7-134
I/O Bound Processing  7-135
Thread Limits  7-137
Check Interval Tuning  7-138
Thread Memory Use  7-139
Thread Stack Space  7-140
Final Comments  7-141

8. Messaging and Data Serialization

Message Passing  8-1
Concept: Message Passing  8-2
Message Passing  8-4
Problem Decomposition  8-5
Example : Dataflow  8-6
Example : Worker Pool  8-8
Example : Map-Reduce  8-10
Example : Decomposed Data  8-11
Sending Messages  8-14
A Problem  8-15
Section Focus  8-16
Some Tricky Bits  8-17
Messaging Basics  8-19
What is a Message?  8-20
Message Transport  8-21
An Example  8-23
Named Pipes/FIFOs  8-24
Pipe Performance  8-25
Using Sockets  8-27
A File Caution  8-28
Memory Mapped Regions  8-31
Data Overlay  8-33
mmap Commentary  8-34
Using mmap  8-35
Message Encoding  8-37
Object Serialization  8-38
The Problem  8-39
pickle Module  8-40
Some Pickle Issues  8-43
cPickle vs. Pickle  8-44
Pickle Encodings  8-46
Selecting a Protocol  8-47
Pickling to Strings  8-50
Pickling Instances  8-51
Python and References  8-53
Reference Example  8-54
Pickle and References  8-55
Preserving References  8-57
Pickler Objects  8-58
Pickler Caution  8-59
Pickle and Large Objects  8-60
Classes and Functions  8-61
Miscellaneous Comments  8-62
Customizing Pickle  8-64
Simple Pickling  8-65
Advanced Pickling  8-66
Foreign Objects  8-68
struct module  8-69
Structure Alignment  8-73
Packing Binary Records  8-74
Unpacking Records  8-75
Performance Tip  8-76
struct Cautions  8-77
Binary Arrays  8-79
ByteArray Objects  8-80
Direct Array Output  8-81
Direct Array Input  8-82
ctypes module  8-84
ctypes Types  8-85
ctypes module  8-86
ctypes Caution  8-88
buffer() function  8-90
buffer() Function  8-91
Using buffer()  8-92
Recognizing Buffers  8-95
Messaging Wrap up  8-97

9. Multiprocessing

multiprocessing Module  9-2
Multiprocessing Tour  9-4
Functions as Processes  9-6
multiprocessing Example  9-7
Launching Processes  9-8
Does it Work?  9-9
Other Process Features  9-10
Process Creation  9-11
A Caution  9-12
Distributed Memory  9-15
Synchronization Primitives  9-16
Message Queues  9-17
Joinable Queues  9-18
Queue Example  9-19
Using Pipes  9-24
Pipe Setup  9-25
Pipe Example  9-27
Pipes vs. Queues  9-29
Process Pools  9-31
Async Results  9-35
Process Pools  9-36
Using Process Pools  9-38
Shared Data  9-40
Shared Example  9-41
Shared Arrays  9-42
Locking Performance  9-43
Lock-Free Sharing  9-44
Wrap Up  9-45

10. Distributed Programming

Distributed Programming  10-1
A Problem  10-3
Major Topics  10-6
Actor Programming  10-7
Features of Actors  10-9
Actor History  10-10
Actor Implementation  10-12
Note on APIs  10-13
An Example  10-14
Another Example  10-15
Using Actors  10-16
A Problem  10-18
Actor Runtime  10-19
Async Messaging  10-20
Implementing the Runtime  10-22
Example: Threaded Actor  10-23
Actors and Processes  10-29
Process Issues  10-30
Actor Addressing  10-32
Actor Naming  10-33
Names and Wrappers  10-36
Name Registry  10-38
Message Addressing  10-40
Implementing send()  10-41
Distributed Actors  10-44
Connection Objects  10-45
Distributed Send  10-49
Proxy Actors  10-50
Proxy Implementation  10-51
Using a Proxy  10-52
Message Dispatching  10-53
Message Dispatcher  10-54
Using the Dispatcher  10-59
Putting it All Together  10-60
Big Picture  10-61
Tricky Bits With Actors  10-63
What's in a Message?  10-64
Instances and Messages  10-65
Don't Do It!  10-69
Actor Naming  10-70
Name Registry  10-71
Registry Details  10-72
Concurrency Alternatives  10-73
Optional Concurrency  10-74
Concurrency Alternatives  10-76
Hard Problems  10-77
Actor Commentary  10-79
Some Links  10-80
Client/Server Computing  10-81
Client/Server Comments  10-83
RESTful Services  10-84
REST Resources  10-85
Resource Representation  10-86
REST Actions  10-87
REST Examples  10-88
Stateless Implementation  10-90
Reuse of HTTP  10-91
Implementing REST  10-92
Example with WSGI  10-93
Running a WSGI App  10-98
REST Links  10-99
Remote Procedure Call  10-101
Simple XML-RPC  10-105
XML-RPC Commentary  10-107
XML-RPC and Binary  10-108
XML-RPC and Instances  10-109
Some Issues  10-110
RPC Libraries  10-111
Problems with Objects  10-113
Distributed Objects  10-114
Server Instances  10-116
Server Dispatching  10-117
Client Proxies  10-118
Object Registry  10-119
Proxy Creation/Lookup  10-120
Various Problems  10-122
Object Managers  10-123
Using a Manager  10-125
Manager Example  10-126
Hard Problems  10-132
Some Resources  10-133
Final Comments  10-135

11. Advanced I/O

Advanced I/O Handling  11-1
I/O Basics  11-5
Blocking I/O  11-6
Blocking I/O Rules  11-9
Caution : Partial Sends  11-10
Socket Tuning Parameters  11-11
Non-blocking I/O  11-13
Non-blocking Sockets  11-14
Using Non-blocking I/O  11-15
Overlapped I/O  11-16
Overlapped I/O Example  11-17
Asynchronous I/O  11-20
I/O Polling/Multiplexing  11-23
select module  11-24
select() function  11-25
select() performance  11-28
A select() limitation  11-30
Event Driven I/O  11-32
Event Driven "Tasks"  11-35
Example : Time Server  11-39
A Complication  11-42
Example : Echo Server  11-44
Events and Asyncore  11-50
Using Asyncore  11-51
Twisted Example  11-53
Event Driven Problems  11-54
Scaling Problems  11-55
Some Solutions  11-57
A Scaling Benefit  11-58
Long-Running Calculations  11-59
Blocking Operations  11-60
The Blocking Problem  11-61
Incremental Feeding  11-63
Events and Threads  11-64
Interoperability Problems  11-66
Personal Bias  11-67

12. Generators and Coroutines

Generators and Coroutines  12-1
Reference Material  12-3
Background Material  12-4
Generator Functions  12-7
A Practical Example  12-9
Generators as Pipelines  12-10
A Pipeline Example  12-11
Yield as an Expression  12-12
Coroutine Execution  12-14
Coroutine Priming  12-15
Using a Decorator  12-16
Processing Pipelines  12-17
An Example  12-18
Generators as Tasks  12-21
Program Execution  12-22
Task Switching  12-23
An Insight  12-24
Multitasking Example  12-25
Scheduling Example  12-26
Yielding For I/O  12-29
More About Yield  12-30
Talking to the Scheduler  12-31
An Example Task  12-32
Signaling an I/O Request  12-33
Implementing I/O Waits  12-35
Building a Scheduler  12-36
Example : Time Server  12-44
Example : Echo Server  12-45
Some Links  12-52
More Information  12-53
Copyright (C) 2010, Dabeaz LLC. All Rights Reserved.