WAD: A Module for Converting Fatal Extension Errors into Python Exceptions

David M. Beazley
Department of Computer Science
University of Chicago
Chicago, IL 60637
beazley@cs.uchicago.edu

Abstract

One of the more popular uses of Python is as an extension language for applications written in compiled languages such as C, C++, and Fortran. Unfortunately, one of the biggest drawbacks of this approach is the lack of a useful debugging and error handling facility for identifying problems in extension code. In part, this limitation is due to the fact that Python does not know anything about the internal implementation of an extension module. A more difficult problem is that compiled extensions sometimes fail with catastrophic errors such as memory access violations, failed assertions, and floating point exceptions. These types of errors fall outside the realm of normal Python exception handling and are particularly difficult to identify and debug. Although traditional debuggers can find the location of a fatal error, they are unable to report the context in which such an error has occurred with respect to a Python script. This paper describes an experimental system that converts fatal extension errors into Python exceptions. In particular, a dynamically loadable module, WAD (Wrapped Application Debugger), has been developed which catches fatal errors, unwinds the call stack, and generates Python exceptions with debugging information. WAD requires no modifications to Python, works with all extension modules, and introduces no performance overhead. An initial implementation of the system is currently available for Sun SPARC Solaris and i386-Linux.

1. Introduction

One of the primary reasons C, C++, and Fortran programmers are attracted to Python is its ability to serve as an extension language for compiled programs. Furthermore, tools such as SIP, CXX, Pyfort, FPIG, and SWIG make it extremely easy for a programmer to ``wrap'' existing software into an extension module [1,2,3,4,5]. Although this approach is extremely attractive in terms of providing a highly usable and flexible environment for users, extension modules suffer from problems not normally associated with Python scripts---especially when they don't work.

Normally, Python programming errors result in an exception like this:

% python foo.py
Traceback (innermost last):
  File "foo.py", line 11, in ?
    foo()
  File "foo.py", line 8, in foo
    bar()
  File "foo.py", line 5, in bar
    spam()
  File "foo.py", line 2, in spam
    doh()
NameError: doh
%

Unfortunately for compiled extensions, the following situation sometimes occurs:

% python foo.py
Segmentation Fault (core dumped)
%

Needless to say, this isn't very informative--well, other than indicating that something ``very bad'' happened.

In order to identify the source of a fatal error, a programmer can run a debugger on the Python executable or on a core file like this:

% gdb /usr/local/bin/python
(gdb) run foo.py
Starting program: /usr/local/bin/python foo.py

Program received signal SIGSEGV, Segmentation fault.
0xff060568 in doh () from /u0/beazley/Projects/WAD/Python/./libfoo.so
(gdb) where
#0  0xff060568 in doh () from /u0/beazley/Projects/WAD/Python/./libfoo.so
#1  0xff082f34 in _wrap_doh ()
   from /u0/beazley/Projects/WAD/Python/./dohmodule.so
#2  0x2777c in call_builtin (func=0x1984b8, arg=0x1a1ccc, kw=0x0)
    at ceval.c:2650
#3  0x27648 in PyEval_CallObjectWithKeywords (func=0x1984b8, arg=0x1a1ccc,
    kw=0x0) at ceval.c:2618
#4  0x25d18 in eval_code2 (co=0x19acf8, globals=0x0, locals=0x1c7844,
    args=0x1984b8, argcount=1625472, kws=0x0, kwcount=0, defs=0x0, defcount=0,
    owner=0x0) at ceval.c:1951
#5  0x25954 in eval_code2 (co=0x199620, globals=0x0, locals=0x1984b8,
    args=0x196654, argcount=1862720, kws=0x197788, kwcount=0, defs=0x0,
#6  0x25954 in eval_code2 (co=0x19ad38, globals=0x0, locals=0x196654,
    args=0x1962fc, argcount=1862800, kws=0x198e90, kwcount=0, defs=0x0,
    defcount=0, owner=0x0) at ceval.c:1850
#7  0x25954 in eval_code2 (co=0x1b6c60, globals=0x0, locals=0x1962fc,
    args=0x1a1eb4, argcount=1862920, kws=0x0, kwcount=0, defs=0x0, defcount=0,
    owner=0x0) at ceval.c:1850
#8  0x22da4 in PyEval_EvalCode (co=0x1b6c60, globals=0x1962c4, locals=0x1962c4)
    at ceval.c:319
#9  0x3adb4 in run_node (n=0x18abf8, filename=0x1b6c60 "", globals=0x1962c4,
    locals=0x1962c4) at pythonrun.c:886
#10 0x3ad64 in run_err_node (n=0x18abf8, filename=0x1b6c60 "",
    globals=0x1962c4, locals=0x1962c4) at pythonrun.c:874
#11 0x3ad38 in PyRun_FileEx (fp=0x18a5a8, filename=0xffbefd7a "foo.py",
    start=1616888, globals=0x1962c4, locals=0x1962c4, closeit=1)
    at pythonrun.c:866
#12 0x3a1d8 in PyRun_SimpleFileEx (fp=0x18a5a8, filename=0xffbefd7a "foo.py",
    closeit=1) at pythonrun.c:579
#13 0x39d84 in PyRun_AnyFileEx (fp=0x18a5a8, filename=0xffbefd7a "foo.py",
    closeit=1) at pythonrun.c:459
#14 0x1f498 in Py_Main (argc=2, argv=0xffbefc84) at main.c:289
#15 0x1eec0 in main (argc=2, argv=0xffbefc84) at python.c:10

Unfortunately, even though the debugger identifies the location where the fault occurred, it mostly provides information about the internals of the interpreter. The debugger certainly doesn't reveal anything about the Python program that led to the error (i.e., it doesn't reveal the same information that would be contained in a Python traceback). As a result, the debugger is of limited use when it comes to debugging an application that consists of both compiled and Python code.

Normally, extension developers try to avoid catastrophic errors by adding error handling. If an application is small or customized for use with Python, it can be modified to raise Python exceptions. Automated tools such as SWIG can also convert C++ exceptions and C-related error handling mechanisms into Python exceptions. However, no matter how much error checking is added, there is always a chance that an extension will fail in an unexpected manner. This is especially true for large applications that have been wrapped into an extension module. In addition, certain types of errors such as floating point exceptions (e.g., division by zero) are especially difficult to find and eliminate. Finally, rigorous error checking may be omitted to improve performance.

To address these problems, an experimental module known as WAD (Wrapped Application Debugger) has been developed. WAD is able to convert fatal errors into Python exceptions that include information from the call stack as well as debugging information. Once such errors are turned into Python exceptions, they result in a traceback that crosses the boundary between Python code and compiled extension code. This makes it much easier to identify and correct extension-related programming errors. WAD requires no modifications to Python and is compatible with all extension modules. However, it is also highly platform specific and currently only runs on Sun SPARC Solaris and i386-Linux. The primary goal of this paper is to motivate the problem and to describe one possible solution. In addition, many of the implementation issues associated with providing an integrated error reporting mechanism are described.

2. An Example

WAD can either be imported as a Python extension module or linked to an extension module. To illustrate, consider the earlier example:

% python foo.py
Segmentation Fault (core dumped)
%

To identify the problem, a programmer can run Python interactively and import WAD as follows:

% python
Python 2.0 (#1, Oct 27 2000, 14:34:45) 
[GCC 2.95.2 19991024 (release)] on sunos5
Type "copyright", "credits" or "license" for more information.
>>> import libwadpy
WAD Enabled
>>> execfile("foo.py")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "foo.py", line 16, in ?
    foo()
  File "foo.py", line 13, in foo
    bar()
  File "foo.py", line 10, in bar
    spam()
  File "foo.py", line 7, in spam
    doh.doh(a,b,c)
SegFault: [ C stack trace ]

#2   0x00027774 in call_builtin(func=0x1c74f0,arg=0x1a1ccc,kw=0x0)
#1   0xff022f7c in _wrap_doh(0x0,0x1a1ccc,0x160ef4,0x9c,0x56b44,0x1aa3d8)
#0   0xfe7e0568 in doh(a=0x3,b=0x4,c=0x0) in 'foo.c', line 28

/u0/beazley/Projects/WAD/Python/foo.c, line 28

    int doh(int a, int b, int *c) {
 =>   *c = a + b;
      return *c;
    }

>>>

In this case, we can see that the program has tried to assign a value to a NULL pointer (indicated by the value "c=0x0" in the last function call). Furthermore, we obtain a Python traceback that shows the entire sequence of functions leading to the problem. Finally, since control returned to the interpreter, it is possible to interactively inspect various aspects of the application or to continue with the computation (although this clearly depends on the severity of the error and the nature of the application).

In certain applications, it may be difficult to run Python interactively or to modify the code to explicitly import a special debugging module. In these cases, WAD can be attached to an extension module with the linker. For example:

% ld -G $(OBJS) -o dohmodule.so -lwadpy

This requires no recompilation of any source code--only a relinking of the extension module. When Python loads the relinked extension module, WAD is automatically initialized before Python invokes the module initialization function.

3. Design Considerations for Embedded Error Recovery

The primary design goal of WAD is to provide an error reporting mechanism for extension modules that is a natural extension of normal Python exception handling. There are two primary motivations for handling fatal errors in this manner: first, in the context of Python programming, it is simply unnatural to run a separate debugging application to identify a problem in an extension module when no such requirement exists for scripts. Thus, an embedded error reporting mechanism is simply more convenient. Second, the target users of an extension module may not know how to use a debugger or even have a development environment installed on their machine. Therefore, the ability to produce an informative traceback within the confines of the Python interpreter can be of tremendous value to an extension developer. This is because users who report a problem will be able to include an informative traceback as opposed to simply saying ``the code crashed.''

A secondary design goal is to provide a system that is as non-invasive as possible. The system should not require modifications to Python or any extension modules and it should be easy to integrate into the runtime environment of an application. In addition, it shouldn't introduce any performance overhead.

Finally, since WAD co-exists with the Python interpreter (i.e., in the same process), there are a number of technical issues that have to be addressed. First, fatal errors can theoretically occur anywhere in the interpreter as well as in extension modules. Therefore, WAD needs to know about Python's internal organization if it is going to provide a graceful recovery back to the interpreter. Second, in order to implement this recovery scheme, the system has to directly manipulate the CPU context and call stack. Last, but not least, since the recovery code lives in the same address space as the interpreter and extension modules, it should not depend on the process stack or heap (since both could have been corrupted by the faulting application).

4. Catching Fatal Errors

WAD catches catastrophic errors by installing a reliable signal handler for SIGSEGV, SIGBUS, SIGABRT, SIGILL, and SIGFPE [9]. Unlike the more familiar BSD-style signal interface (as provided by the Python signal module), reliable signal handlers are installed using the sigaction() system call and have a few notable properties:

The signal handler can be configured to run on its own dedicated stack.
Handler functions can receive a structure containing the CPU context including the CPU registers, program counter, and stack pointer.
Changes to the CPU context take effect immediately after the signal handler returns.

Therefore, the high level implementation of WAD is relatively straightforward: when a fatal signal occurs, a handler function runs on an isolated signal handling stack. The CPU context is then used to unwind the call stack and to inspect the process state. Finally, if possible, the CPU context is modified in a manner that allows the signal handler to return to Python with a raised exception.

5. A Detailed Description of the Recovery Mechanism

In this section, a more detailed description of the error recovery scheme is presented. The precise implementation details are highly platform specific and involve a number of advanced topics including the Unix process file system (/proc), the ELF object file format, and the Stabs compiler debugging format [6,7,8]. The details of these topics are beyond the scope of this paper. However, this section aims to give the reader a small taste of the steps involved in implementing the recovery mechanism.

The services of WAD are only invoked upon the reception of a fatal signal. This triggers a signal handling function that results in a return to Python as illustrated in the following figure:

[Figure: Control flow of the error recovery mechanism]

The steps required to implement this recovery are as follows:

The values of the program counter and stack pointer are obtained from the CPU context structure passed to the WAD signal handler.

The virtual memory map of the process is inspected to identify all of the shared libraries, dynamically loaded modules, and valid memory regions. This information is obtained by reading from the Unix /proc filesystem. The following table illustrates the nature of this data:

Address     Size    Permissions        File
----------  -----   -----------------  ---------------------------------
00010000    1264K   read/exec         /usr/local/bin/python 
0015A000     184K   read/write/exec   /usr/local/bin/python 
00188000     296K   read/write/exec     [ heap ] 
FE7C0000      32K   read/exec         /u0/beazley/Projects/dohmodule.so 
FE7D6000       8K   read/write/exec   /u0/beazley/Projects/dohmodule.so 
...
FF100000     664K   read/exec         /usr/lib/libc.so.1 
FF1B6000      24K   read/write/exec   /usr/lib/libc.so.1 
FF1BC000       8K   read/write/exec   /usr/lib/libc.so.1 
FF2C0000     120K   read/exec         /usr/lib/libthread.so.1 
FF2EE000       8K   read/write/exec   /usr/lib/libthread.so.1 
FF2F0000      48K   read/write/exec   /usr/lib/libthread.so.1 
FF310000      40K   read/exec         /usr/lib/libsocket.so.1 
FF32A000       8K   read/write/exec   /usr/lib/libsocket.so.1 
FF330000      24K   read/exec         /usr/lib/libpthread.so.1 
FF346000       8K   read/write/exec   /usr/lib/libpthread.so.1 
FF350000       8K   read/write/exec    [ anon ] 
FF3B0000       8K   read/exec         /usr/lib/libdl.so.1 
FF3C0000     128K   read/exec         /usr/lib/ld.so.1 
FF3E0000       8K   read/write/exec   /usr/lib/ld.so.1 
FFBEA000      24K   read/write/exec    [ stack ]
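
Once the map is available, finding the region that contains a given address is a simple range check. The following is a minimal Python sketch of that check, not WAD's actual code: the `Region` type and the `find_region` helper are hypothetical names introduced for illustration, and it assumes the map has already been parsed into tuples. The three entries are copied from the table above.

```python
from typing import NamedTuple, Optional, Sequence


class Region(NamedTuple):
    start: int   # base virtual address of the mapping
    size: int    # length of the mapping in bytes
    perms: str   # permissions, e.g. "read/exec"
    path: str    # backing file, or a label such as "[ heap ]"


def find_region(regions: Sequence[Region], addr: int) -> Optional[Region]:
    """Return the mapped region containing addr, or None if addr is unmapped."""
    for region in regions:
        if region.start <= addr < region.start + region.size:
            return region
    return None


# Three rows copied from the table above (sizes converted from K to bytes).
REGIONS = [
    Region(0x00010000, 1264 * 1024, "read/exec", "/usr/local/bin/python"),
    Region(0xFE7C0000, 32 * 1024, "read/exec", "/u0/beazley/Projects/dohmodule.so"),
    Region(0xFF100000, 664 * 1024, "read/exec", "/usr/lib/libc.so.1"),
]
```

For instance, the address 0x2777c reported for call_builtin in the earlier debugger session falls inside the first region, so it would be attributed to the Python executable.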

The call stack is unwound to produce a traceback of the calling sequence that led to the error. The unwinding process is just a simple loop that is similar to the following:

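long *pc = get_pc(context);    /* program counter at the point of the fault */
long *sp = get_sp(context);    /* stack pointer at the point of the fault */
while (sp) {
    /* Gather symbol and debugging information for the frame at (pc, sp) here */

    /* Move to previous stack frame */
    pc = (long *) sp[15];      /* %i7 register on SPARC */
    sp = (long *) sp[14];      /* %i6 register on SPARC */
}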
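
The same walk can be tried out without touching a real stack. The sketch below is a toy Python simulation under stated assumptions: the `memory` dictionary stands in for the process stack, words are assumed to be 4 bytes (32-bit SPARC), and the `unwind` function and its `max_frames` guard are hypothetical additions that do not appear in WAD.

```python
from typing import Dict, List, Tuple

WORD = 4  # assumed size in bytes of each saved register slot


def unwind(memory: Dict[int, int], pc: int, sp: int,
           max_frames: int = 64) -> List[Tuple[int, int]]:
    """Collect (pc, sp) pairs from the faulting frame outward.

    memory maps a byte address to the word stored there. The walk stops
    when the saved stack pointer is zero or max_frames is reached, which
    guards against a corrupted stack that loops back on itself.
    """
    frames = []
    while sp and len(frames) < max_frames:
        frames.append((pc, sp))
        # Read both saved slots before replacing sp.
        saved_pc = memory.get(sp + 15 * WORD, 0)   # slot 15: %i7
        saved_sp = memory.get(sp + 14 * WORD, 0)   # slot 14: %i6
        pc, sp = saved_pc, saved_sp
    return frames
```

Each collected pair is what the later steps would consume to look up a symbol name and source line for that frame.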

For each stack frame, symbol table and debugging information is gathered and stored in a WAD exception frame object. Obtaining this information is the most complicated part of WAD and involves the following steps: first, the current program counter is mapped to an object file using the virtual memory map obtained in the second step above. Next, the object file is loaded using mmap(). Once loaded, the ELF symbol table is searched for an address match. The symbol table contains a collection of records containing memory offsets, sizes, and names such as this:
```
Offset    Size    Name 
--------  ------  ---------
0x1280    324     wrap_foo
0x1600    128     foo
0x2408    192     bar
...
```
To find a match for a virtual memory address addr, WAD simply searches for a symbol s such that base + s.offset <= addr < base + s.offset + s.size, where base is the base virtual address of the object file in the virtual memory map.
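The search condition above translates directly into code. The following Python sketch is illustrative only: `Symbol` and `find_symbol` are hypothetical names, the symbol records are copied from the fragment above, and the base address used in the usage note is simply borrowed from the memory map shown earlier.

```python
from typing import NamedTuple, Optional, Sequence


class Symbol(NamedTuple):
    offset: int   # offset of the symbol within the object file
    size: int     # size of the symbol in bytes
    name: str


def find_symbol(symbols: Sequence[Symbol], base: int, addr: int) -> Optional[Symbol]:
    """Linear search for the symbol s with base+offset <= addr < base+offset+size."""
    for s in symbols:
        if base + s.offset <= addr < base + s.offset + s.size:
            return s
    return None


# Records copied from the symbol table fragment above.
SYMBOLS = [
    Symbol(0x1280, 324, "wrap_foo"),
    Symbol(0x1600, 128, "foo"),
    Symbol(0x2408, 192, "bar"),
]
```

With an assumed base of 0xFE7C0000, the address 0xFE7C1610 lies 0x10 bytes into foo, whereas 0xFE7C1680 falls in the gap between foo and bar and matches nothing.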
Debugging information, if available, is scanned to identify a source file, function name, and line number. This involves scanning object files for a table of debugging information stored in a format known as ``stabs.'' Stabs is a relatively simple, but highly extensible format that is language independent and capable of encoding almost every aspect of the original source code. For the purposes of WAD, only a small subset of this data is actually used.
The following table shows a small fragment of relevant stabs data:
```
type    desc   value        string                        description
------  -----  ---------    ---------------------------   -----------
0x64      0        0        /u0/beazley/Projects/foo/     Pathname
0x64      0        0        foo.c                         Filename
...                             
0x24      0        0        foo:F(0,3);(0,3)              Function
0xa0      4        68       n:p(0,3)                      Parameter
...                                                      
0x44      6        8                                      Line number
0x44      7        12                                     Line number
0x44      8        44                                     Line number
0x44      9        56                                     Line number
...                                                      
```
In the table, the type field indicates the type of debugging information. For example, 0x64 specifies the source file, 0x24 is a function definition, 0xa0 is a function parameter, and 0x44 is line number information. Associated with each stab is a collection of parameters and an optional string. The string usually contains symbol names and other information. The desc and value fields are numbers that usually contain byte offsets and line number data. Therefore, to collect debugging information, WAD simply walks through the debugging tables until it finds the function of interest. Once found, parameter and line number specifiers are inspected to determine the location and values of the function arguments as well as the source line at which the error occurred.
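The walk through the debugging table can be sketched in a few lines of Python. This is an illustration rather than WAD's implementation. It assumes that for a line number stab (0x44) the desc field holds the source line and the value field holds a byte offset from the start of the function, and that line stabs appear in ascending offset order. The `source_location` helper and the constant names are hypothetical; the entries are copied from the table above.

```python
from typing import List, Optional, Sequence, Tuple

# Stab type codes as listed in the table above.
N_SO, N_FUN, N_PSYM, N_SLINE = 0x64, 0x24, 0xA0, 0x44

Stab = Tuple[int, int, int, str]   # (type, desc, value, string)

STABS: List[Stab] = [
    (N_SO, 0, 0, "/u0/beazley/Projects/foo/"),
    (N_SO, 0, 0, "foo.c"),
    (N_FUN, 0, 0, "foo:F(0,3);(0,3)"),
    (N_PSYM, 4, 68, "n:p(0,3)"),
    (N_SLINE, 6, 8, ""),
    (N_SLINE, 7, 12, ""),
    (N_SLINE, 8, 44, ""),
    (N_SLINE, 9, 56, ""),
]


def source_location(stabs: Sequence[Stab], func: str,
                    offset: int) -> Optional[Tuple[str, int]]:
    """Return (source path, line) for a byte offset inside func, or None."""
    path = ""
    found_path = None
    line = None
    for kind, desc, value, string in stabs:
        if found_path is not None:
            if kind in (N_SO, N_FUN):
                break                  # left the function of interest
            if kind == N_SLINE and value <= offset:
                line = desc            # last line starting at or before offset
        elif kind == N_SO:
            # A directory stab is followed by the file name stab.
            path = path + string if path.endswith("/") else string
        elif kind == N_FUN and string.split(":", 1)[0] == func:
            found_path = path
    if found_path is None or line is None:
        return None
    return found_path, line
```

Under these assumptions, a fault 20 bytes into foo maps to line 7 of /u0/beazley/Projects/foo/foo.c, because line 7 begins at offset 12 and line 8 does not begin until offset 44.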
After the complete traceback has been obtained, it is examined to see if there are any ``safe'' return points to which control can be returned. This is accomplished by maintaining an internal table of predefined symbolic return points as shown in the following table:
```
Python symbol                     Return value
-----------------------------     ------------------
call_builtin                      NULL
_PyImport_LoadDynamicModule       NULL
PyObject_Repr                     NULL
PyObject_Print                    -1
PyObject_CallFunction             NULL
PyObject_CallMethod               NULL
PyObject_CallObject               NULL
PyObject_Cmp                      -1
PyObject_Compare                  -1
PyObject_DelAttrString            -1
PyObject_DelItem                  -1
PyObject_GetAttrString            NULL
PyObject_GetItem                  NULL
PyObject_HasAttrString            -1
PyObject_Hash                     -1
PyObject_Length                   -1
PyObject_SetAttrString            -1
PyObject_SetItem                  -1
PyObject_Str                      NULL
PyObject_Type                     NULL
...
PyEval_EvalCode                   NULL
```
The symbols in this table correspond to functions within the Python interpreter that might execute extension code and include the parts of the interpreter that invoke builtin functions as well as the functions from the abstract object interface. If any of these symbols appear on the call stack, a handler function is invoked to raise a Python exception. This handler function is given a WAD-specific traceback object that contains a copy of the call stack and CPU registers as well as any symbolic and debugging information that was obtained. If none of the symbolic return points are encountered, WAD invokes a default handler that simply prints the full C stack trace and generates a core file.
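The table lookup itself is straightforward. The Python sketch below shows one way the scan might work, assuming that the innermost matching frame is chosen (the paper's example, which stops at call_builtin, is consistent with this but does not state it). `RETURN_POINTS` holds a subset of the table above with `None` standing in for NULL, and `find_return_point` is a hypothetical helper name.

```python
from typing import Optional, Sequence, Tuple

# Subset of the table above; None stands in for C's NULL.
RETURN_POINTS = {
    "call_builtin": None,
    "PyObject_Print": -1,
    "PyObject_GetAttrString": None,
    "PyObject_SetItem": -1,
    "PyEval_EvalCode": None,
}


def find_return_point(
    stack: Sequence[str],
) -> Optional[Tuple[int, str, Optional[int]]]:
    """Scan symbol names from the innermost frame outward.

    Returns (frame index, symbol, value to return) for the first frame that
    is a registered return point, or None when no frame qualifies, in which
    case the default handler would take over.
    """
    for depth, symbol in enumerate(stack):
        if symbol in RETURN_POINTS:
            return depth, symbol, RETURN_POINTS[symbol]
    return None
```

Returning a tuple rather than the bare value keeps "found a return point whose value is NULL" distinct from "found nothing."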
If a return point is found, the CPU context is modified in a manner that allows the signal handler to return with a suitable Python error. This last step is the trickiest part of the recovery process, but the general idea is that the CPU context is modified in a way that makes Python think that an extension function simply raised an exception and returned an error. Currently, this is implemented by having the signal handler return to a small handler function written in assembly language which arranges to return the desired value back to the specified return point.
The most complicated part of modifying the CPU context is that of restoring previously saved CPU registers. By manually unwinding the call stack, the WAD exception handler effectively performs the same operation as a longjmp() call in C. However, unlike longjmp(), no previously saved set of CPU registers is available from which to resume execution in the Python interpreter. The solution to this problem depends entirely on the underlying architecture. On the SPARC, register values are saved in register windows which WAD manually unwinds to restore the proper state. On Intel processors, the solution is much more interesting. To restore the register values, WAD must manually inspect the machine instructions of each function on the call stack in order to find out where the registers might have been saved. This information is then used to restore the registers from their saved locations before returning to the Python interpreter.
Python receives the exception and produces a traceback.

6. Initialization and Loading

In the earlier example, it was shown that WAD could either be loaded as an extension module or simply attached to an existing module with the linker. The latter case is implemented by wrapping the WAD initialization function inside the constructor of a statically allocated C++ object like this:

class WadInit {
public:
    WadInit() {
        wad_init();   /* Call the real initialization function */
    }
};
static WadInit wad_initializer;

When the dynamic loader brings WAD into memory, it automatically executes the constructors of all statically allocated C++ objects. Therefore, this initialization code executes immediately after loading, but before Python actually calls the module initialization function. As a result, when an extension module is linked with WAD, the debugging capability is enabled before any other operations occur---this allows WAD to respond to fatal errors that might occur during module initialization. The rest of the initialization process consists of the following:

The WAD signal handler is installed.
A collection of return symbols is registered with the signal handler (see the previous section).
Four new Python exception objects SegFault, BusError, AbortError, and IllegalInstruction are added to the __builtin__ module.

Although the use of a C++ static constructor has the potential to conflict with C++ extension code that also uses static constructors, it is always possible to enable WAD prior to loading a C++ extension (e.g., WAD could be loaded separately).
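
The effect of the last initialization step can be imitated from Python itself. The sketch below is an assumption-laden illustration, not WAD's code: it uses the Python 3 `builtins` module (the counterpart of the `__builtin__` module named above), derives each class directly from `Exception` (the real base class is not stated in this paper), and `register_exceptions` is a hypothetical helper name.

```python
import builtins

# Exception names taken from the list above.
WAD_EXCEPTION_NAMES = ("SegFault", "BusError", "AbortError", "IllegalInstruction")


def register_exceptions(namespace=builtins):
    """Create one exception class per name and attach it to namespace.

    Attaching the classes to the builtins module makes them visible from
    any script without an import, just like NameError or ValueError.
    """
    created = {}
    for name in WAD_EXCEPTION_NAMES:
        cls = type(name, (Exception,), {})   # assumed base class
        setattr(namespace, name, cls)
        created[name] = cls
    return created
```

After `register_exceptions()` has run, a script could write `except SegFault:` without importing anything, which is what allows such errors to be caught like any other Python exception.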

7. Implementation Details

Currently, WAD is written in ANSI C with a small amount of C++, and a small amount of assembly code (to assist in the return to the interpreter). The entire implementation contains approximately 2000 semicolons and most of the code relates to the gathering of source code information (symbol tables, debugging information, etc.).

Although there are libraries such as GNU bfd that can assist with the reading of object files, none of these are used in the implementation [10]. First, these libraries tend to be quite large and are oriented more towards stand-alone tools such as debuggers, linkers, and compilers. Second, due to the unusual nature of the runtime environment and the restrictions on memory utilization (no heap, no stack), the behavior of these libraries is somewhat unclear and would require further study. Finally, given the small size of the prototype implementation, it didn't seem necessary to rely on a large general purpose library.

8. Discussion

The primary focus of this work is to provide a more useful error reporting mechanism to extension developers. However, this does not imply that WAD is appropriate as a general purpose exception handling mechanism. First, let's focus on the recovery mechanism:

When WAD unwinds the call stack, objects allocated on the stack are lost. This may interact poorly with C++ extensions since the unwinding process does not invoke C++ destructors. It may be possible to fix this problem, but doing so would require coordination with the C++ runtime library.
Similarly, if a procedure allocates objects on the heap, stack unwinding may cause those objects to never be reclaimed.
In a problem closely related to heap management, stack unwinding may leave files, sockets, and other system resources open. Furthermore, in a multithreaded environment, deadlock may occur if a procedure is holding a lock when an error occurs.
An application may fail by overwriting the process heap and corrupting memory. Although WAD can produce internal diagnostics even when the heap has been destroyed, Python may fail immediately upon return from the WAD signal handler or shortly thereafter.
If an application destroys the call stack (via buffer overflow), WAD will be unable to complete a stack trace and will be unable to return to Python.
Memory management problems such as double-freeing of memory are particularly difficult to identify. If an extension module corrupts the memory allocator in some manner, this may cause Python to fail in a completely unexpected location. WAD is usually able to produce a traceback in this situation, but it may not correspond to the real source of the problem.

In addition, there are a number of issues that pertain to WAD's interaction with the Python interpreter:

The recovery mechanism is entirely based on symbolic information stored in the Python executable. Therefore, the return points are simply specified as strings such as ``call_builtin'' as opposed to real memory addresses. Because of this, WAD is compatible with essentially any version of Python (provided it supports class-based exceptions).
WAD is unable to manage multiple return values to the same procedure. For example, Python's eval_code2() procedure contains a huge case statement for executing byte codes. Within this procedure, certain function calls return NULL to indicate an error and others return -1. Since WAD is unable to determine which value to return, this particular procedure does not make a very good return point for error recovery.
An alternative approach to the symbolic recovery scheme would be to instrument Python with a collection of safe return points using setjmp()/longjmp(). This approach is not used because it would require a significant number of changes to the interpreter and it would introduce an unacceptable amount of performance overhead.
WAD is generally safe to use with Python threads. However, if a compiled extension function manually releases the Python interpreter lock and subsequently faults, the return behavior is unspecified. In the future, it may be possible to use the interpreter lock to provide coordination between the interpreter and the error recovery mechanism.
Compiled extension code may perform an eval operation in which Python code is executed in the interpreter. This results in a situation where the complete call-stack of an application crosses the boundary between Python and C several times. WAD can still handle faults in this setting as long as an application is doing a reasonable amount of error checking. For example, a fatal error that occurs inside an eval operation could be caught by the extension code and propagated further up the call stack.
In certain cases, Python may be configured to handle the SIGFPE signal for floating point exceptions. The default Python handling of this error is to abort and dump core. However, with WAD, a complete stack traceback will be obtained when a SIGFPE occurs.
WAD is extremely inefficient. Due to restrictions on the heap and stack, WAD relies heavily on mmap() and a variety of other file operations as it handles errors. It also performs linear searches of symbol and debugging tables. As a result, WAD's generation of a Python exception is several orders of magnitude slower than an ordinary exception.

Finally, there are a number of application specific issues to note:

Aggressive compiler optimization techniques may prevent WAD from accurately reporting locations within the original source code. This is particularly problematic with numerical applications where techniques such as procedure inlining can make it impossible to obtain accurate debugging information. Since these types of problems also arise in full-featured debuggers, it is unlikely that they can be easily fixed in WAD (at least not without a considerable amount of work).
If an application implements its own exception handling, it may provide Python with less information than what would be obtained with WAD. For example, a programmer might implement a function like this:
```
void *Malloc(int size) {
   void *ptr;
   ptr = malloc(size);
   if (!ptr) throw("Out of memory");
   return ptr;
}
```
In this case, the ``throw'' function may initiate an internal exception handling mechanism that relies upon setjmp/longjmp or C++ exceptions. When the error eventually makes it back to the interpreter, the user will get an ``out of memory'' exception, but no additional information will be provided. In contrast, if the programmer simply used an assert() statement, WAD would produce a full stack trace leading to the error.

Despite its various limitations, WAD is applicable to a wide range of extension-related errors. Furthermore, most of the errors that are likely to occur are of a more benign variety. For example, a segmentation fault may simply be the result of an uninitialized pointer (perhaps the user forgot to call an initialization procedure). Likewise, bus errors, failed assertions, and floating point exceptions rarely result in a situation where the WAD recovery mechanism would be unable to produce a meaningful Python traceback.

9. Related Work

There is a huge body of literature concerning the implementation of exception handling in various programming languages and environments. A detailed discussion of this work is clearly not possible here, but a general overview of various exception handling issues can be found in [11]. In general, there are a few themes that seem to prevail. First, considerable attention has been given to exception handling mechanisms in specific languages such as efficient exception handling for C++. Second, a great deal of work has been devoted to the semantic aspects of exception handling such as exception hierarchies, finalization, and whether or not code is restartable after an exception has occurred. Finally, a fair amount of exception work has been done in the context of component frameworks and distributed systems. Most of this work tends to concentrate on explicit exception handling mechanisms. Very little work appears to have been done in the area of converting hardware generated errors into exceptions.

With respect to debuggers, quite a lot of work has been done in creating advanced debugging support for specific languages and integrated development environments. However, very little of this work has concentrated on the problem of extensible systems and compiled-interpreted language integration. For instance, debuggers for Python are currently unable to cross over into C extensions whereas C debuggers aren't able to easily extract useful information from the internals of the Python interpreter.

One system of possible interest is Rn, which was developed in the mid-1980s at Rice University [12]. This system, primarily designed for working with large scientific applications written in Fortran, provided an execution monitor that consisted of a special debugging process with an embedded interpreter. When attached to compiled Fortran code, this monitor could dynamically patch the executable in a manner that allowed parts of the code to be executed in the interpreter. This was used to provide a debugging environment in which essentially any part of the compiled application could be modified at run-time by simply compiling the modified code (Fortran) to an interpreted form and inserting a breakpoint in the original executable that transferred control to the interpreter. Although this particular scheme is not directly related to the functionality of WAD, it is one of the few systems in which interpreted and compiled code have been tightly coupled within a debugging framework. Several aspects of the interpreted/compiled interface are closely related to the way in which WAD operates. In addition, various aspects of this work may be useful should WAD be extended with new capabilities.

10. Future Directions

WAD is currently an experimental prototype. Although this paper has described its use with Python, the core of the system is generic and is easily extended to other programming environments. For example, when linked to C/C++ code, WAD will automatically produce stack traces for fatal errors. A module for generating Tcl exceptions has also been developed. Plans are underway to provide support for other extensible systems including Perl, Ruby, and Guile.

Finally, a number of extensions to the WAD approach may be possible. For example, even though the current implementation only returns a traceback string to the Python interpreter, the WAD signal handler actually generates a full traceback of the C call stack including all of the CPU registers and a copy of the stack data. Therefore, with a little work, it may be possible to implement a diagnostic tool that allows the state of the C stack to be inspected from the Python interpreter after a crash has occurred. Similarly, it may be possible to integrate the capabilities of WAD with those provided by the Python debugger.
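To make the idea of inspecting the C stack from Python more concrete, the following is a minimal sketch of how an exception might carry structured frame data instead of only a traceback string. It is an illustration under stated assumptions, not part of WAD: the class names CFrame and ExtensionFault, their fields, and the sample values in demo() are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CFrame:
    """One frame of a captured C call stack (hypothetical layout)."""
    symbol: str                      # function name recovered for this frame
    source: str                      # "file.c:line" when debugging info exists
    registers: Dict[str, int] = field(default_factory=dict)
    stack_data: bytes = b""          # raw copy of this frame's stack memory


class ExtensionFault(RuntimeError):
    """Hypothetical exception that keeps structured frames, not just text."""

    def __init__(self, signal_name: str, frames: List[CFrame]):
        self.signal_name = signal_name
        self.frames = frames
        super().__init__(f"{signal_name}: " + self.format_trace())

    def format_trace(self) -> str:
        # Innermost frame last, mirroring Python's own traceback order.
        return " -> ".join(f"{f.symbol} ({f.source})" for f in self.frames)

    def find(self, symbol: str) -> CFrame:
        """Return the first frame whose function name matches."""
        for frame in self.frames:
            if frame.symbol == symbol:
                return frame
        raise KeyError(symbol)


def demo() -> ExtensionFault:
    # Made-up sample data, used only to exercise the classes above.
    frames = [
        CFrame("call_wrapper", "wrap.c:120", {"sp": 0x7FFF0010}, b"\x00" * 8),
        CFrame("seg_crash", "debug.c:15",
               {"sp": 0x7FFF0000, "pc": 0x400ABC}, b"\x00" * 4),
    ]
    try:
        raise ExtensionFault("SIGSEGV", frames)
    except ExtensionFault as exc:
        return exc
```

With such a representation, a script or interactive session could catch the exception and examine individual frames, for example exc.find("seg_crash").registers["pc"], rather than parsing the traceback text.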

11. Conclusions and Availability

WAD provides a simple mechanism for converting fatal errors into Python exceptions that provide useful information to extension writers. In doing so, it addresses one of the most frustrating aspects of working with compiled Python extensions: identifying program errors. Furthermore, the system requires no code modifications to Python and introduces no performance overhead. Although the system is necessarily platform specific, it does not involve a significant amount of code. As a result, it may be relatively straightforward to port to other Unix systems.

As of this writing, WAD is still undergoing active development. However, the software is available for experimentation and download at http://systems.cs.uchicago.edu/wad.

References

[1] D.M. Beazley, Using SWIG to Control, Prototype, and Debug C Programs with Python, 4th International Python Conference, Livermore, CA. (1996).

[2] P.F. Dubois, Climate Data Analysis Software, 8th International Python Conference, Arlington, VA. (2000).

[3] P.F. Dubois, A Facility for Creating Python Extensions in C++, 7th International Python Conference, Houston, TX. (1998).

[4] SIP. http://www.thekompany.com/projects/pykde/.

[5] FPIG. http://cens.ioc.ee/projects/f2py2e/.

[6] R. Faulkner and R. Gomes, The Process File System and Process Model in UNIX System V, USENIX Conference Proceedings, January 1991.

[7] J.R. Levine, Linkers & Loaders. Morgan Kaufmann Publishers, 2000.

[8] Free Software Foundation, The "stabs" debugging format. GNU info document.

[9] W. Richard Stevens, UNIX Network Programming: Interprocess Communication, Volume 2. PTR Prentice-Hall, 1998.

[10] S. Chamberlain. libbfd: The Binary File Descriptor Library. Cygnus Support, bfd version 3.0 edition, April 1991.

[11] M.L. Scott, Programming Language Pragmatics. Morgan Kaufmann Publishers, 2000.

[12] A. Carle, D. Cooper, R. Hood, K. Kennedy, L. Torczon, S. Warren, A Practical Environment for Scientific Programming. IEEE Computer, Vol 20, No. 11, (1987). p. 75-89.