Using SWIG to Control, Prototype, and Debug C Programs with Python

(Submitted to the 4th International Python Conference)

David M. Beazley
Department of Computer Science
University of Utah
Salt Lake City, Utah 84112
beazley@cs.utah.edu

Abstract

I discuss early results in using SWIG to interact with C and C++ programs using Python. Examples of using SWIG are provided along with a discussion of the benefits and limitations of this approach. This paper describes work in progress with the hope of getting feedback and ideas from the Python community.

Note : SWIG is freely available at http://www.cs.utah.edu/~beazley/SWIG

Introduction

SWIG (Simplified Wrapper and Interface Generator) is a code development tool designed to make it easy for scientists and engineers to add scripting language interfaces to programs and libraries written in C and C++. The first version of SWIG was developed in July 1995 for use with very large scale scientific applications running on the Connection Machine 5 and Cray T3D at Los Alamos National Laboratory. Since that time, the system has been extended to support several interface languages including Python, Tcl, Perl4, Perl5, and Guile [1]. In this paper, I will describe how SWIG can be used to interact with C and C++ code from Python. I'll also discuss some applications and limitations of this approach. It is important to emphasize that SWIG was primarily designed for scientists and engineers who would like to have a nice interface, but who would rather work on more interesting problems than figure out how to build a user interface or use a complicated interface generation tool.

Introducing SWIG

The idea behind SWIG is really quite simple: most interface languages such as Python provide some mechanism for making calls to functions written in C or C++. However, this almost always requires the user to write special "wrapper" functions that provide the glue between the scripting language and the C functions. As an example, if you wanted to add the getenv() function to Python, you would need to write the following wrapper code [4] :
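#include <Python.h>
#include <stdlib.h>

/* Python-callable wrapper around the C library function getenv() */
static PyObject *wrap_getenv(PyObject *self, PyObject *args) {
         char * result;
         char * arg0;
         /* Convert the Python argument tuple into a C string */
         if(!PyArg_ParseTuple(args, "s",&arg0))
                 return NULL;
         result = getenv(arg0);
         /* Convert the C result back into a Python string (None if NULL) */
         return Py_BuildValue("s", result);
}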

While writing a single wrapper function isn't too difficult, it quickly becomes tedious and error prone as the number of functions increases. SWIG automates this process by generating wrapper code from a list of ANSI C function and variable declarations. As a result, adding scripting languages to C applications can become an almost trivial exercise.
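To make the idea of generating wrappers from declarations concrete, here is a minimal sketch in Python. It is not how SWIG is implemented: it uses a regular expression rather than a real parser, handles only three C types, and requires every argument to be named. The function names and the FORMAT_CODES table are invented for illustration.

```python
import re

# Format codes understood by PyArg_ParseTuple and Py_BuildValue.
# Only these three C types are handled in this sketch.
FORMAT_CODES = {"char *": "s", "int": "i", "double": "d"}


def split_type_and_name(text):
    """Split 'char *name' into the type 'char *' and the identifier 'name'."""
    match = re.match(r"^(.*?)(\w+)\s*$", text.strip())
    if match is None:
        raise ValueError(f"cannot parse {text!r}")
    # Normalize spacing so that 'char*' and 'char  *' both become 'char *'.
    ctype = " ".join(match.group(1).replace("*", " * ").split())
    return ctype, match.group(2)


def parse_prototype(decl):
    """Parse a declaration such as 'char *getenv(char *name);'.

    Every argument must be named, and function pointers are not supported.
    """
    head, _, tail = decl.strip().rstrip(";").partition("(")
    arg_text = tail.rsplit(")", 1)[0]
    return_type, name = split_type_and_name(head)
    arg_types = [
        split_type_and_name(arg)[0]
        for arg in arg_text.split(",")
        if arg.strip() not in ("", "void")
    ]
    return return_type, name, arg_types


def generate_wrapper(decl):
    """Return C source text for a wrapper; unsupported types raise KeyError."""
    return_type, name, arg_types = parse_prototype(decl)
    fmt = "".join(FORMAT_CODES[ctype] for ctype in arg_types)
    refs = "".join(f", &arg{i}" for i in range(len(arg_types)))
    call_args = ", ".join(f"arg{i}" for i in range(len(arg_types)))
    lines = [f"static PyObject *wrap_{name}(PyObject *self, PyObject *args) {{"]
    lines.append(f"    {return_type} result;")
    lines += [f"    {ctype} arg{i};" for i, ctype in enumerate(arg_types)]
    lines.append(f'    if (!PyArg_ParseTuple(args, "{fmt}"{refs}))')
    lines.append("        return NULL;")
    lines.append(f"    result = {name}({call_args});")
    lines.append(f'    return Py_BuildValue("{FORMAT_CODES[return_type]}", result);')
    lines.append("}")
    return "\n".join(lines)


print(generate_wrapper("char *getenv(char *name);"))
```

Run on the getenv() declaration, the sketch prints a function with the same shape as the hand-written wrapper shown earlier.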

The core of SWIG consists of a YACC parser and a collection of general purpose utility functions. The output of the parser is sent to two different modules: a language module for writing wrapper functions and a documentation module for producing a simple reference guide. Each of the target languages and documentation methods is implemented as a C++ class that can be plugged in to the system. Currently, Python, Tcl, Perl4, Perl5, and Guile are supported as target languages while documentation can be produced in ASCII, HTML, and LaTeX. Different target languages can be implemented as a new C++ class that is linked with a general purpose SWIG library file.
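The pluggable design described here can be pictured with a small Python sketch. SWIG itself implements these modules as C++ classes, so everything below (the Function record, the Backend base class, the PythonModule and AsciiDoc classes, and process()) is a hypothetical stand-in meant only to show one parsed declaration being handed to both a language module and a documentation module.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Function:
    """One parsed C function declaration (a stand-in for real parser output)."""
    return_type: str
    name: str
    arguments: list  # (type, name) pairs


class Backend(ABC):
    """Common interface shared by language and documentation modules."""

    def __init__(self):
        self.lines = []

    @abstractmethod
    def function(self, decl):
        """Handle one function declaration."""


class PythonModule(Backend):
    """Language module: records a method-table entry for each function."""

    def function(self, decl):
        self.lines.append(f'{{"{decl.name}", wrap_{decl.name}, METH_VARARGS}},')


class AsciiDoc(Backend):
    """Documentation module: records a one-line plain-text reference entry."""

    def function(self, decl):
        args = ", ".join(f"{ctype} {name}" for ctype, name in decl.arguments)
        self.lines.append(f"{decl.return_type} {decl.name}({args})")


def process(declarations, language, documentation):
    """Send every parsed declaration to both back ends."""
    for decl in declarations:
        language.function(decl)
        documentation.function(decl)


declarations = [Function("int", "listen", [("int", "sockfd"), ("int", "backlog")])]
language, documentation = PythonModule(), AsciiDoc()
process(declarations, language, documentation)
print("\n".join(language.lines + documentation.lines))
```

Supporting another target language in this sketch would mean adding one more Backend subclass, which mirrors the way a new C++ class is plugged in to the real system.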

Previous work

Most Python users are probably aware that automatic wrapper generation is not a new idea. In particular, packages such as ILU are capable of providing bindings between C, C++, and Python using specifications given in IDL (Interface Definition Language) [3]. SWIG is not really meant to compete with this approach, but is designed to be a no-nonsense, easy to use tool that scientists and engineers can use to build interesting interfaces without having to worry about grungy programming details or learning a complicated interface specification language. With that said, SWIG is certainly not designed to turn Python into a C or C++ clone (I'm not sure that turning Python into an interpreted C/C++ compiler would be a good idea anyway). However, SWIG can allow Python to interface with a surprisingly wide variety of C functions.

A SWIG example

While it is not my intent to provide a tutorial, I hope to illustrate how SWIG works by building a module from part of the UNIX socket library. While there is probably no need to do this in practice (since Python already has a socket module), it illustrates most of SWIG's capabilities on a moderately complex example and provides a basis for comparison between an existing module and one created by SWIG.

SWIG takes as input a file referred to as an "interface file." The file starts with a preamble containing the name of the module and a code block where special header files, additional C code, and other things can be added. After that, ANSI C function and variable declarations are listed in any order. Since SWIG uses C syntax, it's usually fairly easy to build an interface from scratch or simply by copying an existing header file. The following SWIG interface file will be used for our socket example :
// socket.i
// SWIG Interface file to play with some sockets
%init sock          // Name of our module
%{
#include <stdlib.h>     /* malloc() */
#include <strings.h>    /* bzero() */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

/* Set some values in the sockaddr_in structure */
struct sockaddr *new_sockaddr_in(short family, unsigned long hostid, int port) {
        struct sockaddr_in *addr;
        addr = (struct sockaddr_in *) malloc(sizeof(struct sockaddr_in));
        bzero((char *) addr, sizeof(struct sockaddr_in));
        addr->sin_family = family;
        addr->sin_addr.s_addr = hostid;
        addr->sin_port = htons(port);
        return (struct sockaddr *) addr;
}
/* Look up a host, but return its official name as a string instead of a hostent */
char *my_gethostbyname(char *hostname) {
        struct hostent *h;
        h = gethostbyname(hostname);
        if (h) return h->h_name;
        else return "";
}
%}

// Add these constants
enum {AF_UNIX, AF_INET, SOCK_STREAM, SOCK_DGRAM, SOCK_RAW,
      IPPROTO_UDP, IPPROTO_TCP, INADDR_ANY};

#define  SIZEOF_SOCKADDR  sizeof(struct sockaddr)

// Wrap these functions 
int socket(int family, int type, int protocol);
int bind(int sockfd, struct sockaddr *myaddr, int addrlen);
int connect(int sockfd, struct sockaddr *servaddr, int addrlen);
int listen(int sockfd, int backlog);
int accept(int sockfd, struct sockaddr *peer, %val int *addrlen);
int close(int fd);
struct sockaddr *new_sockaddr_in(short family, unsigned long, int port);
%name gethostbyname { char *my_gethostbyname(char *); }
unsigned long inet_addr(const char *ip);

%include unixio.i
The key thing to notice about SWIG interface files is that they support real C functions, constants, and datatypes, including C pointers. As a general rule, these files are also independent of the target scripting language, which is the primary reason why it's easy to support different languages.

Building a Python module

Building a Python module with SWIG is a simple process that proceeds as follows (assuming you're running Solaris 2.x) :
unix > wrap -python socket.i
unix > gcc -c socket_wrap.c -I/usr/local/include/Py
unix > ld -G socket_wrap.o -lsocket -lnsl -o sockmodule.so
The wrap command takes the interface file and produces a C file called socket_wrap.c. This file is then compiled and built into a shared object file.

A sample Python script

Our new socket module can be used normally within a Python script. For example, we could write a simple server process to echo all data received back to a client :
# Echo server program using SWIG module
from sock import *
PORT = 5000
sockfd = socket(AF_INET, SOCK_STREAM, 0)
if sockfd < 0:
        raise IOError, 'server : unable to open stream socket'
addr = new_sockaddr_in(AF_INET,INADDR_ANY,PORT)
if (bind(sockfd, addr, SIZEOF_SOCKADDR)) < 0 :
        raise IOError, 'server : unable to bind local address'
listen(sockfd,5)
client_addr = new_sockaddr_in(AF_INET, 0, 0)
newsockfd = accept(sockfd, client_addr, SIZEOF_SOCKADDR)
buffer = malloc(1024)
while 1:
        nbytes = read(newsockfd, buffer, 1024)
        if nbytes > 0 :
                write(newsockfd,buffer,nbytes)
        else : break
close(newsockfd)
close(sockfd)
In our Python module, we are manipulating a socket in almost exactly the same way as would be done in a C program. We'll take a look at a client written using the Python socket library shortly.

Pointers and run-time type checking

SWIG imports C pointers as ASCII strings containing both the address and pointer type. Thus, a typical SWIG pointer might look like the following :

_fd2a0_struct_sockaddr_p

Some may view the idea of importing C pointers into Python as unsafe or even truly insane. However, handling pointers seems to be needed in order for SWIG to handle most interesting types of C code. To provide some safety, the wrapper functions generated by SWIG perform pointer-type checking (since importing C pointers into Python has, in effect, bypassed type-checking in the C compiler). When incompatible pointer types are used, this is what happens :
unix > python
Python 1.3 (Apr 12 1996)  [GCC 2.5.8]
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> from sock import *
>>> sockfd = socket(AF_INET, SOCK_STREAM,0);
>>> buffer = malloc(8192);
>>> bind(sockfd,buffer,SIZEOF_SOCKADDR);
Traceback (innermost last):
  File "<stdin>", line 1, in ?
TypeError: Type error in argument 2 of bind. Expected _struct_sockaddr_p.
>>> 
SWIG's run-time type checking provides a certain degree of safety from using invalid parameters and making simple mistakes. Of course, any system supporting pointers can be abused, but SWIG's implementation has proven to be quite reliable under normal use.
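The string representation and the type check can be imitated in a few lines of Python. This is only a sketch: the encoding rule (an underscore, the address in hexadecimal, then the type name with spaces turned into underscores and "*" turned into "p") is inferred from the single example pointer and the error message above, and the function names are invented. SWIG's generated C code performs the real check.

```python
import re


def mangle_type(ctype):
    """Turn 'struct sockaddr *' into 'struct_sockaddr_p'."""
    return "_".join(ctype.replace("*", " p ").split())


def make_pointer(address, ctype):
    """Encode an address and a C type as one string."""
    return f"_{address:x}_{mangle_type(ctype)}"


def check_pointer(pointer, expected_ctype, func_name="?", position=0):
    """Return the address if the encoded type matches, else raise TypeError."""
    expected = "_" + mangle_type(expected_ctype)
    # The first underscore after the hex digits separates address from type.
    match = re.fullmatch(r"_([0-9a-f]+)(_\w+)", pointer)
    if match is None or match.group(2) != expected:
        raise TypeError(
            f"Type error in argument {position} of {func_name}. "
            f"Expected {expected}."
        )
    return int(match.group(1), 16)


addr = make_pointer(0xFD2A0, "struct sockaddr *")
print(addr)
print(hex(check_pointer(addr, "struct sockaddr *", "bind", 2)))

# A pointer of the wrong type is rejected before any C code would run.
buffer = make_pointer(0x1000, "char *")
try:
    check_pointer(buffer, "struct sockaddr *", "bind", 2)
except TypeError as exc:
    print(exc)
```

Because the type travels with the address, the check needs no extra bookkeeping on the Python side, at the cost of parsing a string on every call.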

Comparison with a real Python module

As a point of comparison, we can now write a client script using the Python socket module (this example courtesy of the Python library reference manual) [5].
# Echo client program
from socket import *
HOST = 'tjaze.lanl.gov'
PORT = 5000
s = socket(AF_INET, SOCK_STREAM)
s.connect(HOST,PORT)
s.send('Hello world')
data = s.recv(1024)
s.close()
print 'Received', `data`
As one might expect, the two scripts look fairly similar. However, there are certain obvious differences. First, the SWIG generated module is clearly a quick and dirty approach. We must explicitly check for errors and create a few C data structures. Furthermore, while Python functions such as socket() can take optional arguments, SWIG generated functions cannot (since the underlying C function can't). Of course, the apparent ugliness of the SWIG module compared to the Python equivalent is not the point here. The real idea to emphasize is that SWIG can take a set of real C functions with moderate complexity and produce a fully functional Python module in a very short period of time: in this case, about 10 minutes' worth of effort (I'm guessing that the Python socket module required more time than that). With a bit of work, the SWIG generated module could probably be used to build a more user-friendly version that hid many of the details.

Controlling C programs

Of course, the real goal of SWIG is not to produce quirky replacements for modules in the Python library. Instead, it is better suited as a mechanism for controlling a variety of C programs.

In the SPaSM molecular dynamics code used at Los Alamos National Laboratory, SWIG is used to build an interface out of about 200 C functions [2]. This happens at compile time so most users don't notice that it has occurred. However, as a result, it is now extremely easy for the physicists using the code to extend it with new functions. Whenever new functionality is added, the new functions are put into a SWIG interface file and the functions become available when the code is recompiled (a process which usually only involves the new C functions and SWIG generated wrapper file). Of course, when running under Python, it is possible to build extensions and dynamically load them as needed.

Another point, not to be overlooked, is the fact that SWIG makes it very easy for a user to combine bits and pieces of completely different software packages without waiting for someone to write a special purpose module. For example, SWIG can be used to import the C API of Matlab 4.2 into Python. This, combined with a simulation module, can allow a scientist to perform both simulation and data analysis interactively or from a script. One could even build a Tkinter interface to control the entire system. The ease of using SWIG makes it possible for scientists to build applications out of components that might not have been considered earlier.

Prototyping and Debugging

When debugging new modules or libraries, it would be nice to be able to test out the functions without having to repeatedly recompile the module with the rest of a large application. With SWIG, it is often possible to prototype and debug modules independently. SWIG can be used to build Python wrappers around the functions and scripts can be used for testing. In many cases, it is possible to create the C data structures and other information that would have been generated by the larger application all from within a Python script. Since SWIG requires no changes to the C code, integrating the new C code into the large package is usually pretty easy.

A related benefit of the SWIG approach is that it is now possible to easily experiment with large software libraries and packages. Peter-Pike Sloan at the University of Utah used SWIG to wrap the entire contents of the OpenGL library (including more than 560 constants and 340 C functions). This "interpreted" OpenGL could then be used interactively to determine various image parameters and to draw simple figures. In this case, it proved to be a remarkably effective way of figuring out the effects of different parameters and functions without having to recompile after every modification.

Living dangerously with datatypes

One of the reasons SWIG is easy to use for prototyping and control applications is its extremely forgiving treatment of datatypes. typedef can be used to map datatypes, but whenever SWIG encounters an unknown datatype, it simply assumes that it's some sort of complex datatype. As a result, it's possible to wrap some fairly complex C code with almost no effort. For example, one could create a module allowing Python to create a Tcl interpreter and execute Tcl commands as follows :
// tcl.i
// Wrap Tcl's C interface
%init tcl
%{
#include <tcl.h>
%}

Tcl_Interp *Tcl_CreateInterp(void);
void        Tcl_DeleteInterp(Tcl_Interp *interp);
int         Tcl_Eval(Tcl_Interp *interp, char *script);
int         Tcl_EvalFile(Tcl_Interp *interp, char *file);
SWIG has no idea what Tcl_Interp is, but it doesn't really care as long as it's used as a pointer. When used in a Python script, this module would work as follows :
unix > python
Python 1.3 (Mar 26 1996)  [GCC 2.7.0]
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import tcl
>>> interp = tcl.Tcl_CreateInterp();
>>> tcl.Tcl_Eval(interp,"for {set i 0} {$i < 5} {incr i 1} {puts $i}");
0
1
2
3
4
0
>>>

Limitations

This approach represents a balance of flexibility and ease of use. I have always felt that the tool shouldn't be more complicated than the original problem. Therefore, SWIG certainly won't do everything. Some of SWIG's limitations include its limited understanding of complex datatypes. Often the user must write special functions to extract information from structures, even though passing complex datatypes around by reference works remarkably well in practice. SWIG's support for C++ is also quite limited. At the moment SWIG can be used to pass pointers to C++ objects around, but actually transforming a C++ object into some sort of Python object is far beyond the capabilities of the current implementation. Finally, SWIG lacks an exception model, which may be of concern to some users.

Limitations in the Python implementation

SWIG's support for Python is, in many ways, limited by the fact that SWIG was originally developed for use with other languages. For example, representing C pointers as a character string can be seen as a carry-over from an earlier Tcl implementation. With a little work, I believe that one could create a Python "pointer" datatype while still retaining type-checking and other features.

Another limitation in the current Python implementation is the apparent lack of variable linking support. Many applications, particularly scientific ones, have global variables that a user may want to access. SWIG currently creates Python functions for accessing these variables, but other scripting languages provide a more natural mechanism. For example, in Perl, it is possible to attach "set" and "get" functions to a Perl object. These are evaluated whenever that variable is accessed in a script. For all practical purposes this special variable looks like any other Perl variable, but is really linked to a C global variable (of course, maybe there is a reason why Perl calls these "magic" variables).

If an equivalent variable linking mechanism is available in Python, I couldn't find it [4]. If not, it might be possible to create a new kind of Python datatype that supports variable linking, but which interacts seamlessly with existing Python integer, floating point, and character string types.

Summary and future work

This paper has described work in progress with the SWIG system and its support for Python. I believe that this approach shows great promise for future work, especially within the scientific and engineering community. Most of the people introduced to SWIG and Python have found both systems extremely easy and straightforward to use. I am currently quite interested in improving SWIG's interface to Python and exploring its use with Numerical Python in order to build interesting and flexible scientific applications [6].

Acknowledgments

This project would not have been possible without the support of a number of people. Peter Lomdahl, Shujia Zhou, Brad Holian, Tim Germann, and Niels Jensen at Los Alamos National Laboratory were the first users and were instrumental in testing out the first designs. Patrick Tullmann, John Schmidt, Peter-Pike Sloan, and Kurtis Bleeker at the University of Utah have also been instrumental in testing out some of the newer versions and providing feedback. John Buckman has suggested many interesting improvements and is currently working on porting SWIG to non-Unix platforms including NT, OS/2 and Macintosh. Finally, I'd like to thank Chris Johnson and the members of the Scientific Computing and Imaging group at the University of Utah for their continued support. Some of this work was performed under the auspices of the Department of Energy.

References

[1] D.M. Beazley, SWIG: An Easy to Use Tool for Integrating Scripting Languages with C and C++, Proceedings of the 4th USENIX Tcl/Tk Workshop (1996) (to appear).

[2] D.M. Beazley and P.S. Lomdahl, Message-Passing Multi-Cell Molecular Dynamics on the Connection Machine 5, Parallel Computing 20 (1994), p. 173-195.

[3] Bill Janssen and Mike Spreitzer, ILU: Inter-Language Unification via Object Modules, OOPSLA 94 Workshop on Multi-Language Object Models.

[4] Guido van Rossum, Extending and Embedding the Python Interpreter, October 1995.

[5] Guido van Rossum, Python Library Reference, October 1995.

[6] Paul F. Dubois, Konrad Hinsen, and James Hugunin, Numerical Python, Computers in Physics (1996) (to appear).