Dabeaz

Dave Beazley's mondo computer blog.

Thursday, January 28, 2010

 

A few useful bytearray tricks

When I first saw the new Python 3 bytearray object (also back-ported to Python 2.6), I wasn't exactly sure what to make of it. On the surface, it seemed like a kind of mutable 8-bit string (a feature sometimes requested by users of Python 2). For example:

>>> s = bytearray(b"Hello World")
>>> s[:5] = b"Cruel"
>>> s
bytearray(b'Cruel World')
>>>

On the other hand, there are aspects of bytearray objects that are completely unlike a string. For example, if you iterate over a bytearray, you get integer byte values:

>>> s = bytearray(b"Hello World")
>>> for c in s: print(c)
...
72
101
108
108
111
32
87
111
114
108
100
>>> 

Similarly, indexing operations are tied to integers:

>>> s[1]
101
>>> s[1] = 97
>>> s[1] = b'a'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: an integer is required
>>> 

Finally, there's the fact that bytearray instances have most of the methods associated with strings, as well as the methods associated with lists. For example:

>>> s = bytearray(b"Hello World")
>>> s.split()
[bytearray(b'Hello'), bytearray(b'World')]
>>> s.append(33)
>>> s
bytearray(b'Hello World!')
>>>
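
As a footnote to this, when you're done mutating, a bytearray converts easily back to an immutable bytes object, or decodes to text (Python 3 shown here):

>>> bytes(s)
b'Hello World!'
>>> s.decode('ascii')
'Hello World!'
>>> 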

Although the documentation on bytearrays describes these features, it is a little light on meaningful use cases. Needless to say, if you have too much spare time on your hands, this is the kind of thing you start to think about. So, I thought I'd share three practical uses of bytearrays.

Example 1: Assembling a message from fragments

Suppose you're writing some network code that is receiving a large message on a socket connection. If you know about sockets, you know that the recv() operation doesn't wait for all of the data to arrive. Instead, it merely returns what's currently available in the system buffers. Therefore, to get all of the data, you might write code that looks like this:

# remaining = number of bytes being received (determined already)
msg = b""
while remaining > 0:
    chunk = s.recv(remaining)    # Get available data
    msg += chunk                 # Add it to the message
    remaining -= len(chunk)  

The only problem with this code is that concatenation (+=) has horrible performance. Therefore, a common performance optimization in Python 2 is to collect all of the chunks in a list and perform a join when you're done. Like this:

# remaining = number of bytes being received (determined already)
msgparts = []
while remaining > 0:
    chunk = s.recv(remaining)    # Get available data
    msgparts.append(chunk)       # Add it to list of chunks
    remaining -= len(chunk)  
msg = b"".join(msgparts)          # Make the final message

Now, here's a third solution using a bytearray:

# remaining = number of bytes being received (determined already)
msg = bytearray()
while remaining > 0:
    chunk = s.recv(remaining)    # Get available data
    msg.extend(chunk)            # Add to message
    remaining -= len(chunk)  

Notice how the bytearray version is really clean. You don't collect parts in a list and you don't perform that cryptic join at the end. Nice.
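
If you find yourself doing this a lot, the loop packages up nicely as a helper function. Here's a minimal sketch (the name recv_all and the end-of-stream check are my own additions):

def recv_all(sock, remaining):
    # Receive exactly 'remaining' bytes from sock, accumulating
    # the fragments in a bytearray as they arrive
    msg = bytearray()
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise IOError("connection closed before message was complete")
        msg.extend(chunk)
        remaining -= len(chunk)
    return msg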

Of course, the big question is whether or not it performs. To test this out, I first made a list of small byte fragments like this:

chunks = [b"x"*16]*512

I then used the timeit module to compare the following two code fragments:

# Version 1
msgparts = []
for chunk in chunks:
    msgparts.append(chunk)
msg = b"".join(msgparts)

# Version 2
msg = bytearray()
for chunk in chunks:
    msg.extend(chunk)

When tested, version 1 of the code ran in 99.8s whereas version 2 ran in 116.6s (a version using += concatenation took 230.3s by comparison). So, while performing a join is still faster, it's only faster by about 16%. Personally, I think the cleaner code of the bytearray version might make up for it.
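
If you want to try it yourself, here's roughly how the measurement can be set up with the timeit module (a sketch; absolute numbers will vary by machine and Python version):

from timeit import timeit

setup = 'chunks = [b"x"*16]*512'

version1 = '''
msgparts = []
for chunk in chunks:
    msgparts.append(chunk)
msg = b"".join(msgparts)
'''

version2 = '''
msg = bytearray()
for chunk in chunks:
    msg.extend(chunk)
'''

print(timeit(version1, setup))    # append/join version
print(timeit(version2, setup))    # bytearray version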

Example 2: Binary record packing

This example is a slight twist on the last one. Suppose you had a large Python list of integer (x,y) coordinates. Something like this:

points = [(1,2),(3,4),(9,10),(23,14),(50,90),...]

Now, suppose you need to write that data out as a binary encoded file consisting of a 32-bit integer length followed by each point packed into a pair of 32-bit integers. One way to do it would be to use the struct module like this:

import struct
f = open("points.bin","wb")
f.write(struct.pack("I",len(points)))
for x,y in points:
    f.write(struct.pack("II",x,y))
f.close()

The only problem with this code is that it performs a large number of small write() operations. An alternative approach is to pack everything into a bytearray and only perform one write at the end. For example:

import struct
f = open("points.bin","wb")
msg = bytearray()
msg.extend(struct.pack("I",len(points)))
for x,y in points:
    msg.extend(struct.pack("II",x,y))
f.write(msg)
f.close()

Sure enough, the version that uses a bytearray runs much faster. In a simple timing test involving a list of 100000 points, it runs in about half the time of the version that makes a lot of small writes.
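
As an aside, a bytearray works just as well on the reading end. Here's a sketch of how the same file might be read back using struct.unpack_from(), which unpacks directly from a buffer at a given offset:

import struct
f = open("points.bin","rb")
data = bytearray(f.read())       # Slurp the whole file into a bytearray
f.close()
hdr = struct.calcsize("I")       # Size of the count header
rec = struct.calcsize("II")      # Size of one packed point
count = struct.unpack_from("I", data, 0)[0]
points = []
for n in range(count):
    x, y = struct.unpack_from("II", data, hdr + n*rec)
    points.append((x, y))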

Example 3: Mathematical processing of byte values

The fact that bytearrays present themselves as arrays of integers makes it easier to perform certain kinds of calculations. In a recent embedded systems project, I was using Python to communicate with a device over a serial port. As part of the communications protocol, all messages had to be signed with a Longitudinal Redundancy Check (LRC) byte. An LRC is computed by taking an XOR across all of the byte values.

Bytearrays make such calculations easy. Here's one version:

message = bytearray(...)     # Message already created
lrc = 0
for b in message:
    lrc ^= b
message.append(lrc)          # Add to the end of the message

Here's a version that increases your job security:

import functools
message.append(functools.reduce(lambda x, y: x ^ y, message))

And here's the same calculation in Python 2 without bytearrays:

message = "..."       # Message already created
lrc = 0
for b in message:
    lrc ^= ord(b)
message += chr(lrc)        # Add the LRC byte

Personally, I like the bytearray version. There's no need to use ord() and you can just append the result at the end of the message instead of using concatenation.
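
One nice property of the LRC: since x ^ x is 0, XOR-ing across a correctly signed message, trailing LRC byte included, always comes out to zero. So a receiver can verify a message with a loop like this (a sketch):

lrc = 0
for b in message:            # message includes the trailing LRC byte
    lrc ^= b
if lrc != 0:
    raise ValueError("bad LRC: message corrupted")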

Here's another cute example. Suppose you wanted to run a bytearray through a simple XOR-cipher. Here's a one-liner to do it:

>>> key = 37
>>> message = bytearray(b"Hello World")
>>> s = bytearray(x ^ key for x in message)
>>> s
bytearray(b'm@IIJ\x05rJWIA')
>>> bytearray(x ^ key for x in s)
bytearray(b"Hello World")
>>> 

Final Comments

Although some programmers might focus on bytearrays as a kind of mutable string, I find their use as an efficient means of assembling messages from fragments to be much more interesting. That's because this kind of problem comes up a lot in interprocess communication, networking, distributed computing, and related areas. Thus, knowing about bytearrays can lead to code that has good performance and is easy to understand.

That's it for this installment. In case you're wondering, this topic is also related to my upcoming PyCON'2010 tutorial "Mastering Python 3 I/O."


Comments:
Dave, thanks for the post. My job involves a good deal of packing and unpacking binary data (digital TV streams). As you point out, byte arrays are handy for this. There's also the other "b", for binary literals, which I now use for bitmasks in preference to traditional hex numbers.
 
You might get better performance using "recv_into()" with a reused buffer, rather than creating a new buffer each time with recv().

Even better would be if recv_into() took an initial offset, so in the case of a known message size you could use a single buffer.
 
Andrew,

Definitely agree on recv_into(). However, that involves the buffer interface--the subject of a future blog post I think :-).

As far as I know though, I don't think recv_into() actually lets you specify a buffer offset--something which is a little unfortunate.
 
A quick note: some recent stuff I've done (in Python 2.x) makes use of the array module because it actually performed better than various alternative methods of reading binary (byte) data from files.

I measured the different methods using profiling techniques that Andrew spoke about at EuroPython a couple of years ago; the updated techniques I used are documented here:

http://wiki.python.org/moin/PythonSpeed/Profiling

The array module is a forgotten wonder of the standard library.
 
Wow. I really thought I could use a bytearray as the target for a recv_into(). I thought that was the point. I tried it out under 2.6 and got "argument 1 must be pinned buffer, not bytearray." What's a "pinned buffer"?

Ahh, but it works under Python 3, so life is as I expected.

import socket; sock = socket.socket(); sock.connect( ("dalkescientific.com", 80) ); sock.send(b"GET / HTTP/1.0\r\n\r\n"); b = bytearray(b" "*10); sock.recv_into(b)


Which gives me

bytearray(b'HTTP/1.1 2')

Why do you need to explain the buffer interface? I thought one point of PEP 3118 was to reduce the need for such explanations.

And no, there's no option for telling recv_into() the initial byte offset. There should be one though.
 
The behavior of bytearray with recv_into() in Python 2.6 is a bug as far as I'm concerned. Especially since recv_into() works with all sorts of other array-like objects (arrays, numpy arrays, ctypes objects, etc.).

The fact that recv_into() doesn't accept a byte offset is a more serious problem though. The main problem is that recv() is not guaranteed to receive all of the bytes. Therefore, if you want to continue receiving where you left off, there's no apparent way to go about it.

Given that recv_into() is new in Python 2.6, maybe it's still a little half-baked. Who knows?
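
That said, in Python 3 it seems you can fake an offset by wrapping the buffer in a memoryview and slicing it; recv_into() will accept the slice since it's still a writable buffer. A rough sketch, assuming sock is a connected socket and msgsize is known:

buf = bytearray(msgsize)          # Preallocate space for the whole message
view = memoryview(buf)
offset = 0
while offset < msgsize:
    nrecv = sock.recv_into(view[offset:])   # Fill in the remaining space
    if nrecv == 0:
        raise IOError("connection closed early")
    offset += nrecv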
 
Slight note WRT the struct.pack call. It actually has a sibling, struct.pack_into, as seen below:

>>> import struct
>>> br = bytearray("ABCDEFGH")
>>> struct.pack_into("I", br, 0, 61374)
>>> br
bytearray(b'\xbe\xef\x00\x00EFGH')
 
That's a good observation on pack_into(). I just did a little test using pack_into() instead of pack() to fill a preallocated buffer. The code runs about 6% faster than the version shown in the blog post. It's actually pretty amazing that the version using .extend() is that competitive with a version that preallocates all of the memory in advance.
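
For reference, the preallocated version looks roughly like this (a sketch, using the points list from the post):

import struct
hdr = struct.calcsize("I")                 # Size of the count header
rec = struct.calcsize("II")                # Size of one packed point
buf = bytearray(hdr + rec*len(points))     # Preallocate the entire message
struct.pack_into("I", buf, 0, len(points))
for n, (x, y) in enumerate(points):
    struct.pack_into("II", buf, hdr + n*rec, x, y)
f = open("points.bin", "wb")
f.write(buf)
f.close()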
 
I was surprised to find a big difference between Python 2.6 and 3.1 with regard to Example 1 - I'm particularly interested in performance of string concatenation vs appending to a list.

Python 2.6 concatenation: 76.393012046813965
Python 2.6 appending: 100.18277502059937

Python 3.1 concatenation: 370.36962389945984
Python 3.1 appending: 101.32396101951599

Now, this example uses bytes, not strings, so I can't figure out why performance is so poor in 3.1 (if it were strings, maybe some loss of performance due to Unicode...). I would appreciate it if you could comment further on this point. Thanks!
 
It's hard to say why you're getting those numbers without seeing the code you used for testing. However, the performance of += versus append()/join() is a pretty well-known optimization in Python (+= has much worse performance when a large number of fragments are being concatenated).
 
I did use pretty much the same snippets as in your post:

>>> from timeit import Timer
>>> setup = """chunks = [b"x"*16]*512"""
>>> s = """msgparts = b""
... for chunk in chunks:
...     msgparts += chunk"""
>>> t = Timer(s, setup)
>>> t.timeit()
80.408943176269531
>>> s_append = """msgparts = []
... for chunk in chunks:
...     msgparts.append(chunk)
... msg = b"".join(msgparts)"""
>>> t = Timer(s_append, setup)
>>> t.timeit()
98.628241062164307

(and the same under Python 3.1)
 
Hmmm. I just tried it under 2.6 and I get the same result as you. That's really pretty interesting and unexpected. It is also interesting that Python 2.6 and 3.1 have almost exactly the same performance for append, but vary so widely in concatenation.

It's hard to say more about what this might be without looking at the source code for Python itself. Bizarre.
 
Nice article! Interestingly, I just solved a fragmentation problem in my Twisted-based audio streaming server. It was originally buffering the audio data chunks in a list of strings. Repeatedly allocating and deallocating string chunks messes up the heap. At first I was writing a C module to allocate one big chunk of memory to solve the problem. Then I discovered that bytearray is just what I wanted. I was confused at the time about why there is no documentation for it in Python 2.6; now I know that might be because it is a back-ported feature. With bytearray, my server uses little memory now. The whole story and plot can be found here: http://stackoverflow.com/questions/2100192/how-to-find-the-source-of-increasing-memory-usage-of-a-twisted-server

If this case is helpful for your topic at PyCon, and if you need more information, I'm willing to share; just send me an e-mail at bornstub at gmail dot com :P

Victor Lin.
 