So, I work with images and Python a lot at work, and came across an interesting issue. I have a C++ DLL I need to pass an image to, but the image is stored internally as a list of integer pixel values, and the DLL expects a C char array. Obviously, a conversion needs to take place. However, this project is destined to run on a 400MHz Geode processor with 256 MB RAM in a smart camera running Windows XP Embedded...so I needed to make sure the process was a fast one to get an acceptable cycle time.
The naïve way to do this was of course to use a for loop and append the chr value to a generated string:
string = ''
for byte in img:
    string += chr(byte)
This ended up running in 180.46ms on the test image, 640 x 480, on my dev laptop with a 2.5GHz Core 2 Duo and gobs of RAM...tested on the camera, this took 5.5s to run...bad.
Clearly, a better solution was in order. I heard great things about the Pythonic way of doing things, generating a list of small strings and joining them:
string = []
for byte in img:
    string.append(chr(byte))
string = ''.join(string)
Well, 168.88ms is better than 180.46ms, but still way too high. Another method suggested was to use a file-like string to write the data to, essentially creating a mutable string that tolerates rapid appending:
from cStringIO import StringIO
file_str = StringIO()
for byte in img:
    file_str.write(chr(byte))
string = file_str.getvalue()
Definitely not--not entirely sure why this consistently runs in 206.72ms, but that is going the wrong way and I do not have time to debug it. Next solution comes from the Python manual, list comprehensions:
string = ''.join([chr(byte) for byte in img])
Much faster than the first few attempts at 140.53ms, but still not fast enough. Well, list comprehensions generate the entire list in memory at once, so let us try a generator expression, which generates values as needed:
string = ''.join((chr(byte) for byte in img))
Faster still, but still too slow at 118.97ms. The next step is to start exporting this conversion to C using map, which runs the loop inside the interpreter's C code:
string = ''.join(map(chr, img))
Holy wa! 75.61ms! Less than half the time of the first method. This code runs in 2.2s on the camera, though, so while it is impressive to be sure, it is still too slow. Next, a suggestion from the Python mailing list, to use the C string methods to concatenate the strings:
import string as bob
string = bob.joinfields(map(chr, img), '')
No real difference, with a runtime of 75.73ms. Hmm...Now to take the advice of Guido van Rossum, who apparently dealt with a very similar problem using a character array:
# Char array -- 54.08
from array import array
string = array('B', img).tostring()
Well, 54.08ms is another impressive reduction, but it appears we are rapidly getting to the point of diminishing returns. The only remaining step is to move the code out to a C extension. However, in this situation, it makes more sense to ask the developer of the DLL to accept the data in an integer form rather than a character one, and convert in C++. He has agreed to consider it, and so there ends the quest for optimization. Still, it was useful to learn the varying degrees of speed and simplicity of solutions to this problem.
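For anyone reading this on Python 3, where chr() returns text rather than a single byte and tostring() is no longer available on arrays, the char array approach might look roughly like the sketch below. This assumes a newer interpreter than the one used for the timings in this post, and the img list is a small stand-in, not the real test image.

```python
from array import array

# Stand-in for the image: a flat list of integer pixel values in 0..255.
# (Hypothetical data; the real test image in this post was 640 x 480.)
img = [0, 17, 128, 255] * 4

# array('B', ...) packs the integers into unsigned chars in C code;
# tobytes() is the Python 3 spelling of the old tostring().
packed = array('B', img).tobytes()

# In Python 3 the bytes constructor also accepts an iterable of ints directly.
direct = bytes(img)

# Both routes produce the same byte string, one byte per pixel.
assert packed == direct
assert len(packed) == len(img)
```

No timings are claimed for either variant here; they would need to be measured on the target hardware the same way as the methods above.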
- The development laptop that the code snippets were tested on is a Lenovo ThinkPad T61, with a 2.5GHz Intel Core 2 Duo and 2GB RAM, running XP Pro SP3. All code was executed in the interpreter built into Scorpion Vision Software, run concurrently with Thunderbird, Firefox, several PuTTY windows, Skype, and of course, Scorpion itself.
- Timing was done using the clock() method, triggered before all of the conversion code (but not the import statements) and printed out after.
- Timing was averaged between ten runs of each method.
- The three time values from the camera were taken in a similar fashion from an instance of Scorpion running as the only foreground application on the Sony XCI-V3 100 camera, running XP Embedded SP2 on a 400MHz Geode with 256MB RAM.
- The test image was captured on the Sony camera and simulated (loaded from file) in Scorpion on both machines.
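The timing procedure in the notes above can be sketched as a small harness. This is only an approximation of what was done: it swaps in time.perf_counter() for clock() so that it runs on a current Python 3 interpreter, and the average_ms and join_map names and the synthetic image are made up for illustration.

```python
import time

def average_ms(convert, img, runs=10):
    """Return the mean wall-clock time of convert(img) in milliseconds."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()   # started after imports, before conversion
        convert(img)
        total += time.perf_counter() - start
    return total / runs * 1000.0

# Synthetic 640 x 480 image as a flat list of ints (hypothetical test data).
img = [i % 256 for i in range(640 * 480)]

def join_map(pixels):
    # The map-based method from the post. On Python 3 this builds a text
    # string rather than bytes; only the timing structure matters here.
    return ''.join(map(chr, pixels))

elapsed = average_ms(join_map, img)
print("map + join: %.2f ms" % elapsed)
```

The numbers it prints will differ from the ones in this post, since they depend on the machine, the interpreter version, and whatever else is running at the time.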
Which method of integrating Python with a C++ DLL are you using? Raw Python C API, Pyrex, or what?
It may actually be smartest to do this type coercion right in between, since it's probably already doing a type coercion right there no matter what you do on each end.
I just spent 2 hours debugging a C wrapper for libpcap, now is your chance to pester me on the finer points of the Python C api :)
I have been using ctypes to talk to the DLL--seemed to be the most straightforward way of doing it.
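As a rough illustration of the ctypes route, the pixel list can be packed into an array and exposed to C as an unsigned char buffer without another copy. This is a minimal sketch written for Python 3, not the actual code from this project; the DLL and function names in the comments are hypothetical, since the real DLL's interface is not shown here.

```python
import ctypes
from array import array

# Stand-in pixel data (hypothetical); the real image is a 640 x 480 list of ints.
img = [0, 64, 128, 255]

# Pack the integers into a contiguous block of unsigned chars.
pixels = array('B', img)

# Build a ctypes unsigned char array type of the right length and map it
# onto the array's memory. from_buffer() shares the memory, it does not copy.
CBuffer = ctypes.c_ubyte * len(pixels)
buf = CBuffer.from_buffer(pixels)

# A call into the DLL could then look something like this
# (hypothetical library and function names):
#   dll = ctypes.CDLL("vision.dll")
#   dll.process_image(buf, len(pixels))

assert list(buf) == img
```

Because the buffer shares memory with the array, the array has to stay alive (and cannot be resized) for as long as the C side may touch the data.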