3

In the code below, concatenation is the bottleneck. As you can see, I've tried some sophisticated methods to speed this up, but it's bloody slow anyway. I would like to know if there is anything I can do to make it faster.

BTW, both plain and secret are data read from a binary file, and they are quite big (around 1 MB).

x = b''
if len(plain) < len(secret*8):      # need one plain byte for every secret bit
    return False
i = 0

for secByte in secret:
    for j in range(8):
        z = setBit(plain[i], 0, getBit(secByte, j))   # copy bit j of the secret byte into bit 0 of the plain byte
        #x += bytes([z])
        x = x.join([b"", bytes([z])])
        #x = array.array("B", (int(z) for z in x.join([b"", bytes([z])]))).tostring()
        i = i+1
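
In case it matters: setBit and getBit are just small bit-twiddling helpers, roughly along these lines (a simplified sketch, not my exact code):

def getBit(byte, bit):
    # return bit number `bit` (0 = least significant) of `byte` as 0 or 1
    return (byte >> bit) & 1

def setBit(byte, bit, value):
    # return `byte` with bit number `bit` forced to `value` (0 or 1)
    if value:
        return byte | (1 << bit)
    return byte & ~(1 << bit)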
sowa
  • 56
  • 5
  • can you add a simple "C" like pseudocode of what you are trying? I am unfamiliar with setBit in Python... – K. Brafford Nov 27 '10 at 03:17
  • For string concatenation methods and speed, see these: http://stackoverflow.com/questions/1316887/what-is-the-most-efficient-string-concatenation-method-in-python – mshsayem Nov 27 '10 at 04:35
  • . . . http://stackoverflow.com/questions/1349311/python-string-join-is-faster-than-but-whats-wrong-here – mshsayem Nov 27 '10 at 04:36

2 Answers

7

Python's lists have O(1) append, at least in the amortized sense. Instead of doing the join in the innermost loop, you could build one big list and then join it at the end. This will turn your algorithm from O(N^2) to O(N). It's tough to give you working code without knowing exactly what your setBit() and getBit() functions are doing, but something like:

L = []
i = 0
for secByte in secret:
    for j in range(8):
        z = setBit(plain[i], 0, getBit(secByte, j))   # same inner computation as in your code
        L.append(bytes([z]))                          # appending to a list is amortized O(1)
        i += 1
x = b"".join(L)                                       # one O(N) join at the end
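
To see why the per-byte join hurts, here's a rough timing comparison of the two concatenation patterns on some dummy data (an illustrative sketch only, assuming Python 3; exact numbers will depend on your machine):

import timeit

setup = 'data = bytes(range(256)) * 40   # ~10 KB of dummy input'

per_byte_join = '''
x = b""
for b in data:
    x = x.join([b"", bytes([b])])   # re-copies all of x on every iteration
'''

list_then_join = '''
parts = []
for b in data:
    parts.append(bytes([b]))
x = b"".join(parts)                 # a single copy at the end
'''

print("per-byte join :", timeit.timeit(per_byte_join, setup, number=10))
print("list then join:", timeit.timeit(list_then_join, setup, number=10))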
xscott
  • 2,350
  • 16
  • 18
  • 1
    That has not been true since Python 2.3. Time some code with concatenation and with joins. – nate c Nov 27 '10 at 02:50
  • 3
    @nate c, Python strings are an array of contiguous bytes and associated length. His code with x.join(...) creates a new, slightly longer string every time before it deallocates the old string when he assigns to it. This is O(N^2) behavior. You're wrong, and you shouldn't have modded me down. – xscott Nov 27 '10 at 03:56
  • 2
    The suggestion that one build strings as lists and only join them as necessary (preferably after the entire list is built) is still valid so far as I know. It's been a recommended Python practice for a very long time. – Jim Dennis Nov 27 '10 at 08:31
  • @Jim: I'm not saying it's not. I'm saying the big-O analysis of strings is incorrect. – nate c Nov 27 '10 at 22:59
  • @xscott: Here's the discussion on the patch: "we are not talking about 2x or even 10x faster but roughly Nx faster where N is the size of the input data set." http://mail.python.org/pipermail/python-dev/2004-August/046686.html – nate c Nov 27 '10 at 23:01
  • @nate c, that is pretty interesting for at least the following reasons: it's not well advertised (the Python site *still* says to use str.join(list)), it's not implemented in stringobject.c, it *is* implemented as a hack in ceval.c, and the link you sent shows Guido was annoyed when it was added. I was wrong about += with strings, but the x.join(...) in @sowa's code is still O(N^2). – xscott Nov 28 '10 at 21:40
5

I don't think you should be using string concatenation at all for this. It would be better to create a mutable bytearray of the full size of the final data and then set each byte. This is very much O(N), and for what you are doing a bytearray is much more natural than string manipulation:

x = bytearray(len(secret)*8)   # creates an array of zero bytes
i = 0
for secByte in secret:
    for j in range(8):
        x[i] = setBit(plain[i], 0, getBit(secByte, j))
        i += 1
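
If you need an immutable bytes object at the end (for example to write it to a file or compare it), bytes(x) converts the bytearray in one final O(N) copy.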
Scott Griffiths
  • 21,438
  • 8
  • 55
  • 85
  • tyvm, it's exactly what I needed (it works like 100x faster). xscott's solution is probably as fast as this, but this one is a little better suited to my problem – sowa Nov 27 '10 at 14:14