2

I have a simple example:

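from struct import unpack

class UI(object):
    def __init__(self, string):
        self.string = string

    def UI32(self):
        # Take the first 4 bytes, then keep the remainder for the next call
        tmp = self.string[:4]
        self.string = self.string[4:]
        return unpack(">I", tmp)[0]

data = file.read()  # file: an already opened binary file object
U = UI(data)
for i in range(60000):
    test = U.UI32()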

Total time: 22 seconds!

seriousdev
bdfy

2 Answers

5

First of all, I cannot reproduce the 22s on my system (Intel Nehalem, 64-bit Ubuntu, Python 2.6.5).

The following takes 1.4s (this is essentially your code with some blanks filled in by me):

import struct

class UI(object):
    def __init__(self,string):
        self.string = string

    def UI32(self):
        tmp = self.string[:4]
        self.string = self.string[4:]
        return struct.unpack(">I",tmp)[0]

U = UI('0' * 240000)
for i in range(60000):
    test = U.UI32()

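Now, there are several glaring inefficiencies here, especially around self.string: every call to UI32() slices off the first four bytes and then copies the rest of the string into a new object.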

I've rewritten your code like so:

import struct

class UI(object):
    def __init__(self,string):
        fmt = '>%dI' % (len(string) // 4)
        self.ints = struct.unpack(fmt, string)
    def __iter__(self):
        return iter(self.ints)

U = UI('0' * 240000)
count = 0
for test in U:
    count += 1
print count

On the same machine it now takes 0.025s.

NPE
3

On every iteration of the 60,000-cycle loop you copy the entire remaining memory buffer, so the total work grows quadratically with the size of the input:

self.string = self.string[4:]

It would be more efficient to simply walk through the string using indexes and at the end clear the variable.
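A minimal sketch of that index-based approach, assuming the same UI class shape as in the question. The `pos` attribute is a name introduced here for illustration, and `struct.unpack_from` is swapped in for the slice-then-`unpack` pair so that nothing is copied per call:

```python
import struct

class UI(object):
    """Index-based reader (illustrative rewrite, not the asker's original class)."""

    def __init__(self, string):
        self.string = string
        self.pos = 0  # hypothetical attribute: offset of the next unread byte

    def UI32(self):
        # unpack_from reads 4 bytes at the given offset without slicing,
        # so the buffer is never copied
        value = struct.unpack_from(">I", self.string, self.pos)[0]
        self.pos += 4
        return value

U = UI(b"\x00\x00\x00\x01\x00\x00\x01\x00")
first = U.UI32()
second = U.UI32()
```

Once all values have been read, the buffer can be released with `U.string = None` if the memory matters.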

Steve-o
  • 1
    And use `xrange` instead of `range`. See [Should you always favor xrange() over range()?](http://stackoverflow.com/questions/135041/should-you-always-favor-xrange-over-range) – Paolo Moretti Jul 29 '11 at 09:28