
I need to read up to the point of a certain string in a binary file, and then act on the bytes that follow. The string is 'colr' (this is a JPEG 2000 file) and here is what I have so far:

from collections import deque

f = open('my.jp2', 'rb')
bytes =  deque([], 4)
while ''.join(map(chr, bytes)) != 'colr':
    bytes.appendleft(ord(f.read(1)))

Given that this works:

bytes =  deque([0x63, 0x6F, 0x6C, 0x72], 4)
print ''.join(map(chr, bytes))

(it returns 'colr'), I'm not sure why the test in my loop never evaluates to True. I just hang: I don't even get an exit once I've read through the whole file.

JStroop
  • Did you have a look at http://stackoverflow.com/questions/6822725/rolling-or-sliding-window-iterator-in-python ? – Dr. Jan-Philip Gehrcke Sep 12 '12 at 19:41
  • @Jan-Philip - thanks! I should probably look at adapting one of those. First and foremost, though, this answer http://stackoverflow.com/a/6822761/714478 made me realize that I was just appending to the wrong side of the deque, and my method above, with that correction, works just fine! – JStroop Sep 12 '12 at 20:47

2 Answers


Change your bytes.appendleft() to bytes.append() and then it will work -- it does for me.
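For reference, here is a sketch of the corrected loop. It is shown in Python 3, where `read(1)` returns a `bytes` object, and uses an in-memory stream as a stand-in for the real `.jp2` file; the sample header bytes are illustrative, not an actual JPEG 2000 header.

```python
import io
from collections import deque

# Stand-in for open('my.jp2', 'rb'): a few header bytes, the 'colr'
# marker, then three payload bytes we want to act on afterwards.
f = io.BytesIO(b'\x00jP  \x0dcolr\x01\x02\x03')

window = deque(maxlen=4)          # sliding window of the last 4 bytes read
while bytes(window) != b'colr':
    b = f.read(1)
    if not b:                     # end of file: marker never seen
        raise ValueError("'colr' not found")
    window.append(b[0])           # append (not appendleft) keeps file order

payload = f.read(3)               # the bytes that follow 'colr'
```

With `appendleft()` the window holds the bytes reversed ('rloc' when 'colr' goes past), so the test can never match; `append()` keeps them in file order, and the empty-read check avoids spinning forever at end of file.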

martineau
    with open("my.jp2", "rb") as f:
        print f.read().split("colr", 1)

If you don't want to read it all at once, then:

def preprocess(line):
    print "Do something with this line"

def postprocess(line):
    print "Do something else with this line"

currentproc = preprocess
with open("my.jp2", "rb") as f:
    for line in f:
        if "colr" in line:
            left, right = line.split("colr", 1)
            preprocess(left)
            postprocess(right)
            currentproc = postprocess
        else:
            currentproc(line)

It's line by line rather than byte by byte, but I have a hard time believing you don't have enough RAM to hold the whole file in memory. Python is not really an awesome language for minimizing memory or time footprints, but it is awesome for functional requirements :)

Joran Beasley
  • This could work, but I should have been clearer I need to continue reading through the file, pulling out other info. Would you be willing to elaborate, showing how I could continue to read through the file object, if possible? – JStroop Sep 12 '12 at 19:06
  • The file is >37 MB, and this string should be less than 60 bytes or so in, so again, I'm wondering how I continue to read `f` byte by byte after the `split()` – JStroop Sep 12 '12 at 19:23
  • @JStroop: `f.read()` reads the entire file, so just store this data in a variable, then `split()` as suggested by Joran and, after that, continue processing the data. – Dr. Jan-Philip Gehrcke Sep 12 '12 at 19:23
  • @JStroop, if your file is that large then, I agree, it is maybe not the best idea to read it into memory at once before processing it. However, this depends on your application details. – Dr. Jan-Philip Gehrcke Sep 12 '12 at 19:24
  • Reading the whole file into memory is definitely not an option; especially when everything I need is within the first 300 bytes. – JStroop Sep 12 '12 at 19:30
  • you dont have 37 MB of ram thats available? ok Ill update to do one that doesnt read it all at once – Joran Beasley Sep 12 '12 at 20:09
  • For potentially several hundred times a minute, that doesn't seem very efficient. Thanks for your help! – JStroop Sep 12 '12 at 21:08
  • 1
    @JoranBeasley: about reading line by line from a file with binary data vs. reading it all at once: do you just hope that there are by accident a lot of bit sequences corresponding to a newline character? I think that's not a solution :-) Have a look at http://stackoverflow.com/questions/4566498/python-file-iterator-over-a-binary-file-with-newer-idiom -- but then you have the problem that you could split the sequence that is searched for right in the middle. I think all of this is not a valid approach. The right solution is a 'moving window'. – Dr. Jan-Philip Gehrcke Sep 13 '12 at 09:02
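The "moving window" approach suggested in the last comment can be sketched like this (Python 3; `read_until_marker`, the chunk size, and the test data are illustrative, not from the thread). Reading in chunks avoids byte-at-a-time I/O, and keeping a short tail from the previous chunk means the marker is still found even when it straddles a chunk boundary:

```python
import io

def read_until_marker(f, marker, chunk_size=4096):
    """Consume f until `marker` has been read. Returns the bytes after the
    marker that were already pulled into the current chunk; further data can
    be obtained with additional f.read() calls."""
    tail = b''
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            raise ValueError('marker not found')
        buf = tail + chunk
        idx = buf.find(marker)
        if idx != -1:
            return buf[idx + len(marker):]
        # Keep len(marker)-1 trailing bytes so a marker split across
        # two chunks is still detected on the next iteration.
        tail = buf[-(len(marker) - 1):]

# Stand-in data: the marker deliberately straddles the 4096-byte boundary.
f = io.BytesIO(b'x' * 4094 + b'colr' + b'\x01\x02\x03')
rest = read_until_marker(f, b'colr')
```

Here `rest` holds the bytes immediately after 'colr', and the stream stays positioned for continued reading, which matches the asker's need to keep pulling other info out of the file.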