I have an input file that looks like this:

some data...
some data...
some data...
...
some data...
<binary size="2358" width="32" height="24">
data of size 2358 bytes
</binary>
some data...
some data...

The value 2358 in the size attribute varies from file to file. I want to extract those 2358 bytes of data (the count is variable) and write them to another file.

I wrote the following code for this, but it raises an error, so I am unable to extract the 2358 bytes of binary data and write them to another file.

c = responseFile.read(1)
ValueError: Mixing iteration and read methods would lose data 

The code is:

import re

outputFile = open('output', 'w')    
inputFile = open('input.txt', 'r')
fileSize=0
width=0
height=0

for line in inputFile:
    if "<binary size" in line:
        x = re.findall('\w+', line)
        fileSize = int(x[2])
        width = int(x[4])
        height = int(x[6])
        break

print x
# Here the file will point to the start location of 2358 bytes.
for i in range(0,fileSize,1):
    c = inputFile.read(1)
    outputFile.write(c)


outputFile.close()
inputFile.close()

Final answer to my question:

#!/usr/local/bin/python

import os
inputFile = open('input', 'r')
outputFile = open('output', 'w')

flag = False

for line in inputFile:
    if line.startswith("<binary size"):
        print 'Start of Data'
        flag = True
    elif line.startswith("</binary>"):
        flag = False
        print 'End of Data'
    elif flag:
        outputFile.write(line) # write the line unchanged, newline included

inputFile.close()
outputFile.close()

# I have to delete the last extra new line character from the output.
size = os.path.getsize('output')
outputFile = open('output', 'ab')
outputFile.truncate(size-1)
outputFile.close()
Raj

3 Answers

How about a different approach? In pseudo-code:

for each line in input file:
    if line starts with binary tag: set output flag to True
    else if line starts with binary-termination tag: set output flag to False
    else if output flag is True: copy line to the output file

And in real code:

outputFile = open('./output', 'w')    
inputFile = open('./input.txt', 'r')

flag = False

for line in inputFile:

    if line.startswith("<binary size"):
        flag = True
    elif line.startswith("</binary>"):
        flag = False
    elif flag:
        outputFile.write(line[:-1]) # remove newline


outputFile.close()
inputFile.close()
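
The version above drops the last character of every line, which would also remove newline bytes that belong to the payload itself. A minimal sketch of the same flag idea in binary mode, assuming Python 3 and assuming exactly one newline separates the payload from the closing tag; the name copy_between_tags and the sample data are made up for illustration:

```python
import io

def copy_between_tags(src, dst):
    """Copy the bytes between the <binary ...> and </binary> lines."""
    inside = False
    chunks = []
    for line in src:  # binary-mode iteration splits on b"\n"
        if line.startswith(b"<binary"):
            inside = True
        elif line.startswith(b"</binary>"):
            inside = False
        elif inside:
            chunks.append(line)  # keep the line intact, newline included
    data = b"".join(chunks)
    # Assumption: the final newline is a separator, not part of the payload.
    if data.endswith(b"\n"):
        data = data[:-1]
    dst.write(data)
    return len(data)

# Hypothetical sample: a 5-byte payload that itself contains a newline.
src = io.BytesIO(
    b"some data...\n"
    b'<binary size="5" width="32" height="24">\n'
    b"ab\ncd\n"
    b"</binary>\n"
    b"some data...\n"
)
dst = io.BytesIO()
written = copy_between_tags(src, dst)
```

This still breaks if the payload happens to contain a line that starts with </binary>; reading exactly the number of bytes given by the size attribute avoids that.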
daedalus

Try changing your first loop to something like this:

while True:
    line = inputFile.readline()
    if not line:  # readline() returns an empty string at end of file
        break
    # continue the loop as it was

This gets rid of iteration and only leaves read methods, so the problem should disappear.
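
A fuller sketch of this idea, assuming Python 3, a file opened in binary mode, and a payload that starts right after the newline ending the <binary ...> line. The function name extract_binary and the in-memory sample are made up for illustration:

```python
import io
import re

def extract_binary(stream):
    """Return the payload that follows the <binary size="N" ...> line.

    Only readline() and read() are used, so the file position stays
    accurate and the "mixing iteration and read methods" error cannot occur.
    """
    while True:
        line = stream.readline()
        if not line:  # readline() returns b"" at end of file
            raise ValueError("no <binary> tag found")
        if line.startswith(b"<binary"):
            match = re.search(rb'size="(\d+)"', line)
            if match is None:
                raise ValueError("<binary> tag has no size attribute")
            # The stream now points at the first payload byte.
            return stream.read(int(match.group(1)))

# Hypothetical sample: a 5-byte payload that itself contains a newline.
sample = io.BytesIO(
    b"some data...\n"
    b'<binary size="5" width="32" height="24">\n'
    b"ab\ncd\n"
    b"</binary>\n"
)
payload = extract_binary(sample)
```

With a real file, the same function could be called on open('input.txt', 'rb') and the result written to a file opened with 'wb'.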

stranac

Consider this method:

import re

line = '<binary size="2358" width="32" height="24">'

m = re.search('size="(\d*)"', line)

print m.group(1)  # 2358

It differs from your code, so it's not a drop-in replacement, but it picks out the value by attribute name rather than by position.

This uses Python's regex capture groups and is more robust than picking words out of the line by position, as your code does.

For example, consider what would happen if the attributes were reordered:

<binary width="32" size="2358" height="24">
instead of
<binary size="2358" width="32" height="24">

Would your code still work? Mine would. :-)
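
A small sketch that takes this one step further, assuming every attribute value is a run of digits: collect all attributes into a dictionary keyed by name, so their order never matters. The helper name parse_attributes is made up for illustration:

```python
import re

def parse_attributes(tag_line):
    """Map each name="digits" attribute in the tag line to an int."""
    pairs = re.findall(r'(\w+)="(\d+)"', tag_line)
    return {name: int(value) for name, value in pairs}

# Works the same whichever order the attributes appear in.
attrs = parse_attributes('<binary width="32" size="2358" height="24">')
size = attrs["size"]
```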


Edit: To answer your question:

If you want to read n bytes of data from the beginning of a file, you could do something like

bytes = ifile.read(n)

Note that you may get fewer than n bytes if the input file is not long enough.

If you don't want to start from the "0th" byte, but some other byte, use seek() first, as in:

ifile.seek(9)
bytes = ifile.read(5)

This would give you the bytes at offsets 9 through 13, that is, the 10th through 14th bytes.
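
A quick way to check both points, using an in-memory stream in place of a real file (an assumption made only so the example is self-contained; Python 3 syntax):

```python
import io

# 20 bytes standing in for a file opened in binary mode.
data = io.BytesIO(b"0123456789ABCDEFGHIJ")

data.seek(9)
chunk = data.read(5)   # bytes at offsets 9 through 13

data.seek(18)
short = data.read(5)   # only 2 bytes remain, so fewer than 5 come back
```

Here chunk holds 5 bytes starting at offset 9, while short holds just the last 2 bytes of the stream.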

jedwards
  • thanks for the reply. but extracting size is not a problem. My problem is to extract the data which is 2358 bytes. Also the value 2358 can change – Raj Jun 26 '12 at 09:10
  • Did you edit the question or did I totally misread it? Either way I'll edit my answer in a moment. – jedwards Jun 26 '12 at 09:11
  • Extracting size is not a problem. My problem is I am not able to extract 2358 bytes of data in the input file which follows this line. Thanks ! :) – Raj Jun 26 '12 at 09:14
  • also `open(file, "rb")` for reading binary data (http://stackoverflow.com/q/1035340/1176601) – Aprillion Jun 26 '12 at 09:22
  • `while True: line = inputFile.readline() if " – Raj Jun 26 '12 at 09:29
  • is `dataSize` correct? Can you print it after you set it? Also, if you're reading from the same file but not closing / re-opening it between `read()` calls, you need to reset the pointer with `ifile.seek(0);` prior to the `ifile.read()` call. – jedwards Jun 26 '12 at 11:46