Python f.read not reading the correct number of bytes

Question

I have code that is supposed to read 4 bytes at a time, but it sometimes reads only 3:

f = open('test.sgy', 'r+')
f.seek(99716)
AAA = f.read(4)
BBB = f.read(4)
CCC = f.read(4)
print len(AAA)
print len(BBB)
print len(CCC)

exit()

And this program returns: 4 3 4

What am I doing wrong? Thanks!

What does `print repr(BBB)` show? I'm going to hazard a guess that you'll see a newline character (`\n`) in the output. If so, the solution is probably to open your file in binary mode: `f = open('test.sgy', 'rb+')`. — Mark Dickinson, Apr 01 '15 at 17:24
Some possible causes are listed here: http://stackoverflow.com/a/4433813/270986. Opening in binary mode would be the first thing to try, though. — Mark Dickinson, Apr 01 '15 at 17:31

loopbackbee · Answer 1 · 2015-04-02T10:24:44.787


You're assuming read does something it does not. As its documentation tells you:

read(...)
    read([size]) -> read at most size bytes, returned as a string.

That is, it reads at most size bytes, and it may return fewer.
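
The simplest place to see a read come back short is at the end of a file. Here's a minimal sketch using an in-memory stream (`io.BytesIO` stands in for a real file; the six-byte payload is made up for illustration):

```python
import io

# Six bytes in total, so the second 4-byte read cannot be satisfied in full.
stream = io.BytesIO(b"abcdef")

first = stream.read(4)   # b"abcd": the full 4 bytes
second = stream.read(4)  # b"ef": only 2 bytes were left
third = stream.read(4)   # b"": nothing left, so an empty result

print([len(first), len(second), len(third)])
```

An empty result is how `read` signals end of file, which is why a wrapper has to treat it as an error rather than loop forever.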

If you need exactly size bytes, you'll have to create a wrapper function.

Here's a (not thoroughly tested) example that you can adapt:

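def read_exactly(fd, size):
    # fd is expected to be a file object opened in binary mode
    chunks = []
    remaining = size
    while remaining > 0:      # or simply "while remaining", if you'd like
        newdata = fd.read(remaining)
        if len(newdata) == 0: # end of file before size bytes were read
            raise IOError("Failed to read enough data")
        chunks.append(newdata)
        remaining -= len(newdata)
    return b"".join(chunks)   # one join instead of repeated concatenation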

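As Mark Dickinson mentioned in the comments, if you're on Windows, make sure you open the file in binary mode (for example, `open('test.sgy', 'rb+')`). In text mode, Windows translates `\r\n` to `\n` and treats a Ctrl-Z byte (0x1A) as end of file, so binary data will be read incorrectly.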
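
To illustrate the binary-mode advice, here's a small sketch that builds a throwaway file containing the bytes that typically cause trouble in text mode (0x1A and `\r\n`) and reads it back in 4-byte chunks. The helper names `make_sample_file` and `read_chunks`, the offset, and the file contents are all hypothetical; they are not taken from the OP's SEG-Y file.

```python
import os
import tempfile


def make_sample_file():
    """Write a small made-up binary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        f.write(b"\x00" * 8)          # padding before the data of interest
        f.write(b"\x00\x00\x01\x02")  # ordinary bytes
        f.write(b"\x03\x1a\x04\x05")  # contains Ctrl-Z (0x1A)
        f.write(b"\x06\r\n\x07")      # contains CR LF
    return path


def read_chunks(path, offset, count, size=4):
    """Return `count` chunks of `size` bytes starting at `offset`."""
    # "rb" disables newline translation and any special handling of 0x1A.
    with open(path, "rb") as f:
        f.seek(offset)
        return [f.read(size) for _ in range(count)]


path = make_sample_file()
try:
    chunks = read_chunks(path, 8, 3)
finally:
    os.remove(path)

print([len(c) for c in chunks])
```

In binary mode every chunk comes back with its bytes untouched, whatever the platform.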

edited Apr 02 '15 at 10:24

answered Apr 01 '15 at 15:26

loopbackbee


I suspect this is going to give bad results: if the reason for the missing byte is that a `\r\n` got translated into a `\n` by Python's text-mode file-reading machinery, then reading another byte on top of that isn't going to fix the corruption that's already crept in. Opening in binary mode is critical here. – Mark Dickinson Apr 01 '15 at 18:27
@MarkDickinson I wasn't aware that situation could cause a short read - note the answer you linked to mentions `\r\n` causing a short read *in binary mode*. The [type of file the OP is reading](https://en.wikipedia.org/wiki/SEG_Y) may have textual records, but I thought it unlikely that OP would be grabbing 3 byte chunks of those. Notwithstanding, reading in binary mode is still a good idea, and I'll add it to my answer, thanks for the suggestion! – loopbackbee Apr 02 '15 at 10:08
Looking at the OP's other recent question, it looks as though it was a Ctrl-Z that was the problem, not a `\r\n`. I think that the short read in *binary mode* for `\r\n` in that other question must have been a typo; it doesn't make any sense to me. And yes, SEG-Y is in some sense a mixed binary and text format. (I've lost *way* too many hours of my life looking at SEG-Y files. ;-). – Mark Dickinson Apr 02 '15 at 13:56
@goncalopp: And you're absolutely right: an `\r\n` can't cause a short read (I've just managed to find a Windows machine to test on). It turns out that the `n` given refers to the number of *output* characters, so it would cause `f.read(4)` to still return 4 bytes, but to advance 5 bytes in the underlying file. – Mark Dickinson Apr 02 '15 at 14:05
@MarkDickinson The way I interpreted the answer you linked to was that `read` would refuse to return (binary) data if it would cut `\r\n` in half - though I tried it here and it cut it happily. But we're probably digging too much into implementation details, and behavior could differ across versions and platforms. – loopbackbee Apr 02 '15 at 14:28
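
The `\r\n` behaviour discussed in the comments above (the count refers to output characters, not bytes consumed) can be approximated on any platform with Python's `io` layer and universal newlines. This is only a sketch by analogy: it does not go through the Windows C runtime that Python 2's built-in `open` uses, it does not reproduce the Ctrl-Z case, and the six-byte file is made up for illustration.

```python
import io
import os
import tempfile

# Six raw bytes on disk: 'a', 'b', CR, LF, 'c', 'd'
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(b"ab\r\ncd")

    # Text mode with newline translation: "\r\n" is returned as "\n".
    with io.open(path, "r", encoding="latin-1", newline=None) as f:
        text = f.read(4)      # 4 characters, built from 5 bytes of the file
        text_rest = f.read()  # whatever is left after those 5 bytes

    # Binary mode: the same request returns the first 4 raw bytes.
    with io.open(path, "rb") as f:
        raw = f.read(4)
        raw_rest = f.read()
finally:
    os.remove(path)

print(repr(text), repr(text_rest))
print(repr(raw), repr(raw_rest))
```

The text-mode read still returns 4 characters, but only one character remains afterwards, whereas the binary-mode read leaves two bytes. That's the silent misalignment that makes text mode unsuitable for binary formats.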
