
I'm trying to read blocks from a binary file (an Oracle redo log), but when I try to read a 512-byte block using os.read(fd, 512), I get back fewer than 512 bytes. (The amount differs depending on the block.)

The documentation says os.read returns "at most n bytes", so it makes sense that I can get fewer than expected. How can I force it to keep reading until I get the correct number of bytes back?

I've attempted to adapt the method described in "Python f.read not reading the correct number of bytes", but I still have the problem:

def read_exactly(fd, size):
    data = b''
    remaining = size
    while remaining:  # loop until every requested byte has been read
        newdata = read(fd, remaining)
        if len(newdata) == 0:  # problem
            raise IOError("Failed to read enough data")
        data += newdata
        remaining -= len(newdata)
    return data


def get_one_block(fd, start, blocksize):
    lseek(fd, start, 0)
    blocksize = blocksize

    print('Blocksize: ' + str(blocksize))
    block = read_exactly(fd, blocksize)
    print('Actual Blocksize: ' + str(block.__sizeof__()))
    return block

This then raises the error: OSError: Failed to read enough data

My code:

from os import open, close, O_RDONLY, lseek, read, write, O_BINARY, O_CREAT, O_RDWR

def get_one_block(fd, start, blocksize):
    lseek(fd, start, 0)
    blocksize = blocksize

    print('Blocksize: ' + str(blocksize))
    block = read(fd, blocksize)
    print('Actual Blocksize: ' + str(block.__sizeof__()))

    return block

def main():
    filename = "redo_logs/redo03.log"
    fd = open(filename, O_RDONLY, O_BINARY)
    b = get_one_block(fd, 512, 512)

Output

Blocksize: 512
Actual Blocksize: 502

In this instance the last byte read is 0xB3, which is followed by 0x1A, which I believe is the problem.

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
EF 42 B8 5A DC D1 63 1B A3 31 C7 5E 9F 4A B7 F4 
4E 04 6B E8 B3<<-- stops here -->>1A 4F 3C BF C9 3C F6 9F C3 08 02 
05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Any help would be greatly appreciated :)

Brendan Scullion
  • Is there any reason why you are directly using ``os.open`` instead of the builtin ``open``? The former is supposed to be a pretty direct interface to how your OS does I/O. – MisterMiyagi Nov 05 '19 at 22:01
  • Could you show the code where you attempted to adopt the method given in the linked question's answer? (Without showing how the existing answer was applied, and how that failed to help you, the question is effectively a duplicate). – Charles Duffy Nov 05 '19 at 22:01
  • If you need your data in exactly those 512 bytes chunks, then you should keep reading until you get EOF or you read up to your chunk size. I think the SO answer you provided in your question is the way to go. – luis.parravicini Nov 05 '19 at 22:03
  • BTW, feel free to @ notify me after you've updated the question to show how the existing answer didn't work; I'll be happy to evaluate it for reopening (that can also happen by vote, but it's a bit quicker/easier with a dupehammer). – Charles Duffy Nov 05 '19 at 22:04
  • I believe you need to bitwise-or your flags, not apply them in sequence like you're doing: `open(filename, O_RDONLY | O_BINARY)`. As written, it looks like you're opening the file with `O_RDONLY` active, and then setting `mode` (the next argument) to `O_BINARY`. Alternatively, you could use the builtin `open`: `open(filename, 'rb')`. – b_c Nov 05 '19 at 22:08
  • So, getting a 0-length read means you hit the end of the file -- there *wasn't* 512 bytes left to read, and the source wasn't something like a still-open FIFO or socket that could still have more content added later. Is there any reason to believe that isn't the case? – Charles Duffy Nov 05 '19 at 22:12
  • To be clear, the standard C library will silently retry I/O syscalls as long as they get `EINTR` or another error that's basically the OS saying "got interrupted before we could finish"; but if it's a terminal error (a category in which hitting EOF is included), then it's not normal or expected for *anything* to automatically retry. – Charles Duffy Nov 05 '19 at 22:15
  • @CharlesDuffy, updated. The app is going to be reading a lot of data, so I was hoping that using a low-level system function might improve my performance (I could be wrong here, and maybe there will be no difference). – Brendan Scullion Nov 05 '19 at 22:18
  • If you care about performance, Python is the wrong language; I'd strongly recommend switching to Go, Julia, or something else where runtime performance is a top-level design goal. Once you're eating all of Python's runtime overhead already, you're not going to gain much by implementing the work the high-level calls do yourself vs letting the upstream-tested implementations do their thing. – Charles Duffy Nov 05 '19 at 22:21
  • ...anyhow -- can you provide a documented process someone who isn't you and doesn't have your Oracle replay logs can use to be able to see this bug themselves (and thus to test their answer, or evaluate someone else's proposed answer for correctness)? – Charles Duffy Nov 05 '19 at 22:22

1 Answer


You need to read inside a while loop and check how many bytes each call actually returned.

If you got fewer than requested, read again, asking only for the bytes that are still missing.

The loop exits when you have the number of bytes you expected or when you reach EOF.
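As a rough sketch of that loop (not a definitive fix), the version below also applies the suggestion from the comments to combine the open flags with a bitwise OR rather than passing `O_BINARY` as the `mode` argument. The `getattr` fallback for `O_BINARY` is an assumption added here so the sketch also runs on platforms that do not define that flag, and the `get_one_block(path, start, blocksize)` signature is a hypothetical variant of the one in the question. If the file really was being opened in text mode on Windows, that would be consistent with the read stopping just before the 0x1A byte, but this has not been verified against the original redo log.

```python
import os

# O_BINARY only exists on Windows; falling back to 0 elsewhere is an
# assumption made for this sketch so it stays portable.
O_BINARY = getattr(os, "O_BINARY", 0)


def read_exactly(fd, size):
    """Read exactly `size` bytes from `fd`, or raise EOFError if EOF comes first."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = os.read(fd, remaining)   # may return fewer bytes than requested
        if not chunk:                    # empty result means end of file
            raise EOFError(f"expected {size} bytes, got {size - remaining}")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def get_one_block(path, start, blocksize=512):
    """Open `path` in binary mode and return one block starting at offset `start`."""
    # Flags are OR'ed together; the third argument of os.open is the file mode.
    fd = os.open(path, os.O_RDONLY | O_BINARY)
    try:
        os.lseek(fd, start, os.SEEK_SET)
        return read_exactly(fd, blocksize)
    finally:
        os.close(fd)
```

When checking the result, `len(block)` gives the number of bytes read; `block.__sizeof__()` reports the in-memory size of the bytes object, which includes interpreter overhead and so does not equal the byte count.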

Lior Cohen
  • The information in this answer is already given in the question the OP acknowledged as a near-duplicate. If it's responsive, then the question is in fact a *full* duplicate, and should be closed as such, not answered. See the "Answer Well-Asked Questions" section of [How to Answer](https://stackoverflow.com/help/how-to-answer). – Charles Duffy Nov 05 '19 at 22:02