How to read bytes from file

Question

I'm trying to read the length of some metadata from a .lrf file. (Used with the program LoLReplay)

There isn't really any documentation on these files, but I have already figured out how to do this in C++. I'm trying to rewrite the project in Python for multiple reasons, but I've run into an error.

To first explain, the .lrf file has metadata immediately at the start of the file in this format:

first 4 bytes are for something I have no clue about.
next 4 bytes store the length of the metadata in hexadecimal; the actual contents of the replay begin right after the end of the metadata.
bytes after the initial 8 bytes are the metadata in json format

The problem I'm having is actually reading the metadata length. This is the current function I have:

def getMetaLength(self):
    try:
        file = open(self.file,"r")
    except IOError:
        print ("Failed to open file.")
        file.close()
    #We need to skip the first 4 bytes.
    file.read(4)
    mdlength = file.read(4)
    print(hex(mdlength))
    file.close()

When I call this function, the shell returns a traceback stating:

    Traceback (most recent call last):
    File "C:\Users\Donald\python\lolcogs\lolcogs_main.py", line 6, in <module>
    lolcogs.getMetaLength()
    File "C:\Users\Donald\python\lolcogs\LoLCogs.py", line 20, in getMetaLength
    file.read(4)
    File "C:\Python32\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3648: character maps to <undefined>

My best guess is that read() is trying to read characters that are encoded in some unicode format, but these are definitely just bytes that I am attempting to read. Is there a way to read these as bytes? Also, is there a better way to skip bytes when you are attempting to read a file?

Try opening the file in _binary mode_: `f = open(self.file,"rb")`. Also, don't name it `file` since it will conflict with built in `file` type name. — Paulo Bu, Feb 28 '14 at 23:35
In Python2.7 it is defined. In Python3 no. But reading the OP's code he's probably using Python 3 so ignore my comment :) — Paulo Bu, Feb 28 '14 at 23:38
@PauloBu Thanks, I used "rb" instead of just "r" and now I get the error "TypeError: 'bytes' object cannot be interpreted as an integer" but the c++ version had to do some tricky stuff so I already have a general idea of what to do to fix this. — shadefinale, Feb 28 '14 at 23:40

Oleh Prypin · Answer 1 · 2014-02-28T23:52:21.650

3

In Python 3 files are opened in text mode with the system's encoding by default. You need to open your file in binary mode:

file = open(self.file, 'rb')
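To illustrate the difference between the two modes, here is a minimal sketch. The sample bytes and the temporary file are made up for the demonstration, and the cp1252 encoding is forced explicitly to mirror the traceback in the question (on other systems the default encoding may differ):

```python
import os
import tempfile

# Hypothetical sample data: byte 0x81 has no mapping in cp1252.
payload = b"\x00\x01\x02\x81"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    # Text mode decodes the bytes into str and fails on the unmapped byte.
    try:
        with open(path, "r", encoding="cp1252") as f:
            f.read()
        text_mode_failed = False
    except UnicodeDecodeError:
        text_mode_failed = True

    # Binary mode returns the raw bytes without any decoding step.
    with open(path, "rb") as f:
        raw = f.read()
finally:
    os.remove(path)

print(text_mode_failed, raw)
```

The with statement also closes the file automatically, even if reading raises an exception.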

Another problem you will run into is that file.read(4) returns a bytes object of length 4, which the hex function does not accept. You probably want an integer. For that, refer to int.from_bytes or, more generally, to the struct module; either way, the byte order you pass has to match the one the file format uses. Then you can print that number in hexadecimal like so:

mdlength = int.from_bytes(file.read(4), byteorder='big')
print(hex(mdlength))
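Putting the pieces together, a rough sketch of the whole header read might look like the following. The function name read_meta_length and the sample header are hypothetical, and since the question does not say which byte order the .lrf format uses, the byte order is left as a parameter and both interpretations are shown:

```python
import io

def read_meta_length(stream, byteorder):
    """Return the metadata length stored in bytes 4..7 of a binary stream."""
    stream.read(4)          # skip the 4 leading bytes of unknown meaning
    raw = stream.read(4)    # the 4 bytes that hold the length
    if len(raw) != 4:
        raise ValueError("stream is too short to contain a metadata length")
    return int.from_bytes(raw, byteorder=byteorder)

# Made-up header for illustration: 4 unknown bytes, then the length field.
header = b"\x00\x00\x00\x00" + b"\x02\x01\x00\x00"

length_le = read_meta_length(io.BytesIO(header), "little")
length_be = read_meta_length(io.BytesIO(header), "big")
print(hex(length_le), hex(length_be))
```

With a real file, the io.BytesIO object would be replaced by a file opened with open(path, 'rb'). If one of the two results is implausibly large compared with the file size, the other byte order is probably the right one.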

edited Feb 28 '14 at 23:52

answered Feb 28 '14 at 23:46

Oleh Prypin


Amazing! The int.from_bytes() function is exactly what I needed. In c++ I don't know if there is an equivalent function but I had to do this manually in c++ and was about to do it manually in python until I read your comment! Thanks! – shadefinale Feb 28 '14 at 23:54

MxLDevs · Answer 2 · 2014-03-01T01:27:43.707

Binary files should be handled in binary mode:

f = open(filename, 'rb')

For skipping bytes, I typically use the file's seek method (with SEEK_CUR or SEEK_SET), or I just call file.read(n) and discard the result if I don't want to bother with the formality. The only time I really use seeking is when I want to jump to a specific position.
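As a small sketch of both seek variants, using an in-memory io.BytesIO object as a stand-in for a file opened in binary mode (the 16 sample bytes are made up):

```python
import io
import os

stream = io.BytesIO(bytes(range(16)))   # stand-in for open(path, "rb")

stream.seek(4, os.SEEK_CUR)     # skip 4 bytes relative to the current position
after_skip = stream.read(4)     # bytes at offsets 4..7

stream.seek(12, os.SEEK_SET)    # jump to absolute offset 12
after_jump = stream.read(2)     # bytes at offsets 12..13

position = stream.tell()        # current offset after the last read
print(after_skip, after_jump, position)
```

Unlike read(n), seek does not pull the skipped bytes into memory, which matters mostly when the skipped region is large.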

To interpret binary data, I stick to the unpack function provided by the struct module, which makes it easy to specify whether a sequence of bytes should be read as an int, float, char, etc. That's how I've been doing it for years, so there may be more convenient approaches, such as the from_bytes method described in other answers.

With the struct module you can do things like

struct.unpack("3I", f.read(12))

to read three unsigned integers at once. So, for example, given the format you've reverse engineered, I would probably just write

unk, size = struct.unpack("2I", f.read(8))
data = f.read(size)
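A format string without a prefix, such as "2I", uses the machine's native byte order and alignment. A slightly more defensive sketch pins the byte order explicitly and parses the JSON as well. Everything specific here is an assumption: the "<" (little-endian) prefix, the UTF-8 encoding of the metadata, the function name read_metadata and the sample data are not confirmed by the question.

```python
import io
import json
import struct

# Assumed layout: two little-endian unsigned 32-bit integers.
HEADER = struct.Struct("<2I")

def read_metadata(stream):
    """Read the 8-byte header, then the JSON metadata that follows it."""
    header = stream.read(HEADER.size)
    if len(header) != HEADER.size:
        raise ValueError("stream is too short to contain the header")
    unknown, size = HEADER.unpack(header)
    data = stream.read(size)
    # Assumes the metadata is UTF-8 encoded JSON.
    return unknown, json.loads(data.decode("utf-8"))

# Made-up sample: header, JSON metadata, then stand-in replay contents.
meta = json.dumps({"name": "example"}).encode("utf-8")
sample = io.BytesIO(struct.pack("<2I", 0, len(meta)) + meta + b"replay bytes")

unknown, metadata = read_metadata(sample)
rest = sample.read()    # the stream is now positioned at the replay contents
print(unknown, metadata, rest)
```

If the real files turn out to be big-endian, changing the prefix to ">" is the only adjustment needed.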

score 1 · Answer 3 · answered Feb 28 '14 at 23:38

1

You should open the file in binary mode: open(filename, 'rb').

answered Feb 28 '14 at 23:38

Heikki Toivonen

