I'm working with a couple of binary files and I want to extract the UTF-8 strings they contain.
I currently have a function that takes a starting offset in a file and returns the string found there:
    def str_extract(file, start, size, delimiter = None, index = None):
        file.seek(start)
        if (delimiter != None and index != None):
            return file.read(size).explode('0x00000000')[index] #incorrect
        else:
            return file.read(size)
Some strings in the file are separated by 0x00 00 00 00. Is it possible to split these like PHP's explode? I'm new to Python, so any pointers on code improvements are welcome.
Sample file:
48 00 65 00 6C 00 6C 00 6F 00 20 00 57 00 6F 00 72 00 6C 00 64 00 | 00 00 00 00 | 31 00 32 00 33 00
which is Hello World123. I've marked the 00 00 00 00 separator by enclosing it in | bars.
So:
str_extract(file, 0x00, 0x20, 0x00000000, 0) => 'Hello World'
Similarly:
str_extract(file, 0x00, 0x20, 0x00000000, 1) => '123'
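A minimal sketch of one way to get the behavior above, under two assumptions that are not confirmed by the question: the sample bytes look like UTF-16LE rather than UTF-8 (every character is followed by 00), and the file is opened in binary mode. Splitting the raw bytes on four zero bytes can misalign, because the 00 high byte of the preceding character is matched as part of the separator, so this sketch decodes first and then splits on two NUL characters. The `encoding` parameter and the string form of the delimiter are choices made for this illustration.

```python
import io

def str_extract(file, start, size, delimiter=None, index=None, encoding="utf-16-le"):
    """Read `size` bytes at offset `start`, decode them, and optionally
    return the piece at `index` after splitting on `delimiter`."""
    file.seek(start)
    # Assumption: the data is UTF-16LE and the span holds whole 2-byte units.
    text = file.read(size).decode(encoding)
    if delimiter is not None and index is not None:
        # str.split plays the role of PHP's explode here.
        return text.split(delimiter)[index]
    return text

# The sample bytes from the question, with the separator in the middle.
sample = bytes.fromhex(
    "48 00 65 00 6C 00 6C 00 6F 00 20 00 57 00 6F 00 72 00 6C 00 64 00 "
    "00 00 00 00 "
    "31 00 32 00 33 00"
)

# Four zero bytes decode to two NUL characters in UTF-16LE.
SEPARATOR = "\x00\x00"

f = io.BytesIO(sample)  # stands in for open(path, "rb")
first = str_extract(f, 0x00, 0x20, SEPARATOR, 0)
second = str_extract(f, 0x00, 0x20, SEPARATOR, 1)
print(first, second)
```

With a real file, `io.BytesIO(sample)` would be replaced by a handle from `open(path, "rb")`. If the strings really are UTF-8, passing `encoding="utf-8"` and a four-character delimiter `"\x00" * 4` should behave analogously.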