
I have a binary file that contains both integers and doubles. I want to access that data either in one call (something like `x = np.fromfile(f, dtype=np.int)`) or sequentially (value by value). However, NumPy doesn't seem to allow reading from a binary file without specifying a type. Should I convert everything to double, or forget about NumPy?

Edit. Let's say the format of the file is something like this:

int

int int int double double double double double double double

etc.

user3180077
  • Check out the `struct` module. – roadrunner66 Apr 22 '16 at 18:11
  • Why can't you read value-by-value, assuming you know the format of the file and thus know which is an integer and which a double, then fill the numpy array with the promoted type (e.g. all float)? – Cyb3rFly3r Apr 22 '16 at 18:11
  • "I have a binary file that contains both integers and doubles" - what is the format of this binary file? "Binary file" is not enough information to tell how this data is represented; without more information, we cannot tell how to read this file. – user2357112 Apr 22 '16 at 18:17
  • This example might help. http://stackoverflow.com/questions/14215715/reading-a-binary-file-into-a-struct-in-python – roadrunner66 Apr 22 '16 at 18:20
  • If you want help with your specific example, consider posting a short example of the file, and what you want as output :) Also see [mcve]. – roadrunner66 Apr 22 '16 at 18:22
  • Is the sequence of numeric types predictable? Periodic? Is there a header? If it's just "binary", how can you know what the _next_ type is while sequentially reading the values? Also, note that just because the fractional part of a number is zero (e.g. `10.000`) doesn't necessarily mean it's an integer. – heltonbiker Apr 22 '16 at 19:36
  • Okay, your edit is marginally better, but you seem to be expecting a "binary file" to work like writing on a piece of paper, with spaces between numbers, and lines, and all sorts of other formatting and disambiguation built in. There's just a sequence of bytes. You can't even write an *int* to a file without specifying a representation, and you can't read one without knowing the representation either. Unless you build in some sort of disambiguation yourself, you can't tell whether 8 bytes are one 64-bit float or 2 32-bit integers or something else. – user2357112 Apr 22 '16 at 23:07

2 Answers

"NumPy doesn't seem to allow reading from a binary file without specifying a type"

No programming language I know of pretends to be able to guess the type of raw binary data, and for good reason. What exactly is the higher-level problem you are trying to solve?
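
To make the ambiguity concrete, here is a minimal sketch using the standard `struct` module. The same eight bytes decode without error either as one 64-bit float or as two 32-bit integers. The little-endian byte order is an assumption chosen for illustration.

```python
import struct

# Eight bytes, written here as one little-endian 64-bit float.
raw = struct.pack("<d", 1.0)

# The same bytes decode as a double or as two 32-bit ints;
# nothing in the bytes themselves says which reading was intended.
as_double = struct.unpack("<d", raw)
as_ints = struct.unpack("<ii", raw)

print(as_double)  # (1.0,)
print(as_ints)    # (0, 1072693248)
```

Both calls succeed, which is why the reader has to supply the format.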

Eelco Hoogendoorn

I don't think you need NumPy for this. The `struct` module from Python's standard library does the job. Convert the list of tuples produced at the end into a NumPy array if desired.

For sources, see https://docs.python.org/2/library/struct.html and @martineau in "Reading a binary file into a struct in Python".

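from struct import pack, unpack

# "<iiifffffff": little-endian, three 32-bit ints, then seven 32-bit floats.
# Use "d" instead of "f" if the file stores 64-bit doubles.
with open("foo.bin", "wb") as file:
    a = pack("<iiifffffff", 1, 2, 3, 1.1, 2.2e-2, 3.3e-3, 4.4e-4, 5.5e-5, 6.6e-6, 7.7e-7)
    file.write(a)

# Open in binary mode ("rb"); text mode may fail to decode or alter the bytes.
with open("foo.bin", "rb") as file:
    a = unpack("<iiifffffff", file.read())
    print(a)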

output:

(1, 2, 3, 1.100000023841858, 0.02199999988079071, 0.0032999999821186066, 0.0004400000034365803, 5.500000042957254e-05, 6.599999778700294e-06, 7.699999855503847e-07)
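
The rounding visible in this output comes from the `f` format code, which stores 32-bit floats. A short sketch, assuming the file really holds 64-bit doubles as the question states, suggests that `d` would round-trip the values exactly, at the cost of a larger record:

```python
import struct

value = 1.1

# "f" stores a 32-bit float, so the value is rounded when packed.
as_float32 = struct.unpack("<f", struct.pack("<f", value))[0]

# "d" stores a 64-bit double, the same precision as a Python float.
as_float64 = struct.unpack("<d", struct.pack("<d", value))[0]

print(as_float32)  # 1.100000023841858
print(as_float64)  # 1.1

# Record size in bytes for each layout (no padding with "<").
size_f = struct.calcsize("<iiifffffff")
size_d = struct.calcsize("<iiiddddddd")
print(size_f, size_d)  # 40 68
```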

[Image: the binary file shown in a binary editor (Frhed)]

# How to read the same structure repeatedly
import struct

fn = "foo2.bin"
struct_fmt = "<iiifffffff"  # use "d" instead of "f" for 64-bit doubles
struct_len = struct.calcsize(struct_fmt)
struct_unpack = struct.Struct(struct_fmt).unpack_from

# Write three identical records.
with open(fn, "wb") as file:
    a = struct.pack(struct_fmt, 1, 2, 3, 1.1, 2.2e-2, 3.3e-3, 4.4e-4, 5.5e-5, 6.6e-6, 7.7e-7)
    for i in range(3):
        file.write(a)

# Read one record at a time until the file is exhausted.
results = []
with open(fn, "rb") as f:
    while True:
        data = f.read(struct_len)
        if not data:
            break
        s = struct_unpack(data)
        results.append(s)

print(results)

output:

 [(1, 2, 3, 1.100000023841858, 0.02199999988079071, 0.0032999999821186066, 0.0004400000034365803, 5.500000042957254e-05, 6.599999778700294e-06, 7.699999855503847e-07), (1, 2, 3, 1.100000023841858, 0.02199999988079071, 0.0032999999821186066, 0.0004400000034365803, 5.500000042957254e-05, 6.599999778700294e-06, 7.699999855503847e-07), (1, 2, 3, 1.100000023841858, 0.02199999988079071, 0.0032999999821186066, 0.0004400000034365803, 5.500000042957254e-05, 6.599999778700294e-06, 7.699999855503847e-07)]
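
If NumPy is preferred after all, a structured dtype can read every record in one call. The following is only a sketch under stated assumptions: the layout from the question (one leading int, then records of three ints and seven doubles), 32-bit little-endian ints, 64-bit doubles, no padding, and the leading int treated as the record count. The file name `foo3.bin` and the field names `ints` and `doubles` are hypothetical.

```python
import struct
import numpy as np

# Assumed record layout: 3 x int32 followed by 7 x float64, no padding.
record = np.dtype([("ints", "<i4", (3,)), ("doubles", "<f8", (7,))])

# Write a sample file: a leading int (assumed to be the record count),
# then that many records.
with open("foo3.bin", "wb") as f:
    f.write(struct.pack("<i", 2))
    for k in range(2):
        values = [0.5 * j for j in range(7)]
        f.write(struct.pack("<iiiddddddd", k, k + 1, k + 2, *values))

with open("foo3.bin", "rb") as f:
    n = int(np.fromfile(f, dtype="<i4", count=1)[0])  # leading int
    data = np.fromfile(f, dtype=record, count=n)      # all records at once

print(data["ints"])            # integer fields, one row per record
print(data["doubles"].shape)   # (2, 7)
```

Each field can then be used as an ordinary array, so nothing has to be promoted to double.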
Community
roadrunner66