How do I read doubles from a binary file in a loop?

Question

Simple question, but I can't find a helpful answer on the web. I have a file created with C++, where I first output a std::size_t k and then write 2 * k doubles.

I need to first read the std::size_t k in Python and then iterate in a loop from 0 to k - 1, reading two doubles x, y in each iteration and doing something with them:

with open('file', 'r') as f:
    fig, ax = pyplot.subplots()
    k = numpy.fromfile(f, numpy.uint64)[0] # does not work
    for j in range(0, k):
        # get double x and y somehow
        x = numpy.fromfile(f, numpy.double)[0]
        y = numpy.fromfile(f, numpy.double)[0]
        ax.scatter(x = x, y = y, c = 0)
        ax.set_xlim(0, 1)
        ax.set_ylim(0, 1)

The value I read in k is 3832614067495317556, but it should be 4096. And at the point where I read x, I immediately get an index out of range exception.

Do you have C++ code that you can share that created the input file? That might make it easier to reproduce your problem. — 9769953, May 11 '23 at 12:01
Note that the consecutive reads won't really work: `fromfile` reads the file as an array. You'll need `count=1` to read at least item by item (or `count=2`). And use `offset=8`; you can probably read all doubles in one go, with that; and only then iterate over them. — 9769953, May 11 '23 at 12:04
I assume you've double checked that `std::size_t` is equal to `uint64` for your machine? — 9769953, May 11 '23 at 12:06
Depending on your platform/OS, open the file in binary mode: `with open('file', 'rb') as f:`. In particular on Windows, this makes a difference. On any Unix-style OS, as far as I'm aware, this doesn't make a difference (so it doesn't hurt either, and using 'rb' is then better for portability and clarity reasons). — 9769953, May 11 '23 at 12:11
@9769953 It is simply `std::ofstream out("file", std::ios_base::binary); std::size_t k = 4096; out << k`. And then I write the `double`s. And yes, `std::size_t` has 8 bytes on my platform. — 0xbadf00d, May 11 '23 at 12:12
Related: https://stackoverflow.com/questions/14767857/unexpected-results-with-stdofstream-binary-write — 9769953, May 11 '23 at 12:38
Look at the hexdump of the output file; you would find it's definitely not 4096 in the first 8 bytes. Which indicates the C++ is wrong, not the Python code (though that also has mistakes). — 9769953, May 11 '23 at 12:39

9769953 · Accepted Answer · 2023-05-13T09:31:27.837

Your C++ code is wrong. The `<<` operator writes formatted text, even on a stream opened in binary mode, so `out << k` stores the characters `4096` rather than the raw bytes of `k`.
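The value reported in the question fits a file that starts with text rather than raw bytes. A small sketch, assuming a little-endian machine (e.g. x86-64), that reinterprets the observed number as the eight bytes it was read from:

```python
# The value the question read as uint64 instead of 4096.
observed = 3832614067495317556

# Reinterpret the integer as the 8 raw bytes that were in the file
# (assumption: little-endian byte order).
raw = observed.to_bytes(8, "little")
text = raw.decode("ascii")
print(text)  # 40960.05
```

The first four characters are the text `4096`; the remaining ones look like the start of the first double, apparently also written as text.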

Use the following:

#include <cstddef>
#include <fstream>

int
main()
{
        std::ofstream out("file.dat", std::ios::binary);
        std::size_t k = 4096;
        out.write(reinterpret_cast<char*>(&k), sizeof k);

        double a = 1.1;
        for (int i = 0; i < 8; ++i) {
                auto b = a*i;
                out.write(reinterpret_cast<char*>(&b), sizeof b);
        }
}

See the example at the bottom of https://en.cppreference.com/w/cpp/io/basic_ofstream, where I grabbed this from.

(The rest of this answer covers mistakes in, and improvements to, the Python code, even though that is not the essential problem.)

Open the file in binary mode (necessary on Windows), read just 1 item for the size, and read all the remaining doubles in one go using the offset parameter. Also verify that std::size_t is 8 bytes on the relevant machine(s), so that it matches uint64.

import numpy as np
from matplotlib import pyplot

with open('file.dat', 'rb') as f:
    fig, ax = pyplot.subplots()
    # count in items, offset in bytes
    k = np.fromfile(f, np.uint64, count=1)[0]
    # offset is counted from the current file position, so rewind to the start first
    f.seek(0)
    xy = np.fromfile(f, np.double, offset=8)
    for x, y in zip(xy[::2], xy[1::2]):
        ax.scatter(x = x, y = y, c = 0)
        ax.set_xlim(0, 1)
        ax.set_ylim(0, 1)
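
If the values really do need to be consumed one pair at a time, the standard-library `struct` module is an alternative to `numpy.fromfile`. This is a minimal sketch, assuming the file was written with `out.write` as in the C++ code above, that `std::size_t` is 8 bytes, and that the data is little-endian; `read_pairs` is a hypothetical helper name:

```python
import struct

def read_pairs(path):
    """Yield (x, y) tuples from a file laid out as: uint64 k, then 2*k doubles.

    Assumes little-endian data and an 8-byte std::size_t.
    """
    with open(path, "rb") as f:
        (k,) = struct.unpack("<Q", f.read(8))  # header: number of pairs
        for _ in range(k):
            chunk = f.read(16)                 # two 8-byte doubles
            if len(chunk) < 16:
                raise EOFError("file ended before k pairs were read")
            yield struct.unpack("<dd", chunk)
```

It could be used as `for x, y in read_pairs('file.dat'): ...`. Note that the C++ example above writes only 8 doubles after a header of 4096, so this helper would raise `EOFError` on that particular file.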

If you're not doing anything special with the scatter plots, and just want to plot all data in one scatter plot, you don't need a for-loop:

import numpy as np
from matplotlib import pyplot

with open('file.dat', 'rb') as f:
    fig, ax = pyplot.subplots()
    # count in items, offset in bytes
    k = np.fromfile(f, np.uint64, count=1)[0]
    # offset is counted from the current file position, so rewind to the start first
    f.seek(0)
    xy = np.fromfile(f, np.double, offset=8)
    ax.scatter(x = xy[::2], y = xy[1::2], c = np.zeros(len(xy) // 2))
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
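
One way to check the layout assumption mentioned earlier (an 8-byte `std::size_t` followed by 2 * k doubles) is to compare the file size against the header before plotting. A sketch using only the standard library; `layout_matches` is a hypothetical name, and little-endian data is assumed:

```python
import os
import struct

def layout_matches(path):
    """Return True if the file size equals 8 + 16 * k, with k read from the header."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8:
        return False  # too short to even hold the header
    (k,) = struct.unpack("<Q", header)  # assumption: little-endian uint64
    return os.path.getsize(path) == 8 + 16 * k
```

A `False` result would suggest that the header was written as text, that `std::size_t` has a different width, or that fewer doubles were written than the header claims.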

Thank you for your answer. In your last code, `c = 0` does not work. It seems to expect an array of the same dimension as `x` and `y`. How can we fix this? — 0xbadf00d, May 12 '23 at 16:51
@0xbadf00d Good point, I overlooked that. I think `c = np.zeros(len(xy)/2)` can fix that (untested); see also my updated answer. — 9769953, May 13 '23 at 09:32
