
Simple question, but I can't find a helpful answer on the web. I have a file created with C++, where I first output a std::size_t k and then write 2 * k doubles.

I need to first read the std::size_t k in Python and then iterate in a loop from 0 to k - 1, reading two doubles x and y in each iteration and doing something with them:

with open('file', 'r') as f:
    fig, ax = pyplot.subplots()
    k = numpy.fromfile(f, numpy.uint64)[0] # does not work
    for j in range(0, k):
        # get double x and y somehow
        x = numpy.fromfile(f, numpy.double)[0]
        y = numpy.fromfile(f, numpy.double)[0]
        ax.scatter(x = x, y = y, c = 0)
        ax.set_xlim(0, 1)
        ax.set_ylim(0, 1)

The value I read in k is 3832614067495317556, but it should be 4096. And at the point where I read x, I immediately get an index out of range exception.

0xbadf00d
  • Do you have C++ code that you can share that created the input file? That might make it easier to reproduce your problem. – 9769953 May 11 '23 at 12:01
  • Note that the consecutive reads won't really work: `fromfile` reads the whole file as an array. You'll need `count=1` to read item by item (or `count=2` for a pair). And use `offset=8`; with that you can probably read all doubles in one go, and only then iterate over them. – 9769953 May 11 '23 at 12:04
  • I assume you've double checked that `std::size_t` is equal to `uint64` for your machine? – 9769953 May 11 '23 at 12:06
  • Depending on your platform/OS, open the file in binary mode: `with open('file', 'rb') as f:`. In particular on Windows, this makes a difference. On any Unix-style OS, as far as I'm aware, this doesn't make a difference (so it doesn't hurt either, and using 'rb' is then better for portability and clarity reasons). – 9769953 May 11 '23 at 12:11
  • @9769953 It is simply `std::ofstream out("file", std::ios_base::binary); std::size_t k = 4096; out << k`. And then I write the `double`s. And yes, `std::size_t` has 8 bytes on my platform. – 0xbadf00d May 11 '23 at 12:12
  • Related: https://stackoverflow.com/questions/14767857/unexpected-results-with-stdofstream-binary-write – 9769953 May 11 '23 at 12:38
  • Look at the hexdump of the output file; you would find it's definitely not 4096 in the first 8 bytes. Which indicates the C++ is wrong, not the Python code (though that also has mistakes). – 9769953 May 11 '23 at 12:39

1 Answer


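Your C++ code is wrong. The standard << operator performs formatted (text) output, even on a stream opened with std::ios::binary, so `out << k` writes the characters "4096" rather than the 8 raw bytes of k.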
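A quick way to confirm this from the Python side is to turn the bogus value from the question back into the bytes it was read from. This is only a sketch; it assumes a little-endian machine (such as x86-64), and the names `bogus_k` and `raw` are made up for illustration.

```python
# The value the question reports reading into k.
bogus_k = 3832614067495317556

# Reinterpret the integer as the 8 bytes it was read from.
# Assumption: little-endian byte order, as on x86-64.
raw = bogus_k.to_bytes(8, "little")

print(raw)  # b'40960.05'
```

The first four bytes are the text "4096", followed by the start of a number that was also written as text, which suggests the doubles went through `<<` as well.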

Use the following:

#include <cstddef>
#include <fstream>

int
main()
{
        std::ofstream out("file.dat", std::ios::binary);
        std::size_t k = 4096;
        out.write(reinterpret_cast<char*>(&k), sizeof k);

        double a = 1.1;
        for (int i = 0; i < 8; ++i) {
                auto b = a*i;
                out.write(reinterpret_cast<char*>(&b), sizeof b);
        }
}

See the example at the bottom of https://en.cppreference.com/w/cpp/io/basic_ofstream, where I grabbed this from.


(Answer related to improvements and mistakes of the Python code, even if that's not the essential problem.)

Open the file in binary mode (necessary on Windows), read just 1 item for the size, and read all the remaining doubles in one go using the offset parameter. Also verify that std::size_t is equal to uint64 for the relevant machine(s).

import numpy as np
from matplotlib import pyplot

with open('file.dat', 'rb') as f:
    fig, ax = pyplot.subplots()
    # count in items, offset in bytes
    k = np.fromfile(f, np.uint64, count=1)[0]
    # offset is counted from the current file position, so rewind first
    f.seek(0)
    xy = np.fromfile(f, np.double, offset=8)
    for x, y in zip(xy[::2], xy[1::2]):
        ax.scatter(x = x, y = y, c = 0)
        ax.set_xlim(0, 1)
        ax.set_ylim(0, 1)
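The size check mentioned above could be sketched as follows. It is an illustration, not part of the original answer: `ctypes.sizeof(ctypes.c_size_t)` reports the size for the platform the Python interpreter runs on, which only matches the C++ program if both were built for the same platform. The names `size_t_bytes` and `size_dtype` are hypothetical.

```python
import ctypes

import numpy as np

# Size of size_t for the platform this Python interpreter was built for.
# Assumption: the C++ writer was compiled for the same platform.
size_t_bytes = ctypes.sizeof(ctypes.c_size_t)

# Matching unsigned NumPy dtype, e.g. 'u8' is uint64 and 'u4' is uint32.
size_dtype = np.dtype(f"u{size_t_bytes}")

print(size_t_bytes, size_dtype)
```

`size_dtype` could then be passed to `np.fromfile` in place of the hard-coded `np.uint64`, with the offset set to `size_t_bytes`.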

If you're not doing anything special with the scatter plots, and just want to plot all data in one scatter plot, you don't need a for-loop:

import numpy as np
from matplotlib import pyplot

with open('file.dat', 'rb') as f:
    fig, ax = pyplot.subplots()
    # count in items, offset in bytes
    k = np.fromfile(f, np.uint64, count=1)[0]
    # offset is counted from the current file position, so rewind first
    f.seek(0)
    xy = np.fromfile(f, np.double, offset=8)
    # integer division: np.zeros needs an int length
    ax.scatter(x = xy[::2], y = xy[1::2], c = np.zeros(len(xy) // 2))
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
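As a plotting-free variant of the reads above, the header and the doubles can also be read back to back from the same file object, using `count` instead of `seek` and `offset`. The sketch below writes a small sample file in the same layout first so it can run on its own. It assumes a `uint64` header and native byte order; `read_points` is a hypothetical helper, not something from NumPy.

```python
import os
import tempfile

import numpy as np


def read_points(path):
    """Hypothetical helper: read a uint64 count k, then k (x, y) pairs of doubles."""
    with open(path, "rb") as f:
        k = int(np.fromfile(f, dtype=np.uint64, count=1)[0])
        # The file position is now just past the header,
        # so the doubles can be read directly.
        xy = np.fromfile(f, dtype=np.float64, count=2 * k)
    if xy.size != 2 * k:
        raise ValueError(f"expected {2 * k} doubles, found {xy.size}")
    # One row per point: column 0 is x, column 1 is y.
    return xy.reshape(k, 2)


# Write a small sample file in the layout the corrected C++ code produces.
points = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
path = os.path.join(tempfile.mkdtemp(), "file.dat")
with open(path, "wb") as f:
    np.array([len(points)], dtype=np.uint64).tofile(f)
    points.astype(np.float64).tofile(f)

loaded = read_points(path)
print(loaded.shape)
```

Checking the number of doubles against `k` catches a truncated file, or a header that was written as text, before anything is plotted.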
9769953
  • Thank you for your answer. In your last code, `c = 0` does not work. It seems to expect an array of the same dimension as `x` and `y`. How can we fix this? – 0xbadf00d May 12 '23 at 16:51
  • @0xbadf00d Good point, I overlooked that. I think `c = np.zeros(len(xy)/2)` can fix that (untested); see also my updated answer. – 9769953 May 13 '23 at 09:32