
I am writing an unknown number of structs to a binary file and then reinterpret_cast-ing the bytes back to the struct. I know how to write the bytes.

I am unsure how to iterate over the binary file. I would like to use std::ifstream. At some point I presumably need to advance a file pointer/index by sizeof(struct) bytes, but the only examples of reading binary data into structs that I could find online wrote N structs and then read N structs back. None of them looped over the file while incrementing a file index.

Pseudo code of what I would like to achieve is:

std::ifstream file("test.txt", std::ifstream::binary);

const size_t fileLength = file.size();
size_t pos = 0;
while(pos < fileLength)
{
    MyStruct* ms = &(reinterpret_cast<MyStruct&>(&file[pos]));

    // Do whatever with my struct

    pos += sizeof(MyStruct);
}

UPDATE:

My struct is POD

user997112
  • What about [`std::istream::seekg()`](https://en.cppreference.com/w/cpp/io/basic_istream/seekg)? – Scheff's Cat May 08 '19 at 13:20
  • @Scheff And just pass my pos in to seekg? Would I need istream::read() too? – user997112 May 08 '19 at 13:21
  • No you need to pass `pos * sizeof( MyStruct )` and you read into struct, assuming your struct is POD or alike – Slava May 08 '19 at 13:23
  • `std::fstream` does not support file mapping, so you have to use `read()` or something similar in any case. – magras May 08 '19 at 13:24
  • `seekg()` can be used to position the "read head" of the stream. It counts in bytes. To read a number of bytes, [`std::istream::read()`](https://en.cppreference.com/w/cpp/io/basic_istream/read) comes to mind. – Scheff's Cat May 08 '19 at 13:24
  • This all has its limitations. It will probably work if the file writer and reader are the same program. I would be worried about the packing of the `struct` and about endianness (not an issue if limited to one platform), but it depends a bit on how paranoid you are. ;-) – Scheff's Cat May 08 '19 at 13:28
  • More about this: [Serialization and Unserialization](https://isocpp.org/wiki/faq/serialization). – Scheff's Cat May 08 '19 at 13:30
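
To make the `seekg()` plus `read()` suggestion from the comments concrete, here is a minimal sketch of jumping straight to the n-th struct in the file. `Record` and `read_record_at` are hypothetical names used only for illustration (the question's `MyStruct` is not shown), and the sketch assumes the file was written by the same program on the same platform, so padding and endianness match.

```cpp
#include <cstddef>
#include <fstream>
#include <type_traits>

// Hypothetical record type standing in for the question's MyStruct.
struct Record {
    int id;
    double value;
};
static_assert(std::is_trivially_copyable<Record>::value,
              "raw byte I/O is only valid for trivially copyable types");

// Reads the record at zero-based position `index` into `out`.
// Returns false if the seek or the read fails, for example when
// `index` lies past the last complete record in the file.
bool read_record_at(std::ifstream& file, std::size_t index, Record& out)
{
    file.clear();  // reset eofbit/failbit left over from an earlier read
    // seekg() counts in bytes, so the record index is scaled by the record size.
    file.seekg(static_cast<std::streamoff>(index * sizeof(Record)), std::ios::beg);
    file.read(reinterpret_cast<char*>(&out), sizeof out);
    return static_cast<bool>(file);
}
```

For a plain front-to-back pass no seeking is needed, because each `read()` already advances the stream position by the number of bytes it consumed.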

1 Answer

#include <fstream>

struct MyStruct{};
int main()
{
    std::ifstream file("test.txt", std::ifstream::binary);
    MyStruct ms;
    //Evaluates to false if anything wrong happened.
    while(file.read(reinterpret_cast<char*>(&ms),sizeof ms))
    {
        // Do whatever with my struct
    }
    if(file.eof())
        ;//Successfully iterated over the whole file
}

Please be sure not to do something like this:

char buffer[sizeof(MyStruct)];
file.read(buffer,sizeof(MyStruct));
//...
MyStruct* myStruct = reinterpret_cast<MyStruct*>(buffer);

It will likely work, but it breaks the strict aliasing rule and is undefined behaviour: buffer holds char objects, not a MyStruct, so it must not be accessed through a MyStruct*. If you truly need a buffer (e.g. for small files it might be faster to read the whole file into memory first and then iterate over that buffer), then the correct way is:

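// Requires #include <cstring> for std::memcpy.
char buffer[sizeof(MyStruct)];
file.read(buffer,sizeof(MyStruct));
//...
MyStruct myStruct;
// Copying the bytes into a real MyStruct object avoids the aliasing violation.
std::memcpy(&myStruct,buffer,sizeof myStruct);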
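
The idea of reading the whole file into memory first and then copying records out of the buffer could look roughly like the sketch below. `Record` and `load_records` are hypothetical names, not part of the original answer, and the sketch assumes the struct is trivially copyable and the file is small enough to hold in memory.

```cpp
#include <cstddef>
#include <cstring>
#include <fstream>
#include <iterator>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical record type standing in for MyStruct.
struct Record {
    int id;
    double value;
};
static_assert(std::is_trivially_copyable<Record>::value,
              "memcpy is only valid for trivially copyable types");

// Loads every complete record from `path`. Returns an empty vector
// if the file cannot be opened or holds less than one record.
std::vector<Record> load_records(const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    // Read the whole file into a byte buffer in one pass.
    std::vector<char> bytes((std::istreambuf_iterator<char>(file)),
                            std::istreambuf_iterator<char>());

    // Integer division drops the bytes of a trailing incomplete record.
    std::vector<Record> records(bytes.size() / sizeof(Record));
    if (!records.empty()) {
        // Copy into real Record objects instead of casting the buffer.
        std::memcpy(records.data(), bytes.data(),
                    records.size() * sizeof(Record));
    }
    return records;
}
```

Because the bytes are copied into existing `Record` objects, no `Record*` ever points at `char` storage, which is the point of the `memcpy` version above.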
Quimby
  • Could you please elaborate on it breaking aliasing? – user997112 May 08 '19 at 13:47
  • @user997112 To my understanding the above code is illegal because `myStruct` does not point to an object of type `MyStruct`, and C++ forbids accessing the same memory through pointers of unrelated types. There is an exception for `char*`, which can point to objects of other types. But as far as I am aware, this works only one way, so a `MyStruct*` cannot point to `char` objects. [This answer](https://stackoverflow.com/a/7005988) contains relevant quotes from the standard that I think should support my claim. – Quimby May 08 '19 at 14:03
  • Also [here](https://en.cppreference.com/w/cpp/language/reinterpret_cast) are rules for valid `reinterpret_cast`s (not the standard). In particular, the Type Aliasing paragraph states the exceptions too, and `MyStruct` is not one of them. But if anyone with better knowledge can clear this up I would be grateful. – Quimby May 08 '19 at 14:06
  • Thanks! I'm just waiting for a process to finish and then I will try the above. I won't forget to accept your answer later – user997112 May 08 '19 at 14:31
  • I assume that if(file.eof()) isn't actually required? Is that effectively just asserting the while loop didn't break prematurely? – user997112 May 08 '19 at 14:33
  • @user997112 `read` returns `*this` and calls `this->setstate()`. The `ifstream` is then converted to `bool`. If the read fails for any reason, it sets `failbit` and possibly other bits, which makes the `ifstream` object evaluate to false, so the loop breaks. Checking `eofbit` tells you that the loop broke because the file ended, i.e. all structs were successfully read. If `eofbit` is not set, then **not** all structs were read. "Is that effectively just asserting the while loop didn't break prematurely?" Yes. Also, this code assumes that the file size is a multiple of `sizeof(MyStruct)`. – Quimby May 08 '19 at 14:42
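
If the file size might not be a multiple of the struct size, one way to tell a clean end of file from a truncated last record is to look at `gcount()` after the loop ends. This is only a sketch: `Record`, `ReadResult` and `read_all` are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical record type standing in for MyStruct.
struct Record {
    int id;
    double value;
};

enum class ReadResult { Complete, TrailingBytes, IoError };

// Reads records until the stream fails and stores how many complete
// records were read in `count`.
ReadResult read_all(const std::string& path, std::size_t& count)
{
    count = 0;
    std::ifstream file(path, std::ios::binary);
    if (!file)
        return ReadResult::IoError;  // could not open the file

    Record r;
    while (file.read(reinterpret_cast<char*>(&r), sizeof r))
        ++count;  // "Do whatever with my struct" would go here

    if (!file.eof())
        return ReadResult::IoError;  // the loop stopped for another reason

    // gcount() is the number of bytes the last, failed read() extracted:
    // 0 means the file ended exactly on a record boundary.
    return file.gcount() == 0 ? ReadResult::Complete
                              : ReadResult::TrailingBytes;
}
```

A `TrailingBytes` result means the file ended partway through a record, which the plain `eof()` check alone would not reveal.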