I'm working on a project where I need to read data from a file, modify it, and then save it to another file (all in binary mode).

For reading, my first attempt was to open the file with ifstream and read directly from it with read(). But because I need to read many small chunks back to back, I don't think it's a good idea to keep reading directly from the file itself. Currently I read the file into a structure and plain variables like this:

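#include <cstdint>
#include <fstream>

namespace DBinary {
    // Pack to 1-byte alignment so the in-memory layout matches the file layout.
    #pragma pack(push, 1)
    struct Structure
    {
        int32_t iData1;
        int16_t iData2;
        int16_t iData3;
        int16_t iData4a;
        int16_t iData4b;
        int32_t iData4c;
    };
    #pragma pack(pop)
}

int main()
{
    // path: location of the input file, defined elsewhere
    std::ifstream input(path, std::ios::binary);
    if (!input)
        return 1; // could not open the file

    // read a whole structure
    DBinary::Structure tstruc{};
    input.read((char*)&tstruc, sizeof(DBinary::Structure));

    // read a single value
    uint16_t anint = 0;
    input.read((char*)&anint, sizeof(anint));
}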

This works, but I think I can do better, because the file isn't that big. Maybe I can read it fully into memory and then work on it? I'm not sure what the best way to do that is, or how to do it, because I'm new to C++ and don't have much experience with it.
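For illustration, reading the whole file into memory could look roughly like the sketch below. The helper names `read_whole_file` and `read_at` are hypothetical and not taken from any library. The sketch assumes the file fits comfortably in RAM and that its byte order matches the machine's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical helper: load an entire file into a byte buffer.
std::vector<char> read_whole_file(const std::string& path)
{
    std::ifstream input(path, std::ios::binary);
    if (!input)
        throw std::runtime_error("cannot open " + path);
    // Copy every byte from the stream buffer into the vector.
    return std::vector<char>(std::istreambuf_iterator<char>(input),
                             std::istreambuf_iterator<char>());
}

// Hypothetical helper: copy a value of type T out of the buffer at a byte
// offset. memcpy is used instead of a pointer cast so that unaligned
// offsets are not a problem.
template <typename T>
T read_at(const std::vector<char>& buf, std::size_t offset)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    // Debug-build bounds check; real code might throw instead.
    assert(offset <= buf.size() && sizeof(T) <= buf.size() - offset);
    T value;
    std::memcpy(&value, buf.data() + offset, sizeof(T));
    return value;
}
```

With these, something like `read_at<DBinary::Structure>(buf, 0)` followed by `read_at<uint16_t>(buf, sizeof(DBinary::Structure))` would mirror the two `read()` calls above, but without touching the file again.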

I also want to be able to freely edit the data that I read from the files, so it's important for me to support that as well.
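To make the editing requirement concrete, one possible shape is to keep the bytes in a `std::vector<char>`, overwrite values in place, and write the buffer out once at the end. The helper names `write_at` and `save_buffer` are hypothetical, and this is only a sketch under the assumption that the data fits in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical helper: overwrite sizeof(T) bytes of the buffer at a byte offset.
template <typename T>
void write_at(std::vector<char>& buf, std::size_t offset, const T& value)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    // Debug-build bounds check; real code might throw instead.
    assert(offset <= buf.size() && sizeof(T) <= buf.size() - offset);
    std::memcpy(buf.data() + offset, &value, sizeof(T));
}

// Hypothetical helper: write the whole buffer to a file in a single call.
void save_buffer(const std::string& path, const std::vector<char>& buf)
{
    std::ofstream output(path, std::ios::binary | std::ios::trunc);
    if (!output)
        throw std::runtime_error("cannot open " + path);
    output.write(buf.data(), static_cast<std::streamsize>(buf.size()));
    if (!output)
        throw std::runtime_error("write failed: " + path);
}
```

If the edited data needs to grow or shrink, `buf.insert(...)` and `buf.erase(...)` change the size of the vector before it is saved.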

Remy Lebeau
file-tracer
  • 3
    The most efficient method to write data to a file is to write as much data per transaction as you can. For example, writing 32 variables in one transaction is much more efficient than writing 32 transactions of one variable. – Thomas Matthews Aug 05 '21 at 21:54
  • 4
    You should order the data members by decreasing size. This will assist the compiler with alignment and reduce the quantity of padding bytes. There is no need for the `pack` pragma. – Thomas Matthews Aug 05 '21 at 21:56
  • 2
    You can read the whole file into memory, why not? What is the actual question here? – SergeyA Aug 05 '21 at 21:56
  • 2
    Search the internet for memory mapping with your operating system, e.g. "C++ Windows 10 memory mapping". – Thomas Matthews Aug 05 '21 at 21:57
  • 2
    @ThomasMatthews the pragma comment seems to be off-base. Nothing indicates that it is OP who has chosen the binary format, so suggestions to change it are probably not appropriate. – SergeyA Aug 05 '21 at 21:57
  • 2
    I would suggest using the `packed` attribute on the struct rather than `#pragma`. – SergeyA Aug 05 '21 at 21:58
  • 2
    @ThomasMatthews it is not clear right now if OP even needs mmaped file. Perhaps they just need to read objects into an array? – SergeyA Aug 05 '21 at 21:58
  • 2
    That said, there are often performance implications when using a packed structure. If you want efficient runtime speed, rather than efficient writing of code or efficient usage of storage, packing may not be for you. Profile and find out. – user4581301 Aug 05 '21 at 22:01
  • 1
    @SergeyA the reason is that I don't think reading all the data directly from the file itself is a good decision. For example, if I have a 100MB file and parse it to the end in a while loop, the program reads from disk so many times that it may slow down in some cases. So I'm thinking that reading the file into memory may be better. Besides that, I need to be able to read part of the data and then create a new file based on it (that means I can't just start writing to the file; I need to build and edit my data first, then write it) – file-tracer Aug 05 '21 at 22:15
  • 1
    As for the reason for using `pack`: I can't change the order of the data because, as I said, I'm reading it from a file with a known structure, so I have to use the `pack` pragma or the `packed` attribute – file-tracer Aug 05 '21 at 22:51
  • 2
    Since you're dealing with a largish file which you probably don't want to fully read into memory you can memory map the file and then work on sections of the file at a time. What `OS` are you developing for? – WBuck Aug 06 '21 at 00:41
  • 2
    Partly related to a question I asked years ago: [What goes on behind the curtains during disk I/O](https://stackoverflow.com/q/13171052/1553090) where I profiled many random writes using C functions and compared with buffered / unbuffered routines. It could be worth a read, since the C++ functions will by default be doing buffered I/O unless you explicitly tell them not to. – paddy Aug 06 '21 at 00:54
  • 1
    @WBuck my target OS is Windows. So I can use this method to read the file, make changes to it, and save the larger file back to disk? – file-tracer Aug 06 '21 at 09:55
  • 2
    Yes that’s correct. Have you worked with memory mapped files/views in Windows before? – WBuck Aug 06 '21 at 11:22
  • 1
    @WBuck sadly no. I also can't find a good example that doesn't use Boost. My main problem is being able to edit the data that I read (it's not in order) and write it back to disk. – file-tracer Aug 06 '21 at 11:26
  • 1
    I also read https://lemire.me/blog/2012/06/26/which-is-fastest-read-fread-ifstream-or-mmap/ which said that using mmap to read a file is not always stable and can sometimes cause problems – file-tracer Aug 06 '21 at 11:35
  • 2
    @file-tracer Yeah but `mmap` isn't available on Windows. You will need to use [CreateFileMapping](https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-createfilemappingw) and [MapViewOfFile](https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-mapviewoffile) – WBuck Aug 06 '21 at 11:58
  • 1
    @WBuck after some tests I found out that I can still use ifstream for reading in most cases without much of a problem; I mean, it's still fast. But I will try `CreateFileMapping`. As I said, my real problem is how to store all my changed data in another buffer and save it to disk at the end – file-tracer Aug 06 '21 at 13:15
  • 1
    @SergeyA `packed` is gcc-only, while `#pragma pack` is understood by all – Алексей Неудачин Aug 07 '21 at 06:44
  • 1
    @ThomasMatthews if you remove `#pragma pack` you'll have no `memcpy` from file data then. – Алексей Неудачин Aug 07 '21 at 06:49
  • @АлексейНеудачин certainly not gcc-only. packed attribute is recognized by gcc, clang and icc. – SergeyA Aug 09 '21 at 13:07
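Whichever packing mechanism is chosen, the layout assumption discussed in these comments can be checked at compile time. The sketch below repeats the struct from the question and assumes the on-disk record is 16 bytes, which is simply the sum of the declared field widths (4 + 2 + 2 + 2 + 2 + 4); that figure is an inference from the field types, not something the question states.

```cpp
#include <cstddef>
#include <cstdint>

namespace DBinary {
    #pragma pack(push, 1)
    struct Structure
    {
        int32_t iData1;
        int16_t iData2;
        int16_t iData3;
        int16_t iData4a;
        int16_t iData4b;
        int32_t iData4c;
    };
    #pragma pack(pop)
}

// Compilation fails if the in-memory layout drifts from the assumed
// 16-byte record, so a bad read() or memcpy() cannot go unnoticed.
static_assert(sizeof(DBinary::Structure) == 16,
              "Structure must match the assumed 16-byte record");
static_assert(offsetof(DBinary::Structure, iData2) == 4,
              "iData2 must start at byte 4");
static_assert(offsetof(DBinary::Structure, iData4c) == 12,
              "iData4c must start at byte 12");
```

For this particular field order every member already sits at a naturally aligned offset, so on common ABIs the checks would likely pass without the pragma too. The assertions are what catch the cases where that stops being true.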

1 Answer

I prefer this:

    std::fstream fa("/etc/passwd",std::ios_base::in|std::ios_base::binary);
    std::stringstream mj;
    fa>>mj.rdbuf();    

Then the entire content of the file is available from mj.str().