Is there any difference between writing each member of a struct to a file and writing the struct object directly to a file in C++?

Question

#include <iostream>
#include <fstream>
using namespace std;
struct example
{
 int num1;
 char abc[10];
}obj;

int main () 
{
 ofstream myfile1 , myfile2;
 myfile1.open ("example1.txt");
 myfile2.open ("example2.txt");
 myfile1 << obj.num1<<obj.abc;  //instruction 1
 myfile2.write((char*)&obj, sizeof(obj)); //instruction 2
 myfile1.close();
 myfile2.close();
 return 0;
}

In this example will both the example files be identical with data or different? Are instruction 1 and instruction 2 same?

`In this example will both the example files be identical with data or different?` This is something you can easily test. — DimChtz, Apr 21 '18 at 12:00
Difference - yes. Both examples the same - no. The member by member text ("<<") stream output (of instruction 1) will not reflect padding contained in binary write of instruction 2. And the binary 'write' of instruction 2 will not store readable text of instruction 1. — 2785528, Apr 21 '18 at 12:10
they are not the same. not by a long shot. but i am too lazy to write a good answer — bolov, Apr 21 '18 at 12:11

Tony Delroy · Accepted Answer · 2018-04-21T13:01:57.860

There's a massive difference.

Approach 1) writes the number using ASCII encoding, so there's an ASCII-encoded byte for each digit in the number. For example, the number 28 is encoded as one byte containing ASCII '2' (value 50 decimal, 32 hex) and another for '8' (56 / 0x38). If you look at the file in a program like less you'll be able to see the 2 and the 8 in there as human-readable text. Then << obj.abc writes the characters in abc up until (but excluding) the first NUL (0-value byte): if there's no NUL you run off the end of the buffer and have undefined behaviour: your program may or may not crash, it may print nothing or garbage, all bets are off. Because your file is in text mode, the stream might also translate any newline and/or carriage return characters in abc to whatever representation of line breaks your operating system uses (e.g. it might automatically place a carriage return before every newline you write).

Approach 2) writes the sizeof(obj) bytes in memory: that's a constant number of bytes regardless of their content. The number will be stored in binary, so a program like less won't show you the human-readable number from num1.
Depending on the way your CPU stores numbers in memory, the bytes of the number may appear in a different order in the file (something called endianness). There'll then always be 10 characters from abc, even if there's a NUL in there somewhere. Writing out binary blocks like this is normally substantially faster than converting numbers to ASCII text and having the computer worry about if/where there are NULs. Not that you normally have to care, but not all the bytes written necessarily contribute to the logical value of obj: some may be padding. For this kind of output the file should be opened with std::ios::binary, otherwise a text-mode stream may alter any byte that happens to look like a newline.

A more subtle difference is that for approach 1) there are ostensibly multiple object states that could produce the same output. Consider {123, "45"} and {12345, ""} -> either way you'd print "12345". So, you couldn't later open and read from the file and be sure to set num1 and abc to what they used to be. I say "ostensibly" above because you might happen to have some knowledge we don't, such as that the abc field will always start with a letter. Another problem is knowing where abc finishes, as its length can vary.

If these issues are relevant to your actual use (e.g. abc could start with a digit), you could for example write << obj.num1 << ' ' << obj.abc << '\n' so the space and newline would tell you where the fields end (assuming abc won't contain newlines: if it could, consider another delimiter character or an escaping/quoting convention). With the space/newline delimiters, you can read the file back by changing the type of abc to std::string to protect against overruns by corrupt or tampered-with files, then using if (inputStream >> obj.num1 && inputStream.ignore() && getline(inputStream, obj.abc)) ...process obj.... The ignore() discards the delimiter space, which getline would otherwise keep as the first character of abc. getline can cope with embedded spaces and will read until a newline.

Example: {258, "hello\0\0\0\0\0"} on a little-endian system where int is 32 bits (sizeof(int) is 4) and the structure's padded out to 16 bytes would print (offsets and byte values shown in hexadecimal):

              bytes in file at offset...
              00  01  02  03  04  05  06  07  08  09  0a  0b  0c  0d  0e  0f

approach 1)   32  35  38  68  65  6c  6c  6f
              '2' '5' '8' 'h' 'e' 'l' 'l' 'o'

approach 2)   02  01  00  00  68  65  6c  6c  6f  00  00  00  00  00  00  00
              [-32 bit 258-]  'h' 'e' 'l' 'l' 'o''\0''\0''\0''\0''\0' pad pad

Notes: for approach 2, 02 01 00 00 is the 32-bit value 0x00000102 stored least-significant byte first; 0x102 is 100000010 binary, which is 258 decimal. On a big-endian system the same four bytes would appear as 00 00 01 02. (Search for "binary encoding" and "endianness" to learn more about this).

you might add about endianess and padding. Also about the `\n` `\r` debacle — bolov, Apr 21 '18 at 12:14
What a depressing comment ;-). I made a half-arsed go of it. Cheers. — Tony Delroy, Apr 21 '18 at 12:20
just a note: `sizeof(obj)` can be different (and very likely will) from `sizeof(int)+10`, due to the way compiler lay outs the struct in memory (alignment and maybe something else?) see [this](https://onlinegdb.com/H1Web3unz) as example. But everything else hit the point IMHO. — Gian Paolo, Apr 21 '18 at 12:40
