2

I am trying to write a struct to a file and read it back. The code to do so is here:

#include <fstream>
#include <iostream>
#include <cstring>

using namespace std;

struct info {
  int id;
  string name;
};

int main(void) {
  info adam;
  adam.id = 50;
  adam.name = "adam";

  ofstream file("student_info.dat", ios::binary);
  file.write((char*)&adam, sizeof(info));
  file.close();

  info student;
  ifstream file2("student_info.dat", ios::binary);
  file2.read((char*)&student, sizeof(student));
  cout << "ID =" << student.id << " Name = " << student.name << endl;

  file2.close();
  return 0;
}

However I get a strange segmentation fault in the end.

The output is :

ID =50 Name = adam
Segmentation fault (core dumped)

On looking at the core dump, I see that there is something weird happening in the destruction of the struct info.

(gdb) bt
#0  0x00007f035330595c in ?? ()
#1  0x00000000004014d8 in info::~info() () at binio.cc:7
#2  0x00000000004013c9 in main () at binio.cc:21

I suspect that something weird is happening in the string destruction but I am not able to figure out the exact problem. Any help will be great.

I am using gcc 8.2.0.

Blaze
  • 16,736
  • 2
  • 25
  • 44
eager
  • 53
  • 1
  • 5
  • 2
    Possible duplicate of [How to write std::string to file?](https://stackoverflow.com/questions/15388041/how-to-write-stdstring-to-file) – Alan Birtles Mar 15 '19 at 11:08
  • @AlanBirtles The other question reads and writes in separate processes. I know that string object will store a pointer to its content in heap and when read in a separate process, it will crash. However, in my question, reading and writing are happening in the same process. – eager Mar 15 '19 at 11:18
  • Its still undefined behaviour, you're creating 2 objects pointing to the same data and the solution is the same – Alan Birtles Mar 15 '19 at 11:23

3 Answers3

3

You can't serialize/deserialize like that. On this line here:

file2.read((char*)&student, sizeof(student));

You're just writing 1:1 over an instance of info, which includes an std::string. Those aren't just arrays of characters - they dynamically allocate their storage on the heap and manage that using pointers. Thus the string becomes invalid if you overwrite it like that, it's undefined behavior because its pointers aren't pointing to a valid place anymore.

Instead you should save the actual characters, not the string object, and make a new string with that content on load.


Generally, you can do a copy like that with trivial objects. You can test it like this:

std::cout << std::is_trivially_copyable<std::string>::value << '\n';
Blaze
  • 16,736
  • 2
  • 25
  • 44
  • 4
    Interestingly, this works with `clang` 7.0, and `libc++`, probably as the string content fits into their SSO buffer. – lubgr Mar 15 '19 at 11:06
  • 3
    You cannot do that in general, that is right, but you can sometimes, particularly when the struct [`is_trivially_copyable`](https://en.cppreference.com/w/cpp/types/is_trivially_copyable). – jdehesa Mar 15 '19 at 11:07
  • Good suggestion, I added a bit on that to the answer. – Blaze Mar 15 '19 at 11:12
  • @Blaze Since both objects share the same memory space, shouldn't 1:1 writing work? `student` and `adam` will have the same memory content. This is pointed out by the fact that we are getting the correct content values. I am not able to get the correct reason for segmentation fault. – eager Mar 15 '19 at 11:13
  • 2
    @eager You have two strings, both with their own pointers to the heap, and now you set one of them to point to the other's heap memory. At the end of the program, both try to free the same memory, which causes the problem. At least that's one way the undefined behavior could manifest, it really depends on how the specific compiler/setup implemented it under the hood, how big the string is, and possibly more. – Blaze Mar 15 '19 at 11:18
  • 1
    @Blaze As a matter of fact, I tried to figure out the exact object where the fault is occurring. I suspected the double freeing problem but to my surprise `student` object destruction causes the fault whereas `adam` is yet to be destroyed. – eager Mar 15 '19 at 11:22
  • 1
    @lubgr "works" is the *worst possible* symptom of undefined behaviour, especially with things like small strings – Caleth Mar 15 '19 at 11:28
  • 1
    @eager Trying to reason about code that has undefined behavior is not generally meaningful. The compiler assumes your code adheres to the standard when it makes its transformations, and what it produces when you violate those assumptions need not have any relation to what you (or the standard) expect. If you know the compiler internals (or look only at the assembly) you might find the "error", but the debugger may just give you incorrect information. – Max Langhof Mar 15 '19 at 11:58
  • @lubgr The problem with `gcc` is actually also in SSO (not that it would avoid a double free without SSO either). `gcc` just does its SSO tracking differently. See my answer. – Max Langhof Mar 15 '19 at 12:26
3

To add to the accepted answer, because the asker is still confused about "why does it crash on the deletion of the first object?":

Let's look at the diassembly, because it cannot lie, even in the face of an incorrect program that exhibits UB (unlike the debugger).

https://godbolt.org/z/pstZu5

(Take note that rsp - our stack pointer - is never changed aside from the adjustment at the beginning and end of main).

Here is the initialization of adam:

    lea     rax, [rsp+24]
    // ...
    mov     QWORD PTR [rsp+16], 0
    mov     QWORD PTR [rsp+8], rax
    mov     BYTE PTR [rsp+24], 0

It seems [rsp+16] and [rsp+24] hold size and capacity of the string, while [rsp+8] holds the pointer to the internal buffer. That pointer is set up to point into the string object itself.

Then adam.name is overwritten with "adam":

   call    std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_replace(unsigned long, unsigned long, char const*, unsigned long)

Due to small string optimization, the buffer pointer at [rsp+8] probably still points to the same place (rsp+24) to indicate the string that we have a small buffer and no memory allocation (that's my guess to be clear).

Later on we initialize student much in the same way:

    lea     rax, [rsp+72]
    // ...
    mov     QWORD PTR [rsp+64], 0
    // ...
    mov     QWORD PTR [rsp+56], rax
    mov     BYTE PTR [rsp+72], 0

Note how student's buffer pointer points into student to signify a small buffer.

Now you brutally replace the internals of student with those of adam. And suddenly, student's buffer pointer doesn't point to the expected place anymore. Is that a problem?

    mov     rdi, QWORD PTR [rsp+56]
    lea     rax, [rsp+72]
    cmp     rdi, rax
    je      .L90
    call    operator delete(void*)

Yep! If the internal buffer of student points anywhere else than where we initially set it to (rsp+72), it will delete that pointer. At this point we don't know where exactly adam's buffer pointer (that you copied into student) points to, but it's certainly the wrong place. As explained above, "adam" is likely still covered by small string optimization, so adam's buffer pointer was likely in the exact same place as before: rsp+24. Since we copied that into student and it is different from rsp+72, we call delete(rsp+24) - which is in the middle of our own stack. The environment doesn't think that's very funny and you get a segfault right there, in the first deallocation (the second one wouldn't even delete anything because the world would still be fine over there - adam was unharmed by you).


Bottom line: Don't try to outclever the compiler ("it can't segfault because it'll be on the same heap!"). You will lose. Follow the rules of the language and nobody gets hurt. ;)

Side note: This design in gcc might even be intentional. I believe they could just as easily store a nullptr instead of pointing into the string object to denote a small string buffer. But in that case you wouldn't segfault from this malpractice.

Max Langhof
  • 23,383
  • 5
  • 39
  • 72
0

Briefly and thinking conceptually, when adam.name = "adam"; is done, appropriate memory is allocated internally for adam.name.

When file2.read((char*)&student, sizeof(student)); is done, you are writing at memory location i.e at address &student which is not yet allocated properly to accommodate the data which is being read. student.adam doesn't have enough valid memory allocated to it. On doing such read into student object's location is actually causing memory corruption.

sameerkn
  • 2,209
  • 1
  • 12
  • 13