I'm serializing data to a binary file using ofstream/ifstream. The data is split into two vectors of strings, one for the data names and one for the data values: std::vector<std::string> dataNames and std::vector<std::string> dataValues.

I'm writing the data using this function:

void Data::SaveData(std::string path)
{
    std::ofstream outfile(path, std::ofstream::binary);
    outfile.write(reinterpret_cast<const char *>(&dataNames[0]), dataNames.size() * sizeof(std::string));
    outfile.write(reinterpret_cast<const char *>(&dataValues[0]), dataValues.size() * sizeof(std::string));
    outfile.close();
}

And reading it using:

bool Data::LoadData(std::string path)
{
    bool ret = false;

    std::ifstream file(path, std::ifstream::in | std::ifstream::binary);
    if (file.is_open())
    {
        // get length of file:
        file.seekg(0, file.end);
        int length = file.tellg();
        file.seekg(0, file.beg);

        char * buffer = new char[length];
        file.read(buffer, length);

        if (file)
        {
            char* cursor = buffer;
            uint32_t bytes = length / 2;
            dataNames.resize(bytes / sizeof(std::string));
            memcpy(dataNames.data(), cursor, bytes);

            cursor += bytes;
            dataValues.resize(bytes / sizeof(std::string));
            memcpy(dataValues.data(), cursor, bytes);

            delete[] buffer;
            buffer = nullptr;
        }

        file.close();
        ret = true;
    }

    return ret;
}

It works: I can write the data and read it back correctly, except when any of the strings in dataNames or dataValues has 16 chars or more.

Example of data using strings with less than 16 chars:

dataNames[0] = "Type"
dataNames[1] = "GameObjectCount"

dataValues[0] = "Scene"
dataValues[1] = "5"

[Image: data 15 chars]

Example of data using strings with 16 chars or more:

dataNames[0] = "Type"
dataNames[1] = "GameObjectsCount"   // Added an 's'; it now has 16 chars

dataValues[0] = "Scene"
dataValues[1] = "5"

[Image: data 16 chars]

Here you can see that the word "GameObjectsCount" doesn't appear and strange characters are shown instead. When reading this file back, the string is not valid: sometimes it's empty, sometimes it says "Error reading characters of string", sometimes it's a random letter...

Any idea?

  • 1
    `sizeof(std::string)` needs to be replaced by `sizeof(char)`. – unxnut Nov 26 '18 at 01:27
  • 1
    A `vector` is not a POD type. A `std::string` is not a POD type. Thus none of the code that looks like this: `outfile.write(reinterpret_cast<const char *>(&dataNames[0]), dataNames.size() * sizeof(std::string));` will work. To prove this, make one of your strings 1,000 characters. How could `dataNames.size() * sizeof(std::string)` ever be anything close to 1,000? – PaulMcKenzie Nov 26 '18 at 01:30
  • It looks like you are taking the address of a `std::string` and casting it to a `const char*`. That's not going to work. A `std::string` is a bit like a `std::vector`, you need to access its internal array. – Galik Nov 26 '18 at 01:35
  • Also, the data you do see is probably an artifact from Short String Optimization [(SSO)](https://stackoverflow.com/questions/10315041/meaning-of-acronym-sso-in-the-context-of-stdstring/10319672#10319672), where the `std::string` stores its characters in a regular array. Once the string becomes longer than 16 bytes, memory is allocated from the heap to store the string, thus you no longer have the array representing the string, but a pointer to the heap. – PaulMcKenzie Nov 26 '18 at 01:36
  • Does this binary output need to be machine portable? – Galik Nov 26 '18 at 01:44

1 Answer

Reinterpreting the vector's contents as raw bytes, the way you do above, isn't correct:

 outfile.write(reinterpret_cast<const char *>(&dataNames[0]), dataNames.size() * sizeof(std::string));

You can't assume that blindly casting the pointer and writing out whatever bytes you find there serializes the data. The vector's elements are std::string objects, and a std::string isn't necessarily an in-place character array of the size of the input. For longer strings it typically holds a pointer to a buffer on the heap, so the file ends up containing a pointer value instead of the characters, and copying those bytes back into a std::string on load leaves it pointing at memory it doesn't own.
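To see this for yourself, here is a minimal sketch of a check for whether a string's characters live inside the std::string object or somewhere else. The helper name storedInline is hypothetical (it is not part of the standard library), and the result for short strings depends on your standard library implementation.

```cpp
#include <functional>
#include <string>

// Hypothetical helper: returns true if the character buffer of `s` lies
// inside the bytes of the std::string object itself, false if it lives
// elsewhere (typically on the heap).
bool storedInline(const std::string& s)
{
    const char* objBegin = reinterpret_cast<const char*>(&s);
    const char* objEnd = objBegin + sizeof(std::string);

    // std::less gives a well-defined ordering even for unrelated pointers.
    std::less<const char*> before;
    return !before(s.data(), objBegin) && before(s.data(), objEnd);
}
```

sizeof(std::string) is a compile-time constant, so a 1,000-character string can never fit inside the object, and storedInline returns false for it. Writing sizeof(std::string) bytes of such a string to a file therefore saves the pointer and bookkeeping fields, not the text.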

So, if you want to serialize the data in a vector or another standard library type, you'll need to write a function that does it manually by iterating over the items and writing them in a properly delimited way, for example by writing each string's length followed by its characters.

Paul