I have the following class. It contains a data structure called Index, which is expensive to compute, so I cache the index to disk and read it back in later. The index element id, of template type T, can be used with a variety of primitive data types.

But I would also like to use id with the type std::string. I wrote the serialize/deserialize code for the general case and tested it with normal C++ strings. It works as long as the strings are short enough, presumably because the small string optimization kicks in.

I also wrote a different implementation just for handling longer strings safely. But the safe code is about 10x slower, and I would really like to just read the strings in with fread (a 500 ms read-in is very painful, while 50 ms is perfectly fine).

How can I reliably rely on libc++'s small string optimization if I know that all identifiers are shorter than the longest possible short string? And how can I reliably tell how long the longest possible short string is?

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

template<typename T>
class Reader {
public:
    struct Index {
        T id;
        size_t length;
        // ... values etc
    };

    Index* index;
    size_t indexTableSize;

    void serialize(const char* fileName) {
        FILE *file = fopen(fileName, "w+b");
        if (file == NULL)
            return;

        fwrite(&indexTableSize, sizeof(size_t), 1, file);
        fwrite(index, sizeof(Index), indexTableSize, file);

        fclose(file);
    }

    void deserialize(const char* fileName) {
        FILE *file = fopen(fileName, "rb");
        if (file == NULL)
            return;

        fread(&indexTableSize, sizeof(size_t), 1, file);
        index = new Index[indexTableSize];
        fread(index, sizeof(Index), indexTableSize, file);

        fclose(file);
    }


};

// works perfectly fine
template class Reader<int32_t>;

// works perfectly fine for strings shorter than 22 bytes
template class Reader<std::string>;
Brutos
  • No. Just no. Don't do it. – Zan Lynx Feb 06 '16 at 01:51
  • If you must use fread instead of an iostreams function that can write to std::string, then make a char buffer[4096] (or whatever biggest size you like), fread into that, then construct a string with `string s(buffer, indexTableSize)` – Zan Lynx Feb 06 '16 at 01:56
  • You could in principle test it using a custom allocator that throws as soon as it's asked to allocate. Create progressively larger strings in a loop, and catch the exception. In practice, though, it's probably just easier to look it up for all compilers you want; it's probably almost always 22 characters. – Nir Friedman Feb 06 '16 at 01:57
  • Never write code like that in real life. It'll work fine for six months and then blow up spectacularly when compiled on RHEL 6, or Visual Studio 2018, or a 32 bit or 128 bit system. – Zan Lynx Feb 06 '16 at 01:58
  • Probably would fail on C++/CLI .NET too, since I think std::string might be some kind of shared CLR object to make it easier to pass around to other .NET software. – Zan Lynx Feb 06 '16 at 01:59
  • @Nir I like that idea. I'll try that tomorrow. – Brutos Feb 06 '16 at 02:04
  • I found this article: http://info.prelert.com/blog/cpp-stdstring-implementations which suggests that the allocator trick might not work to find out the maximum SSO string size. Maybe I'll just hardcode these values. I'll try the allocator trick tomorrow, but I would still appreciate other ideas. – Brutos Feb 06 '16 at 02:22
  • @nir: it is only 22 in libc++ (clang). Gnu and windows short strings are shorter. Here's a nice survey by Howard Hinnant: http://stackoverflow.com/a/34377209/1566221 – rici Feb 06 '16 at 03:05
  • Bruno, it's not clear to me from skimming the article why it wouldn't work, can you summarize? – Nir Friedman Feb 06 '16 at 03:19
  • @rici I stand corrected, thank you. I find it surprising as it seems inefficient but maybe they have their reasons. – Nir Friedman Feb 06 '16 at 03:20

2 Answers


std::string is not trivially copyable. Copying an object's bytes with memcpy (which is what writing it with fwrite and reading it back with fread amounts to) is only legal in C++ if the type is trivially copyable. Therefore, what you want to do is not possible directly.

If you want to serialize a string, you must do so manually. You must get the number of characters and write it, then write those characters themselves. To read it back in, you have to read the size of the string, then read that many characters.
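As a rough illustration of the length-then-characters scheme described above, here is a minimal sketch using the same C stdio calls as the question. The helper names `writeString` and `readString` and the 64-bit length prefix are assumptions made for this example, not part of the original code, and the file is only readable on a machine with the same endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: write a fixed-width length prefix, then the characters.
bool writeString(std::FILE* file, const std::string& s) {
    const std::uint64_t length = s.size();
    if (std::fwrite(&length, sizeof length, 1, file) != 1) return false;
    return length == 0 || std::fwrite(s.data(), 1, s.size(), file) == s.size();
}

// Hypothetical helper: read the length prefix, then exactly that many characters.
// The length is trusted here; a real reader should bound it before resizing.
bool readString(std::FILE* file, std::string& out) {
    std::uint64_t length = 0;
    if (std::fread(&length, sizeof length, 1, file) != 1) return false;
    out.resize(static_cast<std::size_t>(length));
    return length == 0 || std::fread(&out[0], 1, out.size(), file) == out.size();
}
```

This works for strings of any length, independent of whether the implementation stores them inline or on the heap.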

Nicol Bolas

If you want to reliably serialize/deserialize with a type T, you have to make sure that your type T is a POD type (or more precisely standard layout and trivial).

You can check this in your template by using std::is_trivially_copyable<T> and std::is_standard_layout<T>. Unfortunately, this will fail for std::string.
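A minimal sketch of such a check, assuming C++11 or later. The trait name `IsRawSerializable` is made up for this example; inside Reader it could feed a static_assert so that an unsupported T fails at compile time instead of corrupting data at run time.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical trait: true only for types whose bytes may be written
// with fwrite and read back with fread.
template <typename T>
struct IsRawSerializable
    : std::integral_constant<bool,
                             std::is_trivially_copyable<T>::value &&
                             std::is_standard_layout<T>::value> {};

// Compiles: int32_t can be dumped byte for byte.
static_assert(IsRawSerializable<std::int32_t>::value,
              "int32_t should be raw-serializable");

// The equivalent assertion for std::string would fail to compile:
// static_assert(IsRawSerializable<std::string>::value, "not raw-serializable");
```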

If that's not the case, you must find a proper way to serialize/deserialize the class, i.e. write/read the data needed to reconstruct the state of the object (here, the length of the string and its content).

Three options:

  • use an auxiliary template that converts T from/to an array of bytes and write a specialisation of this template for each type that may be used for your Reader.
  • use a member function that does this. But this is not possible for std types.
  • use a serialization library, for example Boost.Serialization, s11n or others
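The first option could look roughly like the sketch below. The name `ByteCodec`, the `encode`/`decode` signatures and the 64-bit length prefix are all assumptions chosen for illustration. The primary template handles trivially copyable types with a plain byte copy, and std::string gets its own specialization.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

// Primary template (hypothetical): raw byte copy for trivially copyable types.
template <typename T>
struct ByteCodec {
    static_assert(std::is_trivially_copyable<T>::value,
                  "add a ByteCodec specialization for this type");
    static void encode(std::vector<char>& out, const T& value) {
        const char* p = reinterpret_cast<const char*>(&value);
        out.insert(out.end(), p, p + sizeof(T));
    }
    // Returns the number of bytes consumed, or 0 if the buffer is too short.
    static std::size_t decode(const char* data, std::size_t size, T& value) {
        if (size < sizeof(T)) return 0;
        std::memcpy(&value, data, sizeof(T));
        return sizeof(T);
    }
};

// Specialization for std::string: length prefix followed by the characters.
template <>
struct ByteCodec<std::string> {
    static void encode(std::vector<char>& out, const std::string& value) {
        ByteCodec<std::uint64_t>::encode(out, value.size());
        out.insert(out.end(), value.begin(), value.end());
    }
    static std::size_t decode(const char* data, std::size_t size, std::string& value) {
        std::uint64_t length = 0;
        const std::size_t header = ByteCodec<std::uint64_t>::decode(data, size, length);
        if (header == 0 || size - header < length) return 0;
        value.assign(data + header, static_cast<std::size_t>(length));
        return header + static_cast<std::size_t>(length);
    }
};
```

Because encoding and decoding work on an in-memory buffer, the whole file can still be read with a single fread and decoded afterwards, which may recover much of the speed of the raw approach. That is a guess about where the 10x slowdown comes from, not something measured here.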

In any case, I would strongly advise you not to rely on non-portable properties such as the length of short strings, especially in a template that is supposed to work with generic types.

Community
Christophe