0

I've been involved in porting an older Win32-MFC project to Linux over the last year and a half and finally hit something I don't fully understand. At first I thought it might be due to the introduction of C++11 move semantics, but I'm not sure if that's the issue. Under g++ 4.8.4 using the -std=c++11 flag the following code:

#include <map>
#include <string>
#include <iostream>
#include <iomanip>
#include <cstring>

const char* foo[] = { "biz", "baz", "bar", "foo", "yin" };
const int sizes[] = { 3, 3, 3, 3, 3 };

typedef std::map <std::string, int> simpleMap_t;
typedef std::pair<std::string, int> simplePair_t;

int main()
{
    simpleMap_t map;
    std::string key;
    for (int i = 0; i<5; i++)
    {
        key.resize(sizes[i]);
        memcpy(const_cast<char *>(key.data()), foo[i], sizes[i]);
        simplePair_t pair = std::make_pair(key, 0);
        std::cout << "key: \""         << key        << "\" - " << static_cast<const void*>(key.data())
                  << " pair.first: \"" << pair.first << "\" - " << static_cast<const void*>(pair.first.data())
                  << std::endl;
        map.insert(map.end(), pair);
    }

    std::cout << "map size =  " << map.size() << std::endl;
    return 0;
}

Will produce this output:

key: "biz" - 0x1dec028 pair.first: "biz" - 0x1dec028
key: "baz" - 0x1dec028 pair.first: "baz" - 0x1dec028
key: "bar" - 0x1dec028 pair.first: "bar" - 0x1dec028
key: "foo" - 0x1dec028 pair.first: "foo" - 0x1dec028
key: "yin" - 0x1dec028 pair.first: "yin" - 0x1dec028
map size =  1

While the same code compiled in Visual Studio 2013 will produce this:

key: "biz" - 0039FE14 pair.first: "biz" - 0039FDE0
key: "baz" - 0039FE14 pair.first: "baz" - 0039FDE0
key: "bar" - 0039FE14 pair.first: "bar" - 0039FDE0
key: "foo" - 0039FE14 pair.first: "foo" - 0039FDE0
key: "yin" - 0039FE14 pair.first: "yin" - 0039FDE0
map size =  5

Interestingly, the code will "work" when compiled with g++ when the size of the string changes each iteration. Replacing:

const char* foo[] = { "biz", "baz", "bar", "foo", "yin" };
const int sizes[] = { 3, 3, 3, 3, 3 };

with:

const char* foo[] = { "bizbiz", "baz", "barbar", "foo", "yinyin" };
const int sizes[] = { 6, 3, 6, 3, 6 };

will produce:

key: "bizbiz" - 0xc54028 pair.first: "bizbiz" - 0xc54028
key: "baz" - 0xc54098 pair.first: "baz" - 0xc54098
key: "barbar" - 0xc54108 pair.first: "barbar" - 0xc54108
key: "foo" - 0xc54178 pair.first: "foo" - 0xc54178
key: "yinyin" - 0xc541e8 pair.first: "yinyin" - 0xc541e8
map size =  5

My understanding of move-semantics is incomplete, but I'm left wondering if that's what is at play here. Is ownership of the internal std::string's buffer being given away when making the std::pair? Or is it something else like an optimization in the std::string::resize() method that is not re-allocating a new character buffer when it should be?

AntDok
  • 13
  • 2
  • Huh. Apart from being UB this looks like g++ and it's STL are using COW for `std::string` (`key.data()` and `pair.first.data()` to the same address), but if I recall correctly this is explicitly forbidden? – Daniel Jour Oct 05 '15 at 18:36
  • 3
    Wow! memcpy, casting, manual definition of sizes, manually copy, manually inserting into a map, manually looping over a range which is not related to the arrays... – Klaus Oct 05 '15 at 18:36
  • 1
    @DanielJour http://stackoverflow.com/a/21431742/560648 tl;dr [fixed in GCC 5](http://stackoverflow.com/a/30414284/560648) – Lightness Races in Orbit Oct 05 '15 at 18:37
  • BTW: What should the program really do? What is the key and what should be the content? And what should a insert at the "end" of a sorted map? I believe that the program can be written in a single line without any error and I believe that the intention becomes clearer then! – Klaus Oct 05 '15 at 18:41
  • Not sure about the close reason yet. Asking for UB behaviors is useless (too broad)? – πάντα ῥεῖ Oct 05 '15 at 18:41
  • @πάνταῥεῖ I don't see much reason to close this. – Lightness Races in Orbit Oct 05 '15 at 18:47

1 Answers1

3

Your code has undefined behaviour due to the following bizarre lines of code:

key.resize(sizes[i]);
memcpy(const_cast<char *>(key.data()), foo[i], sizes[i]);

Almost any time you find yourself needing to cast away const-ness (and, for that matter, using memcpy), you are doing something wrong.

Indeed, a cursory glance at documentation for std::string::data (which you read, right? right?) confirms that:

Modifying the character array accessed through data is [sic] undefined behavior.

Is there something wrong with a good old-fashioned assignment?

key.assign(foo[i], sizes[i]);`

Due to the UB, analysing this any further is folly.

Lightness Races in Orbit
  • 378,754
  • 76
  • 643
  • 1,055
  • I did read the documentation. My confusion was over why something like this seems to "work" in MSVC rather than in g++. However, I agree that pursuing this further is folly. The original author of the code is long gone and while it worked on a Win32 environment it's certainly something that I'll change for the Linux port. – AntDok Oct 05 '15 at 18:42
  • @AntDok: No it's folly because undefined behaviour leads to unpredictable results, by definition. You can start tearing apart your compiler's source code to find exactly what happened on one particular execution run of your code, but it's very difficult and completely pointless because the next run could be entirely different. Fix the UB then re-test before proceeding! Certainly do not go forth into the night under the impression that this "works" in MSVC or on Windows — it _doesn't_! You are getting unlucky and not seeing immediate symptoms of potentially corrupting your process's memory. – Lightness Races in Orbit Oct 05 '15 at 18:46
  • Unfortunately it has been "working" for over a decade on Windows across three different versions of MSVC. Porting to Linux has laid bare a lot of "sins of the past" committed by my predecessors. This code is in the bowels of our very own home-brewed serialization code. I'm the lucky guy who gets to clean it up. Fortunately there are unit tests in place to verify correct behavior. Thanks again for your insights. They're very much appreciated. – AntDok Oct 05 '15 at 19:02