
Consider this simple program:

#include <string>
#include <sparsehash/dense_hash_map>

int main()
{
    google::dense_hash_map<std::string, int> map;
    map["foo"] = 0;
}

Compiling with GCC 8.2 and -Wclass-memaccess (or -Wall) produces a warning:

sparsehash/internal/libc_allocator_with_realloc.h:68:40: warning:
‘void* realloc(void*, size_t)’ moving an object of non-trivially copyable type
    ‘struct std::pair<const std::__cxx11::basic_string<char>, int>’;
use ‘new’ and ‘delete’ instead [-Wclass-memaccess]
    return static_cast<pointer>(realloc(p, n * sizeof(value_type)));
                                ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~

The questions are:

  1. Is it undefined behavior?
  2. Can you suggest a fix or workaround which can be applied to the application code (not by changing Sparsehash or avoiding its use)?
  3. (Bonus points) can you construct a program which actually misbehaves due to this (using std::string or your own nontrivial type)? So far I haven't seen any problems in code using std::string as the key type, despite the fact that std::string must be quite a commonly used key type.

I've filed an issue here: https://github.com/sparsehash/sparsehash/issues/149

John Zwinck
  • I can't recreate this using either GCC 8.1 or clang (with `-Weverything`) and sparsehash 2.0.3. What sparsehash version are you using? – Yuushi Sep 21 '18 at 05:20
  • @Yuushi: I'm using GCC 8.2 on 64-bit Linux (just edited the question from "GCC 8" to "GCC 8.2" in case it matters). I tried Sparsehash 2.0.2 and trunk (which should be newer than 2.0.3); both had the warning. – John Zwinck Sep 21 '18 at 05:52
  • @Yuushi: I've now tried with Sparsehash 2.0.3 and it produces the same warning. In fact, I built it from source and GCC 8.2 issues similar warnings about `class-memaccess` while building Sparsehash itself. – John Zwinck Sep 21 '18 at 06:02
  • From `dense_hash_map` documentation: _"`Alloc` The STL allocator to use. By default, uses the provided `allocator libc_allocator_with_realloc`, which likely gives better performance than other STL allocators due to its built-in support for `realloc`, which this container takes advantage of._" This kind of seems to say that for (at least) _non-trivially-copyable_ types one should provide another allocator to `dense_hash_map`. In fact, I don't think that `realloc` can be used with any types according to the standard, since it does not _construct_ objects in the allocated memory. – Daniel Langr Sep 21 '18 at 06:34
  • @DanielLangr: That's helpful, thanks. The mysteries to me right now are (1) why such a seemingly major limitation is not called out in the comments or docs, and (2) why even building Sparsehash itself causes these sorts of warnings (making it seem like Sparsehash has UB even before you actually use it in an application). – John Zwinck Sep 21 '18 at 08:38
  • @JohnZwinck I don't have the answer; however, it seems to me that the whole project is a bit outdated. For instance, `dense_hash_map` and its representation `dense_hashtable` do not even have move constructors. There is some effort to make C++11 versions, but I don't know their current status: https://github.com/sparsehash/sparsehash-c11. And my quotation was from the `dense_hash_map` docs, so it's not true that this limitation is not mentioned there. – Daniel Langr Sep 21 '18 at 10:22
  • Also note that before C++11, there were no type traits / static assert to, for instance, check trivial copyability of types (template arguments here) at compile time. – Daniel Langr Sep 21 '18 at 10:37
  • @DanielLangr: Neither C89, nor any C or C++ Standard since, has made any real attempt to avoid characterizing useful (and sometimes necessary) actions as UB. Instead, the authors of C89 assumed that in situations where there was one obvious useful way for an implementation to behave, and they could see no reason that a non-capricious implementation might do otherwise, implementations would behave in useful fashion *without regard for whether the Standard actually required them to do so*, and there was thus no need to exhaustively characterize such situations. – supercat Sep 21 '18 at 15:43
  • @DanielLangr: Traditionally, certain actions would cause the system to behave as though some objects had been "magically" constructed, without the Standard having to list all such cases. The language would have been unworkable for many purposes if implementations hadn't supported such means, but since they weren't explicitly listed they got omitted from later versions of the list. I would think that `realloc` should probably be regarded as a `malloc`/`memcpy`/`free` sequence and thus construct any objects that could be constructed via `memcpy`, but I wouldn't be surprised if the Standard... – supercat Sep 21 '18 at 15:55
  • ...failed to fully define the behavior, given its history. – supercat Sep 21 '18 at 15:55
  • @supercat I see your point. Just note that `memcpy` cannot _construct_ any object, it can only copy byte representations of objects. That is, after `int i = 1; char* ptr = new char[sizeof(int)]; memcpy(ptr, &i, sizeof(int));`, there has been no `int` object constructed in the memory addressed by `ptr`. If we need an `int` object there, we need to use placement new. – Daniel Langr Sep 21 '18 at 16:00
  • @DanielLangr: What is described by the term "trivially-copyable" types, if not types of objects that are implicitly constructed at the destination of a `memcpy`? – supercat Sep 21 '18 at 16:50
  • @supercat Types whose byte representations you can `memcpy` between existing objects. By the Standard. – Daniel Langr Sep 21 '18 at 17:19
  • @DanielLangr: N4713 6.6.3 is unclear. It says that unless an *object* has "non-vacuous initialization", its lifetime begins when storage is received. If NVI were a trait of types, that could be interpreted as meaning "unless an object *would be of a type requiring NVI*"...) but it seems like NVI is a property of instances. In any case, while 6.6.3 fails to adequately say what it means for an *object* to have NVI, I've seen nothing to suggest that programs should waste time default-initializing objects which are later going to get completely overwritten with a sequence of bytes from elsewhere. – supercat Sep 21 '18 at 18:53
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/180549/discussion-between-daniel-langr-and-supercat). – Daniel Langr Sep 21 '18 at 19:47

3 Answers


1. Is it undefined behavior? Yes. You should never copy objects using realloc(), because some objects hold internal pointers to a resource. The problem comes later, when two different objects have their destructors run: the same resource is then deallocated twice, a complete no-no.

2. Can you suggest a fix or workaround which can be applied to the application code (not by changing Sparsehash or avoiding its use)?

Try

#include <memory>

and change line

google::dense_hash_map<std::string, int> map;

to

google::dense_hash_map<std::string, int, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<const std::string, int>>> map;

Now it won't use Google's allocator libc_allocator_with_realloc.

3. (Bonus points) can you construct a program which actually misbehaves due to this (using std::string or your own nontrivial type)? So far I haven't seen any problems in code using std::string as the key type, despite the fact that std::string must be quite a commonly used key type.

Not easily, because you are trying to provoke undefined behaviour. In your test program, I would feed strings that are at least 32 characters in length, so the small string optimisation does not kick in. And there are checks that can be run on gcc's heap to see whether it has become corrupt. See 1

SJHowe
  1. Yes, it is undefined behavior.
    But don't despair yet: iff std::string doesn't store any internal pointers in your implementation, nor register them anywhere, it will "work" anyway; making a bitwise copy is then equivalent to move-construction at the destination plus destruction of the source.
    Such is the case for most (not all) string-implementations, SSO or not.

  2. If you might use a type not guaranteed to be trivially destructively-movable, use a different allocator (the last template argument) to avoid bitwise moves.

  3. Making a program blow up due to an invalid move by bitwise copy is trivial.
    Use this type with google::dense_hash_map:

    #include <cstdlib>  // for std::abort

    class bang {
        bang* p;  // always points to the object itself
    public:
        bang() : p(this) {}
        bang(bang const&) : bang() {}
        bang& operator=(bang const&) { return *this; }
        ~bang() { if (p != this) std::abort(); }  // fires after a bitwise move
    };
    
Deduplicator

I suppose this code anticipates the possible C++20 class attribute "trivially relocatable". In essence, a trivially relocatable object is one whose memory location can safely be changed. In C++ parlance, it is an object that can safely be copied by copying its object representation, and the program keeps the expected behavior as long as the copied-from object is not accessed any more, not even for destruction.

For example, this code might not be specified as "undefined behavior" by the C++20 standard:

alignas(string) unsigned char buffer[sizeof(string)];
auto p = new(buffer) string{"test"};
alignas(string) unsigned char buffer2[sizeof(string)];
memcpy(buffer2, buffer, sizeof(string)); // create a new string object, a copy of *p
auto np = reinterpret_cast<string*>(buffer2);
(*np)[0] = 'r';
// the object at p shall not be accessed any more, not even destroyed

A type should not be trivially relocatable if it has a non-static data member that refers to any part of itself:

struct fail {
    int a;
    int b;
    int* selected;  // points into the object itself
    fail(bool is_a, int i) {
        if (is_a) { a = i; selected = &a; }
        else      { b = i; selected = &b; }
    }
};

Some implementations of linked-list containers might also fail to be trivially relocatable, for example if the container holds a member which is the root node. So dense_hash_map simply must not be used with these kinds of self-referencing types.

Oliv
  • The C and C++ languages have long been caught in a Catch-22: they don't want to add new definitions for constructs which have been in use for decades without need for an official definition, but some compiler writers insist that constructs that aren't "officially" defined aren't really part of the language. The notion that some types of objects can be safely relocated and others can't goes back to 1986; if the language survived three decades without it, why should the Committee recognize that it's needed "now"? – supercat Sep 27 '18 at 19:29
  • @supercat They could decide to make the standard c++ specification a representation of the standard behavior ;)! Or not! – Oliv Sep 27 '18 at 22:26
  • The Committee likely thought the only time it would matter whether the Standard actually mandated something would be in cases where an implementer had a good reason for wanting to do something different; even in those cases the implementer's judgment would likely be better than that of a Committee with no idea what reasons an implementer might have for doing something unusual. They have no diplomatic way of handling the case where an implementation does something silly for no reason beyond the fact that the Standard didn't mandate the common behavior. – supercat Sep 27 '18 at 22:33
  • @supercat A sentence such as this one from [expr.rel], *If both operands (after conversions) are of arithmetic or enumeration type, each of the operators shall yield true if the specified relationship is true and false if it is false.*, shows that the standard is also the result of an intention to formalize common/standard implementations (non-silly ones at least). But formalism always shows its limits when applied to reality. The same happens in countries with a tradition of written law: in those countries, the unscrupulous can follow the written law while violating its spirit. – Oliv Sep 28 '18 at 15:58
  • @supercat The other drawback of formalism is that it tends to preclude any evolution, because it always reaches a point of complexity where any tiny modification has a large impact on many parts of the specification. The committee should have tried to express the spirit of the language rather than trying to formalize all corner cases. In any case, the initial concepts behind the design of C and Algol were immature, and C++ was an attempt to go further. With a lot of effort they advance, but this is such a waste, for the committee as it is for the coder. It's time to restart from scratch to aim broader. – Oliv Sep 28 '18 at 16:05
  • A good standard for something as diverse as C or C++ implementations and programs should recognize that there are many things which might in some circumstances be impractical, but which quality implementations and programs should do when practical. If the said, e.g., that implementations should, when practical, guarantee that no integer operation that isn't a divide or remainder will have any side-effect except possibly to signal an error in Implementation-Defined fashion, and a program malfunctions because it expects overflow to be handled that way on an implementation where it wasn't... – supercat Sep 28 '18 at 16:19
  • ...then the implementation could be faulted if it failed to document a good reason for not supporting the behavior, and the programmer could be faulted if the implementation had a good documented reason and the programmer ignored it. If a programmer wouldn't care about whether `x*y>z` yields one or zero in case of overflow, so long it does either, letting the implementation evaluate the expression in any way that yields a correct answer in the absence of overflow, and arbitrarily yields 0 or 1 in case of overflow, would allow more optimizations than requiring a programmer to prevent overflow. – supercat Sep 28 '18 at 16:23
  • @supercat The concept of a "quality implementation" is something you keep coming back to. I can't find a definition for it on the internet, but I have read a lot of your comments. The idea is that the standard should be more "relaxed" and offer implementations opportunities to show their best? Something wiser? – Oliv Sep 28 '18 at 16:53
  • The authors of the Standard use the phrase "quality of implementation" a few times in the rationale, regarding questions such as how to report diagnostics, the complexity of programs an implementation can handle, and the choice of how to act when the Standard imposes no requirements as "quality of implementation" issues. The authors don't specifically say that a quality implementation that behaves contrary to precedent should document the behavior and (good) reasons for it, but I'm not sure why anyone seeking to produce a quality implementation wouldn't document such things anyhow. – supercat Sep 28 '18 at 17:05
  • Because implementations are intended for a variety of purposes, requiring different judgments as to whether to follow precedents or deviate from them. I would think it should be self-evident, however, that a quality implementation intended to be suitable for any *particular* kind of task, however, should strive to support idioms that programmers performing such tasks often find useful, and that implementations, almost regardless of intended purpose, should document cases where their behavior would deviate from precedent behaviors that programmers might otherwise find useful. – supercat Sep 28 '18 at 17:15