3

Consider the following code:

#include <memory>

template<typename Allocator>
char* allocate_array(Allocator alloc, size_t n)
{
    static_assert(std::is_same<typename Allocator::value_type, char>::value);
    char* data = alloc.allocate(n);

    // (1)
    for (size_t i=0; i<n; ++i) {
        new (data+i) char();
    }

    return data;
}

char* copy_array(size_t n, const char* original)
{
    char* copy = allocate_array(std::allocator<char>(), n);
    // (2)
    for (size_t i=0; i<n; ++i) {
        copy[i] = original[i];
    }
    return copy;
}

Is the placement-new initialization marked by (1) necessary to prevent the program from having undefined behaviour, even if every char is guaranteed to be written to in (2) before it can be read? Or can it be safely removed?

Note that this is not being optimized away, even at -O3 I see both gcc and clang generating subsequent calls to both memset() and memcpy().

eerorika
  • 232,697
  • 12
  • 197
  • 326
Benno
  • 5,288
  • 5
  • 42
  • 60
  • The advantage of this is your buffer is already null terminated. This lets you skip steps as long as you always maintain the empty part of the string is zero'd. – NathanOliver Oct 16 '19 at 15:47
  • 2
    If you're only concerned of the memset, then you can use `new (data+i) char;` instead. – eerorika Oct 16 '19 at 15:47
  • Please note that "copy_array" can be replaced entirely by the library function strdup() – Owl Oct 16 '19 at 15:51
  • 1
    @Owl Except `strdup` does not support the use of an allocator. – eerorika Oct 16 '19 at 15:52
  • @eerorika I'm mainly concerned about the legality, the optimization part was just an interesting observation (also TIL, thanks for the comment!) – Benno Oct 16 '19 at 15:53
  • https://stackoverflow.com/questions/42294487/managing-trivial-types – Lawrence Oct 16 '19 at 18:15
  • In the first place, who told you you got an array from `alloc.allocate(n)`? – Language Lawyer Oct 16 '19 at 23:10
  • @LanguageLawyer I'm not sure I understand your question? The function `std::allocator::allocate()` is defined to return a pointer to the initial element of an array of storage of size `n*sizeof(T)`, suitably aligned. – Benno Oct 17 '19 at 12:49
  • @Benno OK. But do you know what is "array of storage"? I know only about arrays of _objects_. – Language Lawyer Oct 17 '19 at 20:42
  • @LanguageLawyer Feel free to file a defect report to the standards committee if you think the "array of storage" wording is incorrect. But I don't really see a problem, if you can have "storage" why shouldnt you be able to have an "array of storage". – Benno Oct 18 '19 at 12:23
  • _But I don't really see a problem_ OK. – Language Lawyer Oct 19 '19 at 18:22

1 Answers1

2

Is the placement-new initialization marked by (1) necessary to prevent the program from having undefined behaviour.

Technically yes, as far as I can tell.

[basic.life]

The lifetime of an object of type T begins when:

  • storage with the proper alignment and size for type T is obtained, and
  • its initialization (if any) is complete (including vacuous initialization) ([dcl.init]),

... before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any glvalue that refers to the original object may be used but only in limited ways. ... The program has undefined behavior if:

  • the glvalue is used to access the object, or
  • ...

Note that this is not being optimized away

Instead of value initialising the chars, you could default initialise. Then their value would be left indeterminate until the assignment, and thus there should be no need for memset. Also, there is a standard function for this, so there is no need to write the loop:

std::uninitialized_default_construct(data, data + n);
Community
  • 1
  • 1
eerorika
  • 232,697
  • 12
  • 197
  • 326