10

I have a struct representing a binary message. I want to write a function to get the next such record from a buffer (whether a file or a socket, doesn't matter):

template <typename Record>
Record getNext();

Now, I could write this like:

template <typename Record>
Record getNext() {
    Record r;
    populateNext(reinterpret_cast<char*>(&r),  // maybe ::read()
                 sizeof(r));                   // or equivalent
    return r;
}

which is nice and gives me the benefits of RVO. However, it will invoke the default constructor of Record, which may be composed of types with non-trival default constructors which do work that I would like to avoid - these are not necessarily POD types, but they are standard layout.

Is there a way to write getNext() such that we avoid any constructors (default or copy/move) on Record? Ideally, when the user calls:

auto record = getNext<Record>();

The buffer is read directly into the memory of record. Is this possible?

Barry
  • 286,269
  • 29
  • 621
  • 977
  • 2
    Have you read [this discussion on the UB reflector](http://www.open-std.org/pipermail/ub/2013-September/000127.html) and [the related N3751](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3751.pdf)? There's also [proposal N4393](http://open-std.org/JTC1/SC22/WG21/docs/papers/2015/n4393.pdf). – dyp Jun 09 '15 at 15:57
  • 1
    Out of curiosity, why are you avoiding passing the record by reference? Implementation requirements? – jaggedSpire Jun 09 '15 at 16:04
  • 2
    Arguably, you need *trivially copyable* to copy from a byte array into an object (to restore its value instead of UB). If the object is trivially copyable, it is possible the compiler elides the copy ctor even if that doesn't fall under the copy elision exception. – dyp Jun 09 '15 at 16:08
  • @dyp You reckon [this](http://coliru.stacked-crooked.com/a/ae4b1e3ea99288f8) would not be fine? – Columbo Jun 09 '15 at 16:08
  • 1
    @Columbo I honestly don't know. Depends on what you mean with *fine*. I know for sure that it is required to work with *trivially copyable* types. There have been discussions that *trivially copyable* is too strict here, and I can't remember all the details. – dyp Jun 09 '15 at 16:09
  • @Columbo Wait, your `populateNext` does nothing. Depends on what the Standard/committee intends to say with *non-vacuous initialization* (it is formulated as a property of a specific initialization rather than a property of a type..) -- The issue I see is binding `Record&&` to an object of incompatible type. E.g. [dcl.init.ref]p1 – dyp Jun 09 '15 at 16:12
  • @dyp I was more on about the implementation of `getNext` - ignore the rest. (Just wanted to provide a compilable demonstration.) Also, yeah, I am aware that it is only guaranteed to work with trivially copyable types, but again - compilable example and such- so the corresponding condition on `enable_if` is commented out. – Columbo Jun 09 '15 at 16:16
  • N4393 was rejected at Lenexa: https://botondballo.wordpress.com/2015/06/05/trip-report-c-standards-meeting-in-lenexa-may-2015/ – dyp Jun 09 '15 at 16:21
  • @Columbo The last message in the discussion on the UB reflector I mentioned above contains an example where one object of one of several types is created by a single `memcpy`. Your example does not contain a `memcpy` or anything that could conjure up an object. One could argue that *the memory for that object has been allocated*, but IMHO the spec is *very* unclear about this (there's that issue about `malloc`, for example) – dyp Jun 09 '15 at 16:25
  • @dyp Imagine that populateNext implies memcpy or the like. This is more a PoC of getNext than an actually valid example. (Should have written that earlier, sorry.) – Columbo Jun 09 '15 at 16:26
  • @dyp Aw, N4393 looks like exactly what I want. – Barry Jun 09 '15 at 16:28
  • I just realized that your `getNext` doesn't fulfil Barry's requirements (it doesn't allow NRVO).. I'd probably move all the type punning into an altered version of `populateNext`, to write `getNext` as `return populateNext(arr);`, where `populateNext` contains some `auto p = reinterpret_cast( memcpy(arr, src, sizeof(T)) ); return std::move(*p);` But I guess this doesn't really improve anything wrt strict aliasing. – dyp Jun 09 '15 at 16:36
  • @Barry Maybe you should ask the author of N4393 or Chandler Carruth about those optimizations mentioned in Botond's trip report. Then post any results here ;) I'm quite unhappy with both the complexity (hard to understand) and the (apparent?) restrictions imposed by the current strict-aliasing and lifetime rules. – dyp Jun 09 '15 at 16:44
  • Can you modify `Record` and the types it includes? How hard would this be? – Yakk - Adam Nevraumont Jun 09 '15 at 17:12
  • @Yakk To be POD and not just standard-layout? It's probably possible, but would like to avoid having to do that. – Barry Jun 09 '15 at 17:24
  • 1
    @dyp Pablo hasn't heard anything from Chandler about disk-swizzling, his ideas were only about destructive-move optimizations. He says noop constructors is dead and he has no intention in reviving the topic. – Barry Jun 09 '15 at 19:34
  • @Barry it would help to have a concrete example of what members`Record` might have. You weren't exactly clear on what `populateNext` does but it sounds like you want to "construct" non-trivial member variables of `Record` by copying bytes into them , without calling their constructors. – M.M Jun 10 '15 at 23:15
  • @MattMcNabb Yes that's exactly right. A `Record` is made up of types that are either PODs, or maybe a class that has and `int` or a `char[N]` that is zeroed out in its default constructor - it's that initialization I want to avoid in this case. – Barry Jun 11 '15 at 00:48

2 Answers2

5

no_init is a constant of type no_init_t.

If you construct a pod from a no_init_t, you get an uninitialized pod, and (assuming elision) there is nothing to be done.

If you construct a non-pod from a no_init_t, you have to override a constructor, and make it not initialize the data. Usually class_name(no_init_t):field1(no_init), field2(no_init){} will do it, and sometimes class_name(no_init_t){} will do it (assuming all contents are pod).

Constructing from no_init on each member can act as a sanity check that the members are indeed pod, however. A non-pod class constructed from no_init will fail to compile until you write the no_init_t constructor.

This (having to no_init each member constructor) does generate some annoying DRY failure, but we don't got reflection, so you are gonna repeat yourself and like it.

namespace {
  struct no_init_t {
    template<class T, class=std::enable_if_t<std::is_pod<T>{}>>
    operator T()const{
      T tmp;
      return tmp;
    }
    static no_init_t instance() { return {}; }
    no_init_t(no_init_t const&) = default;
  private:
    no_init_t() = default;
  };
  static const no_init = no_init_t::instance();
}


struct Foo {
  char buff[1000];
  size_t hash;
  Foo():Foo(""){}
  template<size_t N, class=std::enable_if_t< (N<=sizeof(buff)) >>
  Foo( char const(&in)[N] ) {
    // some "expensive" copy and hash
  }
  Foo(no_init_t) {} // no initialization!
};
struct Record {
  int x;
  Foo foo;
  Record()=default;
  Record(no_init_t):
    x(no_init), foo(no_init)
  {}
};

Now we can construct Record with no_init and it won't be initialized.

Every POD class is not initialized. Every non-POD class must provide a no_init_t constructor (and presumably implement non-initialization, as best it can).

You then memcpy right over it.

This requires modifying your type, and the types it contains, to support non-initialization.

Yakk - Adam Nevraumont
  • 262,606
  • 27
  • 330
  • 524
  • This is really interesting. Why did you make `no_init_t` privately constructed like that? – Barry Jun 09 '15 at 21:51
  • @barry you should be using the `no_init` name, not `no_init_t{}`. Basically, mimicing `nullptr_t`. – Yakk - Adam Nevraumont Jun 09 '15 at 23:16
  • I get that - I meant why not just like `static constexpr no_init_t no_init;`? – Barry Jun 09 '15 at 23:20
  • @Barry because then `no_init_t{}` would work. I was mimicing `nullptr_t`, where `nullptr_t{}` doesn't work, you have to use `nullptr`. – Yakk - Adam Nevraumont Jun 10 '15 at 01:23
  • 1
    @Yakk [Are you sure about `nullptr_t{}` not working](http://coliru.stacked-crooked.com/a/6949526c3e9afdbf)? (I also wonder whether it is technically UB to return an uninitialized thing.) – T.C. Jun 10 '15 at 03:50
  • `nullptr` is a scalar type, and I can't find any further restrictions in the Standard. I also agree with @T.C.'s assessment that returning something uninitialized can be UB, e.g. `no_init_t::operator T` for, say, `T == int` [dcl.init]p12 / *indeterminate value* – dyp Jun 10 '15 at 17:16
1

Something like this?

EDIT:

  1. Addresses comment on alignment. Now uses anonymous union to ensure correct alignment.

  2. TestRecord now incorporates another standard layout type egg

  3. Added proof that even though egg has a default constructor, the class is not constructed prior to being filled by populateNextRecord()

I think this is about as fast as it can be isn't it?

#include <iostream>
#include <array>
#include <algorithm>

struct egg {
    egg(int i) : _val(i) {}
    egg() {}
    int _val = 6;    
    friend std::ostream& operator<<(std::ostream& os, const egg& e) {
        return os << e._val; 
    }
};

struct TestRecord {
    egg x;
    double y;
};

void populateNext(uint8_t* first, size_t length)
{
    // do work here
    TestRecord data_source { 10, 100.2 };
    auto source = reinterpret_cast<uint8_t*>(&data_source);
    std::copy(source, source + length, first);
}

template<class Record>
struct RecordProxy
{
    RecordProxy() {}

  uint8_t* data() {
      return _data;
  }

  static constexpr size_t size() {
      return sizeof(Record);
  }

  Record& as_record() {
      return _record;
  }

    union {
        Record _record;
        uint8_t _data[sizeof(Record)];
    };
};


template <typename Record>
RecordProxy<Record> getNext() {
    RecordProxy<Record> r;
    populateNext(r.data(),  // maybe ::read()
                 r.size());                   // or equivalent
    return r;
}

using namespace std;
int main()
{
    RecordProxy<TestRecord> prove_not_initialised;
    auto& r1 = prove_not_initialised.as_record();
    cout << "x = " << r1.x << ", y = " << r1.y << endl;

    auto buffer = getNext<TestRecord>();
    auto& actual_record = buffer.as_record();
    cout << "x = " << actual_record.x << ", y = " << actual_record.y << endl;
   return 0;
}
Richard Hodges
  • 68,278
  • 7
  • 90
  • 142
  • 1
    At least `std::aligned_storage` it! – Yakk - Adam Nevraumont Jun 09 '15 at 18:31
  • @Yakk latest edit addresses alignment comment and adds proof that the constructor is not called. – Richard Hodges Jun 09 '15 at 22:20
  • This causes undefined behaviour - only the last-written member of a union may be read in C++ – M.M Jun 09 '15 at 22:25
  • @MattMcNabb while a strict reading of the standard supports this idea, I think you'll find that the behaviour is predictable since the addresses of all items in the union will be the same, and aligned on the lowest common alignment boundary – Richard Hodges Jun 10 '15 at 06:36
  • @RichardHodges unless the compiler optimizes away all your lovely code because it knows that accessing the wrong member of a union causes undefined behaviour – M.M Jun 10 '15 at 12:05
  • @MattMcNabb I think you're clutching at straws. I'll be convinced if you can post a link to an online compiler where this happens. – Richard Hodges Jun 10 '15 at 12:59
  • 1
    @MattMcNabb Cast `std::addressof(_record)` to `char*` and write to that. I think the aliasing rules explicitly permit such operations? – Yakk - Adam Nevraumont Jun 10 '15 at 17:21
  • @RichardHodges I can link to the C++ standard, that's all that matters. You can't possibly know what all compilers past and future in all possible configuration modes will do. – M.M Jun 10 '15 at 21:01
  • @Yakk The aliasing rules permit that, however if `Record` is not trivially copyable (which it isn't in OP's requirements) then it will not be considered to have been constructed yet. – M.M Jun 10 '15 at 21:02
  • @MattMcNabb The standard rules are exceedingly cryptic and obtuse about when an object comes into existence. Are you certain? – Yakk - Adam Nevraumont Jun 10 '15 at 21:29
  • @Yakk agree that this answer could have the union issue fixed by taking out `_data` (leaving just the anonymous union with `record` in it), and having `data()` return `(uint8_t *)&_record`. (I don't see that aligned_storage is needed, `record` will be correctly aligned) – M.M Jun 10 '15 at 22:57
  • @Yakk also agree that the key issue is whether the `std::copy` causes lifetime to begin for `record`. [basic.life/1] says that `record`'s lifetime begins *when initialization is complete*. There is [some debate](http://stackoverflow.com/questions/30114397/constructing-a-trivially-copyable-object-with-memcpy) about whether trivially-copyable objects can be initialized via `memcpy`, but I think it would be difficult to argue that an object with non-trivial construction can be initialized by any way except its constructor. – M.M Jun 10 '15 at 22:59
  • The OP specifically requires to initialise the buffer with bytes, which of course if not recommended, but that is what he asked for. The implication therefore is that the record type *must* be trivially copyable and support initialising from memcpy. This of course implies no private data, no virtual functions etc. under these circumstances its "safe" (with many caveats) – Richard Hodges Jun 10 '15 at 23:22