C++11 introduced std::begin(std::valarray&) as well as std::end(std::valarray&).

C++17 introduced std::data() which works with std::vector, std::array, C-style arrays, etc. But why wasn't an overloaded std::data() introduced for std::valarray?

std::valarray is specified to have contiguous storage, which can be accessed by taking the address of a[0] (see Notes).

std::data(std::valarray& a) could have simply been defined to return &(a[0]). Why hasn't this been done? Is it an oversight?

My motivation is that I'm working on a general-purpose serialization library. When it receives contiguous binary number arrays from a source (such as CBOR), it detects whether the destination container has an overloaded data(container) function, a container.resize(n) member function, and an appropriate value_type (a matching primitive number type). The existence of all three makes it possible to efficiently memcpy() the source data directly into the destination container. It would make my life simpler if there were a std::data(std::valarray&) overload. The lack of it isn't a showstopper, but it does make the code messier.


ADDENDUM: The reason I want to detect a data function is that it tells me the destination container is contiguous. If it's contiguous, then I can do an efficient byte copy (whether via std::memcpy or std::copy doesn't really matter). If it's not contiguous, then I have to unpack each unaligned source array number one at a time and append it to the destination container using push_back, emplace, etc., depending on the container type.
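
The detection and the two copy paths described above could look roughly like the following. This is only a sketch: the `sketch` namespace, `is_byte_fillable`, and `fill_from_bytes` are made-up names, the element type is assumed to be a primitive number type, and the source bytes are assumed to already be in host byte order.

```cpp
#include <cstddef>
#include <cstring>
#include <iterator>
#include <list>
#include <type_traits>
#include <utility>
#include <valarray>
#include <vector>

namespace sketch {  // hypothetical namespace and names

// True when std::data(c) and c.resize(n) are both well-formed and the
// container's value_type is an arithmetic type.
template <typename C, typename = void>
struct is_byte_fillable : std::false_type {};

template <typename C>
struct is_byte_fillable<C, std::void_t<
    decltype(std::data(std::declval<C&>())),
    decltype(std::declval<C&>().resize(std::size_t{})),
    typename C::value_type>>
    : std::is_arithmetic<typename C::value_type> {};

static_assert(is_byte_fillable<std::vector<float>>::value, "vector qualifies");
static_assert(!is_byte_fillable<std::list<float>>::value, "list has no data()");
static_assert(!is_byte_fillable<std::valarray<float>>::value,
              "std::data has no overload that accepts a valarray");

// Fills dest with n elements read from a possibly unaligned byte buffer.
template <typename C>
void fill_from_bytes(C& dest, const unsigned char* src, std::size_t n)
{
    using T = typename C::value_type;
    static_assert(std::is_arithmetic<T>::value, "numeric elements only");
    if constexpr (is_byte_fillable<C>::value) {
        // Contiguous destination: one bulk byte copy.
        dest.resize(n);
        if (n != 0)
            std::memcpy(std::data(dest), src, n * sizeof(T));
    } else {
        // Fallback: unpack one element at a time through an aligned temporary.
        dest.clear();
        for (std::size_t i = 0; i < n; ++i) {
            T value;
            std::memcpy(&value, src + i * sizeof(T), sizeof(T));
            dest.push_back(value);
        }
    }
}

} // namespace sketch
```

With this trait, `std::valarray` falls through to the slow path (and would not even compile there, since it has no `push_back`), which is the messiness the question is about.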


ADDENDUM 2: I've decided to use an adaptor and traits approach instead of detecting the presence of a data function. This will make it easier to support non-standard or user-defined container types. My question about why there is no std::data(std::valarray& a) still stands.
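
One possible shape for such a traits approach is sketched below. It is not the library's actual design: `contiguous_traits`, `try_byte_fill`, and the `sketch` namespace are hypothetical names chosen for illustration, and the valarray specialization relies on `&c[0]` being a pointer to contiguous storage.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <valarray>
#include <vector>

namespace sketch {  // hypothetical namespace and names

// Primary template: a container is treated as non-contiguous unless a
// specialization below (or one supplied by a user) says otherwise.
template <typename C>
struct contiguous_traits {
    static constexpr bool is_contiguous = false;
};

template <typename T, typename A>
struct contiguous_traits<std::vector<T, A>> {
    // std::vector<bool> is bit-packed and has no data(), so exclude it.
    static constexpr bool is_contiguous = !std::is_same<T, bool>::value;
    static void resize(std::vector<T, A>& c, std::size_t n) { c.resize(n); }
    static T* data(std::vector<T, A>& c) { return c.data(); }
};

template <typename T>
struct contiguous_traits<std::valarray<T>> {
    static constexpr bool is_contiguous = true;
    static void resize(std::valarray<T>& c, std::size_t n) { c.resize(n); }
    // &c[0] is only valid when the valarray is non-empty.
    static T* data(std::valarray<T>& c) { return c.size() != 0 ? &c[0] : nullptr; }
};

// Copies n elements' worth of bytes into dest when its traits report
// contiguous storage; returns false otherwise so the caller can fall back
// to element-by-element unpacking.
template <typename C>
bool try_byte_fill(C& dest, const unsigned char* src, std::size_t n)
{
    using traits = contiguous_traits<C>;
    if constexpr (traits::is_contiguous) {
        using T = std::remove_pointer_t<decltype(traits::data(dest))>;
        static_assert(std::is_arithmetic<T>::value, "numeric elements only");
        traits::resize(dest, n);
        if (n != 0)
            std::memcpy(traits::data(dest), src, n * sizeof(T));
        return true;
    } else {
        return false;
    }
}

} // namespace sketch
```

A user-defined container would opt in by specializing `contiguous_traits` for its own type, without the library needing a `data` function to exist anywhere.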


ADDENDUM 3: I should have clarified that I need to do this hackery for CBOR typed arrays, which can only be numbers. Furthermore, the numbers in the source buffer are not aligned to element boundaries. I'm aware that the binary data may need endian swapping, and that copying bytes to a floating point type may trigger weird NaN behavior if not treated carefully.
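
As an illustration of the unaligned, endian-sensitive read mentioned above, here is a minimal sketch for a single big-endian 32-bit float. The function name `load_be_float` and the `sketch` namespace are hypothetical. It assumes the host `float` is a 32-bit IEEE 754 type whose byte order matches that of the host's integers. The bit pattern is assembled in an integer, so a wrong-endian pattern never sits in a `float` object.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

namespace sketch {  // hypothetical namespace and names

// Reads a big-endian IEEE 754 binary32 value starting at any byte address.
inline float load_be_float(const unsigned char* p)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    // Assemble the bit pattern arithmetically, so the host's integer byte
    // order does not matter and no aligned access to the source is needed.
    const std::uint32_t bits = (std::uint32_t{p[0]} << 24)
                             | (std::uint32_t{p[1]} << 16)
                             | (std::uint32_t{p[2]} << 8)
                             |  std::uint32_t{p[3]};
    float value;
    std::memcpy(&value, &bits, sizeof value);  // bit-for-bit copy into the float
    return value;
}

} // namespace sketch
```

When the source byte order already matches the host, this per-element step is unnecessary and a bulk byte copy into contiguous storage suffices.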

I now regret mentioning my motivation, and should have let the std::data(std::valarray& a) question stand on its own. What a trainwreck this question has become, haha. :-)

Emile Cormier
  • "*The existence of both makes it possible to efficiently `memcpy()` the source data directly into the destination container*" - is there a reason why you are not using `std::copy()`/`std::copy_n()` for that purpose, letting the compiler optimize it to a `memcpy` when possible? You don't need access to a per-container `data()` to use `std::copy/_n`, just iterators. – Remy Lebeau Feb 06 '21 at 00:53
  • @RemyLebeau I can't use `std::copy` because the source data is in a raw byte buffer, and can't be guaranteed to be aligned at element boundaries. CBOR typed arrays received in a network buffer is where the data originates from, if you really want to know the gory details. – Emile Cormier Feb 06 '21 at 00:57
  • I think `std::valarray` isn't really considered part of the `STL` as it doesn't adhere to `STL` philosophy. – Galik Feb 06 '21 at 01:03
  • @EmileCormier If you can copy data using raw pointers, you can copy using iterators. Pointers are valid iterators. `std::copy()` with random-access iterators, like for vector and C arrays, gets optimized into memcpy-equivalent code. Code like `container.resize(n); memcpy(std::data(container), source, n * sizeof(decltype(container)::value_type));` can be rewritten as `container.resize(n); std::copy(source, source+n, std::begin(container));` or `container.resize(n); std::copy_n(source, n, std::begin(container));` – Remy Lebeau Feb 06 '21 at 01:06
  • @Galik Yes, I used that tag in the same way many use STL to mean the standard library. My bad. I'll remove the tag. – Emile Cormier Feb 06 '21 at 01:07
  • @remy take a buffer of 32-bit int data that is not 4-byte aligned. It is legal to memcpy those bytes to a buffer of 32-bit ints, but not legal to form an int pointer to the unaligned buffer and `std::copy` them. OTOH, putting unswizzled wrong-endian IEEE bytes into floats can result in hard trap values. – Yakk - Adam Nevraumont Feb 06 '21 at 01:14
  • @RemyLebeau: It's not possible to `std::copy` from the middle of a `std::vector<uint8_t>` to a `std::vector<float>`, where `std::vector<uint8_t>` is the network buffer and `std::vector<float>` is the destination type. The network buffer contains metadata before the actual float array data starts, and the float array data is not guaranteed to be aligned at a `float` boundary. For technical reasons I don't care to get into, it's not possible for me to receive the metadata separately before receiving the payload into a float-aligned buffer. – Emile Cormier Feb 06 '21 at 01:15
  • @Yakk-AdamNevraumont "OTOH, putting unswizzled wrong endian IEEE bytes into floats can result in hard trap values." Indeed. I asked this question on how best to handle this: https://stackoverflow.com/q/65910802/245265 – Emile Cormier Feb 06 '21 at 01:18
  • @Galik there's some confusion based on how STL is commonly used. [What's the difference between “STL” and “C++ Standard Library”?](https://stackoverflow.com/q/5205491/5987) – Mark Ransom Feb 06 '21 at 01:21
  • valarrays are not mentioned in [N4017](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4017.htm), the original proposal. – 1201ProgramAlarm Feb 06 '21 at 01:25
  • @1201ProgramAlarm: Barring any technical reasons why there can't be a `std::data(std::valarray&)`, that's probably the simple answer right there. – Emile Cormier Feb 06 '21 at 01:27
  • @MarkRansom There is plenty of disinformation to be sure. But the `STL` has always *properly* been used to mean the *container* and *algorithms* parts of the Standard Library. The original author of the `STL` (Stepanov), the creator of `C++` (Stroustrup) and every major `C++` author (Meyers, Sutter, etc...) use the term exactly that way. – Galik Feb 06 '21 at 01:36
  • I've thought of a workaround for my use case. I can simply define a `data(std::valarray&)` function (and its `const` variant) within the namespace of my library. By doing `using std::data;` beforehand, the deserialization logic will find, via ADL, the custom `data` function for `std::valarray` in addition to the overloaded `std::data` functions for other containers. – Emile Cormier Feb 06 '21 at 01:45
  • @EmileCormier "*It's not possible to `std::copy` from the middle of a `std::vector<uint8_t>` to a `std::vector<float>`*" - sure, it is, e.g.: `std::vector<uint8_t> buffer; /* fill buffer ... */ std::vector<float> values(n); std::copy_n(reinterpret_cast<const char*>(&buffer[index]), n * sizeof(float), reinterpret_cast<char*>(values.data()));` – Remy Lebeau Feb 06 '21 at 02:04
  • @RemyLebeau Yes, of course you're right when you put it that way. I thought you had meant type punning from `uint8_t` to `float` and then doing the `copy_n` using `float*` pointers. I was puzzled why someone with such a high rep would suggest such a thing, hehe. – Emile Cormier Feb 06 '21 at 02:13
  • @EmileCormier I was tempted to cast from `uint8_t*` to `float*` and copying `float`s, but as stated earlier, that would not be legal. Type punning via `char` is legal, though, for the purpose of copying bytes. – Remy Lebeau Feb 06 '21 at 02:17
  • @RemyLebeau: To answer your original question: I simply prefer `std::memcpy` when copying raw bytes, as a matter of personal style. I don't think that stylistic choice pertains to the question of why `std::valarray` doesn't have a `data` function. I'm sure the question of `std::memcpy` vs `std::copy` when dealing with bytes has already been asked here many times, and I'll consult those answers to reconsider my habit of using `std::memcpy`. – Emile Cormier Feb 06 '21 at 02:21
  • @RemyLebeau: Also, the reason why I want to detect a `data` function is that it tells me that the destination container is contiguous. If it's contiguous, then I can do an efficient byte copy (via `std::memcpy` or `std::copy` doesn't really matter). If it's not contiguous, then I have to unpack each array value one at a time and append it to the destination container. – Emile Cormier Feb 06 '21 at 02:29
  • @EmileCormier "*If it's not contiguous, then I have to unpack each array value one at a time and append it to the destination container*" - `std::copy/_n()` could be used for that, too – Remy Lebeau Feb 06 '21 at 02:40
  • @RemyLebeau: That would require a temporary element-aligned buffer and I don't want to impose the memory cost for that on the serialization library's users. – Emile Cormier Feb 06 '21 at 02:46
  • The fact that a container has a `data` function **does not tell you everything you need to know** about whether `memcpy` can be used **safely**. It won't work properly with many classes and is error-prone. Many other constraints should be added, such as requiring that the type have no user-defined destructor. In the past, such types were called POD (plain old data). Recent standards draw finer distinctions. – Phil1970 Feb 06 '21 at 03:02
  • It would be much safer, without needing to know all the details of the standard, to first move the source data so that it is properly aligned and then use `std::copy`. Also, copying data from the network is very fragile, as you need to be aware of endianness, floating-point format, and integer size, among other things. Once you take all of that into account, the copy optimisation might not make much difference unless your data contains big arrays of a given type that do not need conversion. **For a general serialization library, I seriously doubt that the proper conditions will be met often**. – Phil1970 Feb 06 '21 at 03:09
  • And by the way, in 2021, 99.999% of data exchange should be done in either XML or JSON, formats that are much easier to handle in many languages and understandable by a human. **The fact that a single bug in a serialization library might corrupt data is a huge advantage for human-readable data, as a human can fix the data without much knowledge**. – Phil1970 Feb 06 '21 at 03:13
  • @Phil1970: Please cite one example where `data()` on a **standard-compliant** standard library container can't be used as a destination to a `memcpy` operation. If you check the big table at the bottom of https://en.cppreference.com/w/cpp/container , you'll see that only `vector` and `array` support it (as well as `string`), and they are guaranteed to store data contiguously as of C++11. I'm well aware of endianness and floating point formats; this isn't my first rodeo. Finally, I did not ask about which serialization format I should use. – Emile Cormier Feb 06 '21 at 10:48
  • @Phil1970 Not that it matters for this question, but the serialization library I'm working on supports JSON in addition to CBOR, and can be extended for any JSON-like codec. It's up to the user to decide if they prefer performance over human readability. When dealing with large numeric arrays, benchmarks show that CBOR is almost an order of magnitude faster (not surprisingly). – Emile Cormier Feb 06 '21 at 10:59
  • @EmileCormier `data()` cannot be used on `std::vector>`. The fact that data is contiguous does not guarantee that it is safe to use `memcpy`. Any type that holds a resource, counts instances, or has any side effect in its copy constructor should not be copied by copying bytes. – Phil1970 Feb 06 '21 at 13:48
  • Why not use an existing CBOR library? There seem to be a lot of them! Already too much choice! – Phil1970 Feb 06 '21 at 14:03
  • @Phil1970 In addition to detecting the existence of `data`, I also check that the destination container's `value_type` is a primitive number type that matches the source. So I would not attempt to deserialize a typed array of numbers into a `vector` - that would be silly. :-) As for why not use an existing library, I've checked the existing ones, and none of them tick all the boxes on my wishlist. I think I can come up with something better (hopefully). If every library author coming here for help heeded the advice not to bother, then there would be no more innovation. – Emile Cormier Feb 06 '21 at 18:21
  • @Phil1970: I should have clarified that I only need to do this hackery for CBOR Typed Arrays, which can only contain numbers. Sorry for not making this clear in the first place. – Emile Cormier Feb 06 '21 at 18:42
  • @MooingDuck [Why is valarray so slow?](https://stackoverflow.com/q/6850807/995714). But being slow isn't the reason `std::data` doesn't support it – phuclv Feb 08 '21 at 13:25
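
The workaround mentioned in the comments (a library-local `data` overload for `std::valarray`) could be sketched as follows. The namespace `mylib` and the helper `data_ptr` are hypothetical names. One assumption worth flagging: here the `using std::data;` sits at namespace scope next to the custom overloads, so ordinary unqualified lookup sees both sets. ADL alone would not find the custom overloads, because for `std::valarray<float>` it only searches namespace `std`, and a block-scope using-declaration would hide the library's own `data`.

```cpp
#include <cstddef>
#include <iterator>
#include <valarray>
#include <vector>

namespace mylib {  // hypothetical library namespace

using std::data;   // namespace scope: joins the overloads declared below

// Pointer to the first element, or nullptr for an empty valarray
// (indexing element 0 of an empty valarray would be undefined behaviour).
template <typename T>
T* data(std::valarray<T>& a) { return a.size() != 0 ? &a[0] : nullptr; }

template <typename T>
const T* data(const std::valarray<T>& a) { return a.size() != 0 ? &a[0] : nullptr; }

// Generic code inside the library makes an unqualified call; overload
// resolution picks std::data for vector, array, string, etc., and the
// overloads above for valarray.
template <typename C>
auto data_ptr(C& c) -> decltype(data(c)) { return data(c); }

} // namespace mylib
```

A detection trait written inside `mylib` with the same unqualified `data(c)` expression would then report `std::valarray` as having a usable `data` function.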

1 Answer

As 1201ProgramAlarm stated in the comments, the proposal to add std::data does not make any mention of std::valarray. Unless someone can point out why &(a[0]) can't be used to obtain the valarray's data pointer, the simple answer is that valarray was either forgotten or ignored in the proposal.

Emile Cormier