
My payload is stored in a std::string xyz (holds binary data), and I need to pass it to a function that takes it as const unsigned int*. How would I convert from std::string to const unsigned int*?

I tried reinterpret_cast<const unsigned int*>(&xyz.front()) but it is not working!

The function prototype is as follows:

void roll(void *pdst, const unsigned int *psrc);

pdst will hold the results.

John Kugelman
Xigma
  • `reinterpret_cast<const unsigned int*>(&xyz[0])`, but you will have to make sure the string outlives the function call AND either pass in the length or know the number of bytes beforehand. Also, why are you storing binary data in a string? – Brandon May 19 '20 at 15:40
  • I know, it is a bad decision, and too late to change it now. – Xigma May 19 '20 at 15:49
  • @Brandon `xyz[0]` and `xyz.front()` return the same thing - a *reference* to the first char. – Remy Lebeau May 19 '20 at 17:09
  • @Xigma please show how the `std::string` is being populated with binary data. `reinterpret_cast<const unsigned int*>(&xyz.front())` will work just fine (you don't need the `const`) provided the `std::string`'s `size()` is large enough to hold `sizeof(unsigned int)` number of `char`s. – Remy Lebeau May 19 '20 at 17:10
  • @RemyLebeau no worries, I got it working as per Christian's recommendations. – Xigma May 21 '20 at 02:51

2 Answers


Don't use std::string to store binary data; that class is specifically designed for working with strings. It feels like there was original C code that was using a char array to store a sequence of bytes and translated that to std::string for C++. In this case, it's not being used as a string, so it doesn't make sense to store it in a std::string.

As for converting to an unsigned int: you can't simply cast the pointer, even if you were using a more primitive type such as a char *, because that would violate the strict aliasing rules and result in undefined behavior. What you want to do instead is create a new variable of the target type and memcpy the data into it.
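
A minimal sketch of the memcpy approach, assuming the payload only needs to be readable as unsigned int values for the duration of the call. The helper name to_words is hypothetical and not part of the question; it copies the string's bytes into a std::vector<unsigned int>, which is correctly aligned and really does contain unsigned int objects:

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): copies the raw bytes of a
// std::string into storage that actually holds unsigned int objects.
std::vector<unsigned int> to_words(const std::string& bytes) {
    // Round up so trailing bytes are not dropped; unused padding stays zero.
    const std::size_t count =
        (bytes.size() + sizeof(unsigned int) - 1) / sizeof(unsigned int);
    std::vector<unsigned int> words(count);
    if (!bytes.empty()) {
        // memcpy copies the object representation byte by byte, so no
        // unsigned int is ever read through a char-typed lvalue.
        std::memcpy(words.data(), bytes.data(), bytes.size());
    }
    return words;
}
```

With this in place, the call from the question could look like `roll(pdst, to_words(xyz).data());`. How many words roll actually reads is not stated in the question, so the caller still has to make sure the payload is long enough.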

Here is the section from the C++14 standard working draft describing compatible types (3.10 p10):

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

— the dynamic type of the object,

— a cv-qualified version of the dynamic type of the object,

— a type similar (as defined in 4.4) to the dynamic type of the object,

— a type that is the signed or unsigned type corresponding to the dynamic type of the object,

— a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,

— an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),

— a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,

— a char or unsigned char type.

As you can see, it explicitly allows for accessing any object as a char or unsigned char, but it gives no such allowance to access a char or unsigned char as anything else.
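
To illustrate the permitted direction, here is a small sketch that inspects an unsigned int through unsigned char. The helper name bytes_of is hypothetical and only serves the example:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the object representation of an unsigned int.
std::vector<unsigned char> bytes_of(const unsigned int& value) {
    // Allowed by the last bullet above: any object may be accessed
    // through a char or unsigned char glvalue.
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    return std::vector<unsigned char>(p, p + sizeof value);
}
```

The reverse, reading a char buffer through an unsigned int pointer, has no matching entry in the list, which is why the copy into a real unsigned int is needed.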

Christian Gibbons
  • 4,272
  • 1
  • 16
  • 29
  • Agreed. Thank you for the guidance. – Xigma May 19 '20 at 15:56
  • @Brandon He's talking about casting to `unsigned int*`, not `char*`. – Nelfeal May 19 '20 at 16:48
  • @Brandon Aliasing to character types is a special case that is well-defined. This is going the other way, however. – Christian Gibbons May 19 '20 at 16:59
  • Aliasing is symmetric, so "going the other way" is equally well-defined – Chris Dodd May 19 '20 at 17:13
  • @ChrisDodd Wrong. I'll give you an easy example, assuming that an `int` is 4 bytes: you've got a `char` array that starts at memory address 0x00000001. Now you try to alias it as an `int *`. Problem: this does not meet the minimum alignment requirements of an `int`. – Christian Gibbons May 19 '20 at 17:39
  • But that's not an aliasing problem and has nothing to do with strict aliasing. – Chris Dodd May 19 '20 at 17:41
  • I have updated with the section of the C++ standard that describes compatible types so we can put this matter to rest. – Christian Gibbons May 19 '20 at 17:54
  • Would suck to see any of you guys using `malloc` without `placement new` in the example: `int* mem = malloc(sizeof(int) * 10)`.. – Brandon May 19 '20 at 18:09
  • @ChrisDodd It is an aliasing problem. Why would you say it's not? One result of UB can be an unaligned access and this is UB because it violates strict aliasing rules. – David Schwartz May 19 '20 at 18:52
  • Because it's an alignment problem. Aliasing and alignment are two completely different things, either of which can lead to problems. Conflating the two is not useful. – Chris Dodd May 19 '20 at 19:11
  • https://stackoverflow.com/questions/17789928/whats-a-proper-way-of-type-punning-a-float-to-an-int-and-vice-versa has some comments stating why `memcpy` for type-punning/aliasing is also undefined behaviour, and Linus Torvalds has a few words on why to ignore the standard: https://lkml.org/lkml/2018/6/6/88 There is literally no `defined` way to type-pun in C++. `memcpy` in this case invokes implementation defined behaviour. https://stackoverflow.com/a/38610554/1462718 See the footnote on `no defined type` for allocations. If OP used `m/re/c/alloc`, it is undefined behaviour. – Brandon May 25 '20 at 17:23
  • @Brandon The fact that `memcpy` works for variables with declared types *does* mean that there is a defined way to type-pun in C++. If you need to store that value to an object allocated with `malloc`, you can use an intermediary with a declared type. – Christian Gibbons May 25 '20 at 20:00
  • Just because it works on your implementation (GCC/Clang) doesn't mean it works on all either. C++ itself has no defined way to type-pun or alias that is accepted across all implementations or in the standard. Interpreting data with an intermediate type would break strict-aliasing. memcpy isn't magic. It's a workaround that can introduce undefined behaviour and be implementation defined. – Brandon May 25 '20 at 21:10
  • @Brandon I'm not talking about "my implementation". I'm talking about what perfectly follows the standard. The issue `memcpy` has with `malloc`'d data is that the data has no declared type. With such data, when you store a value to it, it then has an "effective type" that it will use for subsequent reads until it is written to again and receives a new effective type. If instead you `memcpy` to a variable with a declared type, it is perfectly within the specs. You can then assign that value to your `malloc`'d data, which will take on the effective type equal to that of the declared variable. – Christian Gibbons May 25 '20 at 21:10
  • `memcpy` has no idea about your effective type in the first place. It literally accepts two void pointers. Using `memcpy` to get around the wording in the standard makes no sense to me. `memcpy(&int, &float, sizeof(int))` and then using that `int` is not a work around for casting a `float` to an `int`. There's no objects of `unsigned int` stored in OP's string.. For this reason, even the standard way `std::launder` would be undefined behaviour. If you're not using your compiler specific extension for this, you are `hoping` your code works. – Brandon May 25 '20 at 21:30
  • @Brandon `memcpy` is a special function. While you won't find much of a description of it in the C++ standard, it does say that the contents and meaning of `<cstring>` are identical to the C Standard Library's `<string.h>`. So I refer you to section 6.5 p6 of the C11 standard, which describes the effective types of objects. It specifically details how it pertains to the standard functions `memcpy` and `memmove` and how it works with objects with no declared types. If you have a section of the C++ standard that invalidates it, I'd like to read it. – Christian Gibbons May 25 '20 at 22:18

The question is how you store the binary data in the std::string. If you are simply using the constructor, you can get your binary data back with xyz.data().
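
A minimal sketch of what "using the constructor" might look like, assuming the payload starts out as a raw buffer with a known length. The helper name make_payload is hypothetical:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: wraps a raw buffer in a std::string without truncation.
std::string make_payload(const unsigned char* buf, std::size_t len) {
    // The (pointer, length) constructor copies exactly len bytes, including
    // any embedded '\0' bytes. The pointer-only constructor would stop at
    // the first '\0' and silently lose the rest of the payload.
    return std::string(reinterpret_cast<const char*>(buf), len);
}
```

After that, xyz.data() points at the stored bytes and xyz.size() gives their count, so both need to travel together wherever the payload is used.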

  • There is no difference between using `xyz.data()` and `&xyz.front()`, they will both return the same memory address. – Remy Lebeau May 19 '20 at 17:13