
I am working in an antiquated code base that uses `unsigned char*`s to hold strings. For my functionality I've used `string`s; however, there is a rub:

I can't use anything in `#include <cstring>` on the old code. Copying from a `string` to an `unsigned char*` is a laborious process of:

```cpp
unsigned char foo[12];
string bar{"Lorem Ipsum"};

// Copy at most size - 1 characters, then terminate right after the last one copied.
const auto len = min(sizeof(foo) / sizeof(foo[0]) - 1, bar.size());
transform(bar.cbegin(), bar.cbegin() + len, foo, [](auto i){return static_cast<unsigned char>(i);});
foo[len] = '\0';
```

Am I going to get into undefined behavior or aliasing problems if I just do:

```cpp
strncpy(reinterpret_cast<char*>(foo), bar.c_str(), sizeof(foo) / sizeof(foo[0]) - 1);
foo[sizeof(foo) / sizeof(foo[0]) - 1] = '\0';
```
Jonathan Mee
  • Do you need to be cross-platform here? Are you willing to settle for something that works on your current platform, even if it may strictly be UB in the C++ Standard? `char` is guaranteed to have the same representation as one of `signed char` or `unsigned char`, and even in the `signed` case, you are probably safe with strictly ASCII data: `[0,127]`. – BoBTFish Jan 19 '16 at 14:00
  • @BeyelerStudios It is ASCII. – Jonathan Mee Jan 19 '16 at 14:01
  • Shouldn't that be `static_cast`, or just a C-style cast? – J.J. Hakala Jan 19 '16 at 14:02
  • @BoBTFish I'm unwilling to use behavior that is undefined according to the standard. That's the reason I'm asking the question. Obviously I can use the `transform` it's just not clear what I'm doing. – Jonathan Mee Jan 19 '16 at 14:03
  • @J.J.Hakala No. It should be a `reinterpret_cast`. If you try a `static_cast` the compiler will, rightly, give you the error: "`static_cast`: cannot convert from `unsigned char [12]` to `char *`" – Jonathan Mee Jan 19 '16 at 14:06
  • @BeyelerStudios [`strncpy`](http://en.cppreference.com/w/cpp/string/byte/strncpy) already checks that for me. – Jonathan Mee Jan 19 '16 at 14:19
  • A note on wording: you **cannot** store strings in `char*`s of any flavor. You store strings in `char` **arrays**, and you access the contents of those arrays through pointers, i.e., `char*`s. Muddling "array" and "pointer" leads to no end of confusion. – Pete Becker Jan 19 '16 at 14:26
  • @PeteBecker, this is too much hair-splitting for my taste. It is also not correct: I can certainly store strings in a `char*` that never was an array. – SergeyA Jan 19 '16 at 14:28
  • @SergeyA No, you can make a `char*` point to a string literal, which is of type `const char[nchars + 1]`. – TartanLlama Jan 19 '16 at 14:30
  • @TartanLlama, but I can also point it to something else ;) – SergeyA Jan 19 '16 at 14:32
  • @SergeyA Sure, but you'll be pointing it to some other array of characters or a single character. – TartanLlama Jan 19 '16 at 14:33
  • @TartanLlama, but why? `int x[255]; char* s = reinterpret_cast<char*>(x); s[0] = 'A';` is perfectly legal. Also, `struct Blob { double x, y, z; } blob; char* s = reinterpret_cast<char*>(&blob); s[3] = 'C';` is legal as well. – SergeyA Jan 19 '16 at 14:36
  • @SergeyA The comment you disagreed with was talking about strings. Your first example is modifying the value representation of an `int`, your second example is undefined behaviour. – TartanLlama Jan 19 '16 at 14:38
  • @TartanLlama, not sure if I follow. In the first example I tried to show how I can store strings in something which was never a char array (but it was an int array). In the next I've shown how to store strings into something which never was an array. Why do you think it is undefined behaviour? – SergeyA Jan 19 '16 at 14:40
  • @TartanLlama, can't go to the chat from my office computer :) – SergeyA Jan 19 '16 at 14:43
  • @SergeyA Maybe we can work this out then clear up our comments. I guess some of this comes down to what you define as "string", but I'd say your first example is doing operations on the object representation rather than doing anything with strings. I think your second example is undefined behaviour as pointer arithmetic is only defined within an array. – TartanLlama Jan 19 '16 at 14:46
  • @TartanLlama, interesting indeed, but I am not enough of a language lawyer to answer this. I think I will ask a question. – SergeyA Jan 19 '16 at 14:48
  • Anyone looking for the new @SergeyA question can find it here: http://stackoverflow.com/q/34879858/2642059 – Jonathan Mee Jan 19 '16 at 15:27

1 Answer


There is an explicit exception to the strict aliasing rule for `char` and `unsigned char`, so casting pointers between character types and accessing the data through the result will just work.

Specifically, [basic.types] in N3690 says that the bytes of any trivially copyable object can be copied into an array of `char` or `unsigned char`, and if they are then copied back, the object holds its original value. It also says that if you copy those bytes into a second object of the same type, the two objects hold the same value (paragraphs 2 and 3).

[basic.lval] says it is legal to access (read or modify) an object through an lvalue of `char` or `unsigned char` type.

The concern BoBTFish raised in the comments, about whether values are the same in `char` and `unsigned char`, is misplaced, I think. "Character" values are inherently of `char` type. You can store them in `unsigned char` and use them as `char` later, but that was happening already.

(I'd recommend writing a few in-line wrapper functions to make the whole thing less noisy, but I assume the code snippets were for exposition rather than actual usage.)

Edit: Removed erroneous recommendation to use `static_cast`.

Edit2: Chapter and verse.

  • You cannot do `static_cast` as noted here: http://stackoverflow.com/questions/34878710/is-it-legal-to-cast-away-the-sign-on-a-pointer#comment57495575_34878710 – Jonathan Mee Jan 19 '16 at 14:07
  • Call me a skeptic, but I'd sure like to see a source on this, especially after the whole `static_cast` debacle. – Jonathan Mee Jan 19 '16 at 14:09
  • Yup. Saw your comment above and changed my answer (before your comment here appeared!) – Martin Bonner supports Monica Jan 19 '16 at 14:09
  • Funny. I was just thinking "rather casts my assertion on strict aliasing into doubt". – Martin Bonner supports Monica Jan 19 '16 at 14:09
  • That's for accessing the object representation, isn't it? I.e. you can cast a pointer to any object to `(unsigned) char*` and back, and copy it if the object is *trivially copyable*. But I don't see any guarantees about the actual character values being the same. (Unless `char` is `unsigned`, which is *implementation-defined*.) – BoBTFish Jan 19 '16 at 14:10
  • You cite that this is legal for trivially copyable types. But is an array trivially copyable? I thought that was specifically not the case. – Jonathan Mee Jan 19 '16 at 14:35
  • Even if the array is not trivially copyable, each of the elemental chars or unsigned chars is. – Martin Bonner supports Monica Jan 20 '16 at 13:14
  • You seem to be saying that this error from GCC is a bug: `error: invalid static_cast from type ‘char*’ to type ‘uint8_t* {aka unsigned char*}’`, but that's the error I'm getting. I have to `reinterpret_cast` to suppress it. Making the type explicit doesn't help: `error: invalid static_cast from type ‘char*’ to type ‘unsigned char*’`. This does work for the underlying `char` vs `unsigned char` cast, but not for the pointers. IME, anyway. – Code Abominator Aug 21 '17 at 23:35
  • @CodeAbominator: No, the warning is correct. You need to use `reinterpret_cast`. What I am saying is that once you have done the `reinterpret_cast`, using the pointer is not undefined behaviour (it doesn't violate the strict aliasing rules). – Martin Bonner supports Monica Aug 22 '17 at 05:07