
I have a variable input of type const std::string&:

const std::string& input

Now I need to convert it to const unsigned char*, because that is the parameter type of the function I am calling.

Until now I have used this code for the conversion:

reinterpret_cast<const unsigned char*>(input.c_str()) 

This works, but clang-tidy gives me a warning:

do not use reinterpret_cast [cppcoreguidelines-pro-type-reinterpret-cast]

What is the correct way to change a string or const char* to const unsigned char*?

Nejc Galof
  • The correct way is using reinterpret_cast – user253751 Sep 08 '21 at 14:04
  • I'd be curious to see the function. I find it odd that a function expecting a C-string would want unsigned chars. Seems more like a generic buffer. – sweenish Sep 08 '21 at 14:05
  • You can do 2 static casts - one to the `const void*`, second from `const void*` to `const unsigned char*` It is also possible that a better container for you would be a vector of `unsigned char`, rather than `std::string`. – SergeyA Sep 08 '21 at 14:08
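A minimal sketch of the two-`static_cast` route from SergeyA's comment above. `consume` is a hypothetical stand-in for the real function (the question does not show it), and it takes an explicit length because a `std::string` may contain embedded nulls. The resulting pointer is the same one `reinterpret_cast` would give; this only avoids the specific cast that the clang-tidy check flags.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the function from the question that expects
// const unsigned char*; here it simply sums the byte values.
unsigned long consume(const unsigned char* data, std::size_t size) {
    unsigned long sum = 0;
    for (std::size_t i = 0; i < size; ++i) {
        sum += data[i];
    }
    return sum;
}

unsigned long consume_string(const std::string& input) {
    // Step 1: const char* -> const void* (normally implicit, spelled out here).
    const void* raw = static_cast<const void*>(input.data());
    // Step 2: const void* -> const unsigned char*.
    // Reading char storage through unsigned char is allowed by the aliasing rules.
    const auto* bytes = static_cast<const unsigned char*>(raw);
    return consume(bytes, input.size());
}
```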

2 Answers


What is the correct way to change a string or const char* to const unsigned char*?

The correct way is to use reinterpret_cast.
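If you keep `reinterpret_cast`, clang-tidy lets you suppress a single check on a single line with a `NOLINTNEXTLINE(check-name)` comment, which documents that the cast is intentional. A minimal sketch, where `count_high_bytes` is a hypothetical stand-in for the real function:

```cpp
#include <cstddef>
#include <string>

// Hypothetical function from the question that works on unsigned bytes:
// counts bytes with the high bit set.
std::size_t count_high_bytes(const unsigned char* data, std::size_t size) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < size; ++i) {
        if (data[i] >= 0x80) {
            ++count;
        }
    }
    return count;
}

std::size_t count_high_bytes_in(const std::string& input) {
    // Viewing char data as unsigned char is well-defined; the cast is intentional.
    // NOLINTNEXTLINE(cppcoreguidelines-pro-type-reinterpret-cast)
    const auto* bytes = reinterpret_cast<const unsigned char*>(input.data());
    return count_high_bytes(bytes, input.size());
}
```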

If you want to avoid reinterpret_cast, then you must avoid the pointer conversion entirely, which is only possible by solving the XY-problem. Some options:

  • You could use std::basic_string<unsigned char> in the first place.
  • If you only need an iterator to unsigned char and not necessarily a pointer, then you could use std::ranges::views::transform with a static_cast applied to each element.
  • You could change the function that expects unsigned char* to accept char* instead.
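For the last option, a sketch of what a reworked function might look like. `max_byte` is a hypothetical stand-in, since the question does not show the real function. It accepts `const char*`, which `std::string::data()` provides directly, and converts each element with `static_cast` only where the unsigned value matters:

```cpp
#include <cstddef>

// Hypothetical reworked function: takes plain char data, so callers need no
// pointer cast. Each element is converted individually to get its
// unsigned byte value (0..255).
unsigned char max_byte(const char* data, std::size_t size) {
    unsigned char result = 0;
    for (std::size_t i = 0; i < size; ++i) {
        const auto byte = static_cast<unsigned char>(data[i]);
        if (byte > result) {
            result = byte;
        }
    }
    return result;
}
```

Callers then pass `input.data()` and `input.size()` unchanged.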

If you cannot change the type of input, do need an unsigned char*, and still must avoid reinterpret_cast, then you could create a std::basic_string<unsigned char> from the input using the transform view. But this has potential overhead (an extra allocation and copy), so consider whether avoiding reinterpret_cast is worth it.
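A sketch of the copying approach with two substitutions, named plainly: it copies into `std::vector<unsigned char>` (as SergeyA's comment on the question suggests) instead of `std::basic_string<unsigned char>`, because the standard only specifies `std::char_traits` for the standard character types, so `basic_string<unsigned char>` is not guaranteed to compile everywhere; and it uses `std::transform` (C++11) instead of a C++20 transform view. `to_bytes` is a hypothetical name.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Copies the string into a separately owned buffer of unsigned char,
// converting each element with static_cast instead of casting the pointer.
// Costs one allocation and a linear copy per call.
std::vector<unsigned char> to_bytes(const std::string& input) {
    std::vector<unsigned char> bytes;
    bytes.reserve(input.size());
    std::transform(input.begin(), input.end(), std::back_inserter(bytes),
                   [](char c) { return static_cast<unsigned char>(c); });
    return bytes;
}
```

Usage: `auto bytes = to_bytes(input);` then pass `bytes.data()` to the function. The buffer is not null-terminated; if the callee expects a C string, call `bytes.push_back(0)` first.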

eerorika

Edit: Apparently type punning with a union is UB, so definitely don't do this.
(Keeping the answer for posterity, though!)


To strictly answer your question, there's this way:

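#include <iostream>
#include <string>

void foo(const unsigned char* str) {
    std::cout << str << std::endl;
}

int main()
{
    std::string word = "test";
    //foo(word.data()); fails: no implicit conversion from char* to const unsigned char*
    // Reading cucptr after writing ccptr reads an inactive union member: UB in C++.
    union { const char* ccptr; const unsigned char* cucptr; } uword;
    uword.ccptr = word.data();
    foo(uword.cucptr);
}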

Is this any better than a reinterpret_cast? Probably not.

m88
  • the better question is: Is this worse than `reinterpret_cast`? Definitely yes. Type punning via unions is allowed with some compilers as an extension but it is not portable standard C++ – 463035818_is_not_an_ai Sep 08 '21 at 14:19
  • Not sure if C++20 changes anything in this regard, and I don't think it does, but before C++20, this is 100% undefined behavior. – NathanOliver Sep 08 '21 at 14:21
  • @NathanOliver AFAIK it doesn't, type punning via `union` is still and will always be UB – Mgetz Sep 08 '21 at 14:21
  • Can you please link to that assertion? I'd love to use an official source to "explain" this to some people I know as to why this is bad... and they should stop doing it, like immediately. *shudder* – Kevin Anderson Sep 08 '21 at 14:22
  • @KevinAnderson [it's UB to read from the inactive member of a union](https://en.cppreference.com/w/cpp/language/union) that said many compilers support it. But that doesn't mean it's not technically UB. – Mgetz Sep 08 '21 at 14:23
  • I knew it was dirty (I would never use that in actual code...) but I didn't know it was UB. Any code example on how this can lead to UB? – m88 Sep 08 '21 at 14:23
  • @m88 just type pun to any class with a destructor ;) – Mgetz Sep 08 '21 at 14:23
  • @Mgetz I don't have anything with a destructor in my union... – m88 Sep 08 '21 at 14:25
  • @Mgetz that's pretty close (I use that as an authority), but I mean like an ISO document that says it. I have a mountain to chip away at, so I need the Netherite pickaxe, not just the iron one. – Kevin Anderson Sep 08 '21 at 14:25
  • @KevinAnderson and m88 Give this a read: https://stackoverflow.com/questions/11373203/accessing-inactive-union-member-and-undefined-behavior – NathanOliver Sep 08 '21 at 14:26
  • @m88 `Any code example on how this can lead to UB?` Your example. The thing about UB is that it can behave exactly as you want it to behave. It's just not guaranteed to and for any reason might not behave as you want. – eerorika Sep 08 '21 at 14:27
  • More than likely this will never fail, as all it is doing is a `reinterpret_cast` the long way around. The problem is that it teaches to use a technique that is UB, and if used on more complex types, can lead to very hard to find bugs. – NathanOliver Sep 08 '21 at 14:28
  • @KevinAnderson [This should do](https://eel.is/c++draft/class.union#general-2) it's pretty explicit that only one active member can exist and you can't switch – Mgetz Sep 08 '21 at 14:28
  • Alright, good to know. Does this mean reading a `float` as a `char[4]` with the same technique is also UB? I'm pretty sure I've seen this trick many times before. – m88 Sep 08 '21 at 14:30
  • @m88 yes there are also [methods in the standard to do that now](https://en.cppreference.com/w/cpp/numeric/bit_cast) without UB – Mgetz Sep 08 '21 at 14:31
  • That type punning with `union` works in most compilers is mostly because the major compilers support both C++ and C11, and therefore likely use the behavior described in the C11 specification for unions. But that does not change anything about it being UB. – t.niese Sep 08 '21 at 14:32
  • @m88 Yes, that is also UB in C++, though again it is one of those cases that most likely will never fail. Before C++20, the standard way to type pun was to use `memcpy`. Now that we have C++20, we can use `std::bit_cast` instead. – NathanOliver Sep 08 '21 at 14:33
  • @t.niese doing a bit of research... it's more complicated than that, C11 allows the read from the inactive member. It doesn't define if that read causes undefined behavior itself. Which is a weird but valid way to approach it. C++ definitely still sticks with active member if you see the link above. Probably because C++ has lifetimes and that would require lifetime transfer. So at best it's dubiously supported for trivial types that are standard layout. But anything with a non-trivial destructor is definitely UB. – Mgetz Sep 08 '21 at 14:44
  • `I'm pretty sure I've seen this trick many times before` it is not uncommon to see things in code that is UB. For some things, it is/was commonly accepted that the UB being utilized is just a UB due to a defect in the specification (like `P0593R6 Implicit creation of objects for low-level object manipulation`), and for certain things you could assume that all major vendors agreed to the same behavior for a certain UB case (but there is no guarantee for that). You generally don't want to rely on something that is UB. – t.niese Sep 08 '21 at 14:44
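For the float-to-bytes case raised in the comments, a minimal sketch of the `memcpy` approach mentioned there. The function names are hypothetical, and the byte order in the result depends on the platform's endianness.

```cpp
#include <array>
#include <cstring>

// Copies the object representation of a float into a byte array.
// std::memcpy is well-defined here, unlike reading an inactive union member.
std::array<unsigned char, sizeof(float)> float_bytes(float value) {
    std::array<unsigned char, sizeof(float)> bytes{};
    std::memcpy(bytes.data(), &value, sizeof value);
    return bytes;
}

// Reverse direction: rebuilds a float from its byte representation.
float float_from_bytes(const std::array<unsigned char, sizeof(float)>& bytes) {
    float value = 0.0f;
    std::memcpy(&value, bytes.data(), sizeof value);
    return value;
}
```

In C++20, the body of `float_bytes` can be replaced by `return std::bit_cast<std::array<unsigned char, sizeof(float)>>(value);` using `<bit>`.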