
Here's an interesting one. I'm writing an AES encryption algorithm and have it producing correct encryptions. The trouble comes when I attempt to write the result to a file: the output is incorrect. Hex values get mangled and the result is generally nonsensical (even by encrypted standards).

I did some debugging by sampling my encryption output before sending it to the file. What I found was some kind of overflow: when the correct hex value should have been 9e, I would get ffffff9e. This happened only for hex values above 7F, i.e. characters in the "extended" character set weren't being handled properly. The same thing had happened earlier in my project, and the problem then turned out to be using a char[][] container instead of an unsigned char[][] container.

My code uses strings to pass the encrypted data between the user interface and the AES encryption class. I'm guessing that std::string doesn't support the extended character set. So my question is: is there a way to instantiate an unsigned string, or will I have to find a way to replace all of my uses of std::string?
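
To illustrate, here is a stripped-down sketch (not my real code) of the symptom I see when printing a byte above 7F:

#include <iostream>

int main() {
    char c = static_cast<char>(0x9e);  // a byte above 7F; char is signed on my platform
    // promoting the (signed) char to int sign-extends it
    std::cout << std::hex << static_cast<int>(c) << '\n';                             // prints ffffff9e
    std::cout << std::hex << static_cast<int>(static_cast<unsigned char>(c)) << '\n'; // prints 9e
}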

Connor Spangler
  • This happens often on MS-Windows. There, `char` is considered signed (even in MinGW and other Unix development tools ported to MS-Windows.) Under Linux, it is similar to an `unsigned char` (but still a separate type.) – Alexis Wilke May 11 '19 at 21:37

3 Answers


std::string is really just a typedef, something like:

namespace std { 
   typedef basic_string<char> string;
}

It's fairly easy to create a variant for unsigned char:

typedef basic_string<unsigned char> ustring;

You will, however, have to change your code to use a ustring (or whatever name you prefer) instead of std::string.

Depending on how you've written your code, though, that may not require editing all of it. In particular, if you have something like:

namespace crypto { 
   using std::string;

   class AES { 
      string data;
      // ..
    };
}

You can change the string type by changing only the using declaration:

namespace unsigned_types { 
    typedef std::basic_string<unsigned char> string;
}

// ...

namespace crypto {
    using unsigned_types::string;

    class AES {
        string data;
    };
}

Also note that different instantiations of a template are entirely separate types, even when the types over which they're instantiated are related, so the fact that you can convert implicitly between char and unsigned char doesn't mean you get a matching implicit conversion between basic_string<char> and basic_string<unsigned char>.
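
For instance, converting between the two requires copying the bytes explicitly. A minimal sketch (it assumes your standard library provides char_traits<unsigned char>, which common implementations do but the standard does not require):

#include <string>

typedef std::basic_string<unsigned char> ustring;

int main() {
    std::string s(1, static_cast<char>(0x9e));
    // ustring u = s;               // error: no implicit conversion between the instantiations
    ustring u(s.begin(), s.end());  // copy the bytes explicitly instead
    std::string back(u.begin(), u.end());
}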

Jerry Coffin

std::string is nothing more or less than a specialization of the std::basic_string<> template, so you can simply do a

typedef std::basic_string<unsigned char> ustring;

to get what you want.


Note that the C and C++ standards do not define whether char is the signed or the unsigned variety, so any program that casts a char directly to a larger type relies on implementation-defined behaviour (the result depends on whether char is signed on that platform).
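
For example, a small sketch whose output depends on that implementation-defined choice:

#include <iostream>
#include <limits>

int main() {
    // implementation-defined: true on most desktop platforms
    std::cout << std::boolalpha << std::numeric_limits<char>::is_signed << '\n';

    char c = static_cast<char>(0x80);
    std::cout << std::hex << static_cast<unsigned int>(c) << '\n';
    // prints ffffff80 where char is signed, 80 where it is unsigned
}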

cmaster - reinstate monica
    How could casting from an [unsigned] char to a larger signed type give UB? A larger type needs to be at least one bit larger, in which case any value of unsigned char can be represented in the larger type without modification (in which case, that's exactly what happens). – Jerry Coffin Nov 30 '13 at 01:37
  • @JerryCoffin I'm sorry, I messed up UB and implementation defined behaviour here. According to the standard, casts to a signed integer type that cannot represent the casted value are implementation defined (and I thought, it was UB). I fixed the answer accordingly. – cmaster - reinstate monica Nov 30 '13 at 08:27
  • The point isn't about UB vs. IB. It's about the fact that any value that can be represented in an unsigned of one size can always be represented in a signed of a larger size. – Jerry Coffin Nov 30 '13 at 15:12
  • @JerryCoffin Ah, now I see your point. But I guess it has already been corrected with my last edit, no? In any case, if I cast a `char` of value `0x80` to a `uint16_t`, I may either get `0x0080` or `0xff80`, depending on the compiler/platform. – cmaster - reinstate monica Nov 30 '13 at 17:08
  • Yes, I'd agree with what you have now. – Jerry Coffin Nov 30 '13 at 17:12

Cast your value to unsigned char first:

char input = 250;                                    // just an example

unsigned int n = static_cast<unsigned char>(input);  // NOT: "unsigned int n = input;"
//               ^^^^^^^^^^^^^^^^^^^^^^^^^^

The problem is that your char happens to be signed, and so its value is not the "byte value" that you want -- you have to convert to unsigned char to get that.
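
Applied to the whole encrypted buffer when dumping it as hex (a sketch; dump_hex is a made-up helper name):

#include <cstdio>
#include <string>

// hypothetical helper: print each byte of the data as two hex digits
void dump_hex(const std::string& data) {
    for (char c : data)
        std::printf("%02x", static_cast<unsigned char>(c)); // cast first, then let it widen
    std::printf("\n");
}

int main() {
    dump_hex(std::string("\x9e\x7f\x01", 3));  // prints 9e7f01, not ffffff9e7f01
}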

Kerrek SB