How to turn a hex string into an unsigned char array?

Question

For example, I have a cstring "E8 48 D8 FF FF 8B 0D" (including spaces) which needs to be converted into the equivalent unsigned char array {0xE8,0x48,0xD8,0xFF,0xFF,0x8B,0x0D}. What's an efficient way to do this? Thanks!

EDIT: I can't use the std library... so consider this a C question. I'm sorry!

James McNellis · Answer 1 · 2010-07-10T23:16:06.573

This answers the original question, which asked for a C++ solution.

You can use an istringstream with the hex manipulator:

#include <sstream>
#include <string>
#include <vector>

std::string hex_chars("E8 48 D8 FF FF 8B 0D");

std::istringstream hex_chars_stream(hex_chars);
std::vector<unsigned char> bytes;

unsigned int c;
while (hex_chars_stream >> std::hex >> c)
{
    bytes.push_back(c);
}

Note that c must be an int (or long, or some other integer type), not a char; if it is a char (or unsigned char), the wrong >> overload will be called and individual characters will be extracted from the string, not hexadecimal integer strings.

Additional error checking to ensure that the extracted value fits within a char would be a good idea.

Because I cannot give two correct answers, I went ahead and upvoted this one, as this definitely is a great solution for C++ users! — Gbps, Jul 11 '10 at 00:50

score 14 · Accepted Answer · answered Jul 11 '10 at 00:23

You'll never convince me that this operation is a performance bottleneck. The efficient way is to make good use of your time by using the standard C library:

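#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

static unsigned char gethex(const char *s, char **endptr) {
  assert(s);
  while (isspace((unsigned char)*s)) s++;
  assert(*s);
  return strtoul(s, endptr, 16);
}

unsigned char *convert(const char *s, int *length) {
  /* Each value needs at least one digit plus a separator, so
     (strlen(s) + 1) / 2 bounds the count even for input like "A B C". */
  unsigned char *answer = malloc((strlen(s) + 1) / 2);
  unsigned char *p;
  for (p = answer; *s; p++)
    *p = gethex(s, (char **)&s);
  *length = p - answer;
  return answer;
}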

Compiled and tested. Works on your example.
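
A more defensive variant of the same strtoul idea might look like the sketch below. It is not the answer author's code: the function name hex_to_bytes_strtoul, the caller-supplied buffer, and the 0 / -1 return convention are all assumptions made for illustration. It skips whitespace itself so trailing spaces are accepted, rejects values above 0xFF, and refuses to write past the buffer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: parses whitespace-separated hex values from text into
 * out (capacity cap). Returns 0 on success and stores the count in *count;
 * returns -1 on a malformed token, a value above 0xFF, or too many values. */
int hex_to_bytes_strtoul(const char *text, unsigned char *out, size_t cap,
                         size_t *count)
{
    size_t n = 0;
    assert(text != NULL && out != NULL && count != NULL);

    for (;;) {
        char *end;
        unsigned long value;

        /* Skip separators ourselves so trailing whitespace is accepted. */
        while (isspace((unsigned char)*text))
            text++;
        if (*text == '\0')
            break;

        /* strtoul would also accept a leading sign; insist on a hex digit.
         * (A "0x" prefix is still accepted by strtoul in base 16.) */
        if (!isxdigit((unsigned char)*text))
            return -1;
        value = strtoul(text, &end, 16);
        if (end == text || value > 0xFF)
            return -1;              /* nothing parsed, or does not fit a byte */
        if (n == cap)
            return -1;              /* output buffer is full */
        out[n++] = (unsigned char)value;
        text = end;
    }
    *count = n;
    return 0;
}
```

Because the caller owns the buffer, single-digit input such as "A B C" cannot overflow anything; the function simply reports failure once cap values have been stored.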

Norman Ramsey

I chose this as the answer because it simply provided a working example. Thanks! – Gbps Jul 11 '10 at 00:51
OTOH, buffer overflow on "A B C D E F 1 2 3 4 5 6 7 8 9". – Ben Voigt Jul 11 '10 at 01:08
Much simpler: `for (i=0; i – R.. GitHub STOP HELPING ICE Jul 11 '10 at 04:54
@R: great point about strtoul---I didn't read the man page carefully enough. Feel free to edit. – Norman Ramsey Jul 11 '10 at 05:46
This can work properly only if there is a space after every two digits. IMO this makes this approach crappy. – Marek R Nov 25 '16 at 14:51

score 8 · Answer 3 · edited Dec 09 '11 at 12:41

Iterate through all the characters.
- If you have a hex digit, the number is (ch >= 'A')? (ch - 'A' + 10): (ch - '0').
  - Left shift your accumulator by four bits and add (or OR) in the new digit.
- If you have a space, and the previous character was not a space, then append your current accumulator value to the array and reset the accumulator back to zero.
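
The steps above can be sketched in C roughly as follows. The function name hex_to_bytes_accum, the caller-supplied buffer, and the (size_t)-1 error return are assumptions for illustration, not part of the answer. The sketch also flushes the accumulator at the end of the string (the list only mentions spaces), accepts lowercase digits, and silently keeps just the low 8 bits of an overlong value rather than rejecting it.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns the number of bytes written to out, or
 * (size_t)-1 on an invalid character or when out (capacity cap) is full. */
size_t hex_to_bytes_accum(const char *text, unsigned char *out, size_t cap)
{
    size_t n = 0;
    unsigned acc = 0;    /* value being built from the current digit group */
    int in_number = 0;   /* nonzero while the previous char was a hex digit */
    assert(text != NULL && out != NULL);

    for (;; text++) {
        unsigned char ch = (unsigned char)*text;

        if (isxdigit(ch)) {
            int up = toupper(ch);
            unsigned digit = (up >= 'A') ? (unsigned)(up - 'A' + 10)
                                         : (unsigned)(up - '0');
            acc = ((acc << 4) | digit) & 0xFFu;  /* keep only the low 8 bits */
            in_number = 1;
        } else if (ch == ' ' || ch == '\0') {
            if (in_number) {         /* a digit group just ended: emit it */
                if (n == cap)
                    return (size_t)-1;
                out[n++] = (unsigned char)acc;
                acc = 0;
                in_number = 0;
            }
            if (ch == '\0')
                return n;
        } else {
            return (size_t)-1;       /* neither a hex digit nor a space */
        }
    }
}
```

Tracking "was the previous character a digit" plays the role of the "previous character was not a space" test in the list, and it also copes with leading spaces, where there is no previous character at all.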

edited Dec 09 '11 at 12:41

Mark

answered Jul 10 '10 at 23:20

Ben Voigt

+1: This is probably the most straightforward and simple way to do it. – James McNellis Jul 10 '10 at 23:22
That's basically what I did, except for using switch instead of ternary test. Depending on compiler and processor architecture one or the other may be faster. But you should also test every character is in range 0-9A-F, and it makes testing the same thing two times. – kriss Jul 10 '10 at 23:42
@kriss: It's all in the assumptions. You assume that there must be exactly two hex digits and one space between each value, mine allows omission of a leading zero or multiple spaces, but assumes that there are no other classes of characters in the string. If you can't assume that, I'd probably choose to do validation separately, by testing `if (s[strspn(s, " 0123456789ABCDEF")]) /* error */;` Sure, it's another pass on the string, but so much cleaner. Or avoid the second pass over the string by using `isspace` and `isxdigit` on each character, which uses a lookup table for speed. – Ben Voigt Jul 11 '10 at 00:19
Looping around switches is not really an issue, I do not really take it as a difference. I chose to assume there were exactly two hex chars per value in the input, because if you allow more than that you should also check the range of the values. And what about allowing negative numbers, we would have to manage sign, etc. switch *is* a kind of lookup table... (and another fast conversion method would be to really use one implemented as an array). – kriss Jul 11 '10 at 00:40
The problem specified that all inputs were unsigned. The problem didn't specify that there would always be zeros padding to exactly two digits (e.g. all of these fit in a `char`: `0xA`, `0x0A`, `0x000A`) or just one space, although these assumptions were true on the sample input. – Ben Voigt Jul 11 '10 at 01:23
You should use isxdigit first. Or see R's comment above. – Mark Dec 09 '11 at 12:42
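
The two validation options mentioned in the comments above could be sketched like this. The function names are hypothetical; the strspn character set is the one quoted in the comment, and the per-character version is one possible reading of the isspace / isxdigit suggestion.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Separate validation pass: nonzero if text holds only spaces and uppercase
 * hex digits. strspn returns the length of the leading run of accepted
 * characters, so for a valid string that run ends at the terminator. */
int is_hex_text_strspn(const char *text)
{
    assert(text != NULL);
    return text[strspn(text, " 0123456789ABCDEF")] == '\0';
}

/* Per-character alternative: also accepts lowercase digits and any
 * whitespace that isspace recognizes (tabs, newlines, ...). */
int is_hex_text_ctype(const char *text)
{
    assert(text != NULL);
    for (; *text != '\0'; text++) {
        unsigned char ch = (unsigned char)*text;
        if (!isxdigit(ch) && !isspace(ch))
            return 0;
    }
    return 1;
}
```

Note that the two checks are not equivalent: the strspn version rejects "e8" and tab-separated input, while the ctype version accepts both, so whichever is used should match what the parser itself can handle.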

score 5 · Answer 4 · answered Feb 26 '16 at 20:33

Use the "old" sscanf() function:

string s_hex = "E8 48 D8 FF FF 8B 0D"; // source string
char *a_Char = new char[ s_hex.length()/3 + 1 ]; // output char array (new[] allocates an array, new char(n) would allocate a single char)

for( unsigned i = 0, uchr ; i < s_hex.length() ; i += 3 ) {
    sscanf( s_hex.c_str()+ i, "%2x", &uchr ); // conversion
    a_Char[i/3] = uchr; // save as char
}
delete[] a_Char; // new[] must be paired with delete[]

score 5 · Answer 5 · answered Nov 21 '11 at 17:24

If you know the length of the string to be parsed beforehand (e.g. you are reading something from /proc) you can use sscanf with the 'hh' type modifier, which specifies that the next conversion is one of diouxX and the pointer to store it will be either signed char or unsigned char.

// example: ipv6 address as seen in /proc/net/if_inet6:
char myString[] = "fe80000000000000020c29fffe01bafb";
unsigned char addressBytes[16];
sscanf(myString,
       "%02hhx%02hhx%02hhx%02hhx%02hhx%02hhx%02hhx%02hhx"
       "%02hhx%02hhx%02hhx%02hhx%02hhx%02hhx%02hhx%02hhx",
       &addressBytes[0], &addressBytes[1], &addressBytes[2], &addressBytes[3],
       &addressBytes[4], &addressBytes[5], &addressBytes[6], &addressBytes[7],
       &addressBytes[8], &addressBytes[9], &addressBytes[10], &addressBytes[11],
       &addressBytes[12], &addressBytes[13], &addressBytes[14], &addressBytes[15]);

int i;
for (i = 0; i < 16; i++){
    printf("addressBytes[%d] = %02x\n", i, addressBytes[i]);
}

Output:

addressBytes[0] = fe
addressBytes[1] = 80
addressBytes[2] = 00
addressBytes[3] = 00
addressBytes[4] = 00
addressBytes[5] = 00
addressBytes[6] = 00
addressBytes[7] = 00
addressBytes[8] = 02
addressBytes[9] = 0c
addressBytes[10] = 29
addressBytes[11] = ff
addressBytes[12] = fe
addressBytes[13] = 01
addressBytes[14] = ba
addressBytes[15] = fb
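
When the byte count is known only at run time, the same hh conversion can be applied in a loop instead of one long format string. This is a sketch under assumptions: the name packed_hex_to_bytes and its 0 / -1 return convention are hypothetical, the input is taken to be contiguous hex digits with no separators, and the hh modifier requires a C99 (or later) library.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: converts exactly 2*len contiguous hex digits in text
 * into len bytes. Returns 0 on success, -1 on a bad length or a bad digit. */
int packed_hex_to_bytes(const char *text, unsigned char *out, size_t len)
{
    size_t i;
    assert(text != NULL && out != NULL);

    if (strlen(text) != 2 * len)
        return -1;
    for (i = 0; i < len; i++) {
        const char *pair = text + 2 * i;

        /* %x on its own would skip whitespace and accept a sign, so check
         * both characters of the pair before converting. */
        if (!isxdigit((unsigned char)pair[0]) ||
            !isxdigit((unsigned char)pair[1]))
            return -1;
        /* %2hhx reads at most two hex digits into an unsigned char. */
        if (sscanf(pair, "%2hhx", &out[i]) != 1)
            return -1;
    }
    return 0;
}
```

Called as packed_hex_to_bytes(myString, addressBytes, 16) on the address above, it should fill addressBytes with the same sixteen values as the single sscanf call.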

score 0 · Answer 6 · answered Jul 11 '10 at 00:15

For a pure C implementation I think you can persuade sscanf(3) to do what you want. I believe this should be portable so long as your input string is only ever going to contain two-character hex values.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>


char hex[] = "E8 48 D8 FF FF 8B 0D";
int cnt = (strlen(hex) + 1) / 3; // Whether or not there's a trailing space
unsigned char *result = (unsigned char *)malloc(cnt), *r;
unsigned int c; // %X stores an unsigned int, so c must be one
int i;

for (i = 0, r = result; i < cnt; i++) { // bounded by cnt, so we never step past the terminator
    if (sscanf(hex + 3 * i, "%2X", &c) != 1) {
        break; // Didn't parse as expected
    }
    *r++ = (unsigned char)c;
}

bjg

Declare `c` as `unsigned int`, otherwise you could overwrite other local variables (or worse yet, your return address). – Ben Voigt Jul 11 '10 at 00:26
But generally scanf is going to take longer to figure out the format code than my entire answer will, and the question did ask for an *efficient* way. – Ben Voigt Jul 11 '10 at 00:28
@Ben Voigt. Yes but does efficient mean run-time or programmer-time? '-) Anyway thanks for pointing out that I should have made `c` an `unsigned int` and coerced that into the `result` array. – bjg Jul 11 '10 at 01:09
UB. Since at expected end `p` points one byte AFTER terminating zero. – Marek R Nov 25 '16 at 14:47
@MarekR Good catch. I was clearly in two minds writing this (6 years ago), having declared a `cnt` variable and then having not used it – bjg Nov 27 '16 at 00:18

kriss · Answer 7 · 2010-07-11T02:13:50.027

The old C way, do it by hand ;-) (there are many shorter ways, but I'm not golfing, I'm going for run-time).

enum { NBBYTES = 7 };
char res[NBBYTES+1];
const char * c = "E8 48 D8 FF FF 8B 0D";
const char * p = c;
int i = 0;

for (i = 0; i < NBBYTES; i++){
    switch (*p){
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
      res[i] = *p - '0';
    break;
    case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
      res[i] = *p - 'A' + 10;
    break;
   default:
     // parse error, throw exception
     ;
   }
   p++;
   switch (*p){
   case '0': case '1': case '2': case '3': case '4':
   case '5': case '6': case '7': case '8': case '9':
      res[i] = res[i]*16 + *p - '0';
   break;
   case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
      res[i] = res[i]*16 + *p - 'A' + 10;
   break;
   default:
      // parse error, throw exception
      ;
   }
   p++;
   if (*p == 0) { continue; }
   if (*p == ' ') { p++; continue; }
   // parse error, throw exception
}

// let's show the result, C style IO, just cout if you want C++
for (i = 0 ; i < 7; i++){
   printf("%2.2x ", 0xFF & res[i]);
}
printf("\n");

Now another one that allows any number of digits per value and any number of spaces to separate them, including leading or trailing spaces (Ben's specs):

#include <stdio.h>
#include <stdlib.h>

int main(){
    enum { NBBYTES = 7 };
    char res[NBBYTES];
    const char * c = "E8 48 D8 FF FF 8B 0D";
    const char * p = c;
    int i = -1;

    /* No initialization needed here: case ' ' zeroes res[++i] before any digit is stored. */
    char ch = ' ';
    while (ch && i < NBBYTES){
       switch (ch){
       case '0': case '1': case '2': case '3': case '4':
       case '5': case '6': case '7': case '8': case '9':
          ch -= '0' + 10 - 'A';
       case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
          ch -= 'A' - 10;
          res[i] = res[i]*16 + ch;
          break;
       case ' ':
         if (*p != ' ' && *p != '\0') { /* start a new value only if more input follows the spaces */
             if (i == NBBYTES-1){
                 printf("parse error, throw exception\n");
                 exit(-1);
            }
            res[++i] = 0;
         }
         break;
       case 0:
         break;
       default:
         printf("parse error, throw exception\n");
         exit(-1);
       }
       ch = *(p++);
    }
    if (i != NBBYTES-1){
        printf("parse error, throw exception\n");
        exit(-1);
    }

   for (i = 0 ; i < 7; i++){
      printf("%2.2x ", 0xFF & res[i]);
   }
   printf("\n");
}

No, it's not really obfuscated... but well, it looks like it is.

Are we allowed to say 'Ick!'? (If only because the code will 'throw exception' on the last loop, because there are only 6 spaces in the string, not 7 as the code requires.) — Jonathan Leffler, Jul 10 '10 at 23:43
@Jonathan: not any more... I could also have added a space to input. The old separators vs terminators debate. — kriss, Jul 11 '10 at 00:43
your little fix doesn't help... `*p != ' '` on the terminating NUL and it doesn't matter what you logical-or that with. — Ben Voigt, Jul 11 '10 at 01:05
Oops, I did err again. You should like the new fix better :-) — kriss, Jul 11 '10 at 01:15
