2

I need to convert a hex-encoded string like this:

char hstr[9] = "61626364"; // characters abcd\0

Into

"abcd" // characters as hex: 0x61 0x62 0x63 0x64
       // hex "digits" a-f are always lowercase

So far I have written this function:

#include <stdlib.h>
#include <string.h> // strlen

void htostr(char* hexstr, char* str) {
    int len = strlen(hexstr);

    for (int i = 0; i < len/2; i++) // edit: fixed bounds
    {
        char input[3] = { hexstr[2 * i], hexstr[2 * i + 1], 0 };
        *(str + i) = (char)strtol(input, NULL, 16);
    }
}

I'm using the strtol function to do the job.

I feel I'm wasting 3 bytes of memory on the input array, plus some processor time copying two bytes and terminating them with 0, because strtol has no "length" parameter.

The code is supposed to run on a pretty busy microcontroller, and the strings are quite long (it would be a good idea to free up the memory used by hexstr as soon as possible).

The question is: is there a more efficient way to do this without writing my own converter from scratch?

By "from scratch" I mean low-level conversion without using standard library functions.

Kamil
  • Sounds like premature optimization. Unless this piece of code is a measured bottleneck, it's probably not worth worrying about. – dbush Mar 21 '22 at 22:24
  • Are you actually using the integer result for anything, or is it just an intermediary before turning it back into a string of ASCII characters? Wondering if a look-up table might be worth using and skipping `strtol` altogether. – Christian Gibbons Mar 21 '22 at 22:24
  • @ChristianGibbons I don't need integers here. Everything is text (actually there are JSON messages). And I think that look-up table with 16 hex digits would be great. Take first char, shift 4 bits left, take second char, add to first and we have character. I think this is it. – Kamil Mar 21 '22 at 22:37

5 Answers

1

When you are allowed to temporarily change the input string:

#include <stdlib.h> // strtol
#include <string.h> // strlen

void htostr_1(char* hexstr, char* str) {
    int len = strlen(hexstr);

    for (int i = 0; 2 * i + 2 <= len; i++)
    {
        char tmp = hexstr[2 * i + 2];
        hexstr[2 * i + 2] = 0;
        str[i] = (char)strtol(hexstr + 2 * i, NULL, 16);
        hexstr[2 * i + 2] = tmp;
    }
}

This saves the byte that follows each pair, overwrites it with a terminator so that strtol stops there, and restores it afterwards: https://godbolt.org/z/zdMdKrY7n

As a side note: the end condition of your for loop is wrong and accesses the string out of bounds: https://godbolt.org/z/ra87cWocY

If you also want to save the int len and the unnecessary strlen call:

void htostr_2(char* hexstr, char* str) {
    while (hexstr[0] && hexstr[1]) // stop before an unpaired trailing digit
    {
        char tmp = hexstr[2];
        hexstr[2] = 0;
        *str++ = (char)strtol(hexstr, NULL, 16);
        hexstr[2] = tmp;
        hexstr += 2;
    }
}
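
Since the question mentions freeing the memory used by hexstr as soon as possible, the idea of modifying the input can be taken one step further: the decoded text is half as long as the hex text, so it can be written over the start of the same buffer. The following is only a sketch under that assumption; `hexval` and `hex_decode_inplace` are hypothetical names, not part of the code above, and the buffer must be writable.

```c
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
static int hexval(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical function: decodes buf in place and returns the number of
   decoded bytes. The write position never overtakes the read position,
   because two input characters are consumed per output byte. */
size_t hex_decode_inplace(char *buf) {
    const char *rd = buf;
    char *wr = buf;

    while (rd[0] != '\0' && rd[1] != '\0') {
        int hi = hexval(rd[0]);
        int lo = hexval(rd[1]);
        if (hi < 0 || lo < 0) break;   /* stop at the first invalid pair */
        *wr++ = (char)(hi * 16 + lo);
        rd += 2;
    }
    *wr = '\0';                        /* terminate the decoded text */
    return (size_t)(wr - buf);
}
```

After the call, the second half of the buffer is no longer needed, so a caller that owns the allocation could shrink or reuse it.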
mch
  • Interesting approach. Regarding bounds - I got this, I forgot to change loop condition after I changed "i" meaning. – Kamil Mar 21 '22 at 22:31
1

Instead of copying two characters and using strtol, you could write a function that converts the characters 0 .. 9 and A .. F to an int (0x0 to 0xF).

#include <ctype.h>

int toval(char ch) {
    // '0'..'9' map to 0..9
    if (isdigit((unsigned char)ch)) return ch - '0';
    // 'a'..'f' and 'A'..'F' map to 10..15
    return toupper((unsigned char)ch) - 'A' + 10;
}

Then looping over the string and adding up the result is pretty straightforward:

void htostr(char *wr, const char *rd) {
    for (; rd[0] != '\0' && rd[1] != '\0'; rd += 2, ++wr) {
        // multiply the first with 0x10 and add the value of the second
        *wr = toval(rd[0]) * 0x10 + toval(rd[1]);
    }
    *wr = '\0'; // null terminate
}

Example usage:

#include <stdio.h>

int main() {
    char hstr[] = "61626364";
    char res[1 + sizeof hstr / 2];

    htostr(res, hstr);

    printf(">%s<\n", res);
}
Ted Lyngmo
  • I wonder if this is as fast as it is elegant :) I will try this. I can skip `toupper` part, my hex strings are always lowercase. – Kamil Mar 21 '22 at 22:55
  • @Kamil Thanks! :-) I haven't benchmarked it but expect it to be decent. – Ted Lyngmo Mar 22 '22 at 05:57
1

If you really want to trim it down:

void htostr(char* hexstr, char* str) {
    int i = 0;

    // process complete pairs only; stops at the terminator or an unpaired digit
    while (hexstr[2*i] && hexstr[2*i+1]) {
        str[i] = 0;
        for (int j=0; j<2; j++) {
            str[i] <<= 4;
            char c = hexstr[2*i+j];
            if (c >= '0' && c <= '9')  {
                str[i] |= c - '0';
            } else if (c >= 'A' && c <= 'F')  {
                str[i] |= c - 'A' + 10;
            } else if (c >= 'a' && c <= 'f')  {
                str[i] |= c - 'a' + 10;
            }
        }
        i++;
    }
}
dbush
  • Something like this. But there is also look-up table idea (@ChristianGibbons). Maybe that would be faster. – Kamil Mar 21 '22 at 22:42
1

There are many ways to do this, and efficiency depends on typical string length, frequency of use, allowable memory footprint, etc.

Below is one that does the job fairly quickly.

Loop through pairs of hex digits and compute the character code via table look-up.

#include <ctype.h>

static const unsigned char val[] = { //
    ['0'] = 0, ['1'] = 1, ['2'] = 2, ['3'] = 3, ['4'] = 4, //
    ['5'] = 5, ['6'] = 6, ['7'] = 7, ['8'] = 8, ['9'] = 9, //
    ['A'] = 10, ['B'] = 11, ['C'] = 12, ['D'] = 13, ['E'] = 14, ['F'] = 15, //
    ['a'] = 10, ['b'] = 11, ['c'] = 12, ['d'] = 13, ['e'] = 14, ['f'] = 15, //
};

void htostr_alt(const char* hexstr, char* str) {
  // Best to use is...() functions with unsigned char data
  const unsigned char *uhexstr = (const unsigned char *) hexstr; 

  while (isxdigit(uhexstr[0]) && isxdigit(uhexstr[1])) {
    *str++ = (char) (val[uhexstr[0]]*16u + val[uhexstr[1]]);
    uhexstr += 2;
  }
  *str = '\0';

  // Consider returning something useful, like where did input stop.
  // return (char *) uhexstr;
}
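
The commented-out return above hints at reporting where decoding stopped. One possible shape for that, as a sketch only: `htostr_end` and `hexdigit_val` are hypothetical names, and the table simply repeats the `val[]` idea so that the block compiles on its own.

```c
#include <ctype.h>

/* Same look-up idea as val[] above, repeated to keep the sketch standalone. */
static const unsigned char hexdigit_val[] = {
    ['0'] = 0, ['1'] = 1, ['2'] = 2, ['3'] = 3, ['4'] = 4,
    ['5'] = 5, ['6'] = 6, ['7'] = 7, ['8'] = 8, ['9'] = 9,
    ['A'] = 10, ['B'] = 11, ['C'] = 12, ['D'] = 13, ['E'] = 14, ['F'] = 15,
    ['a'] = 10, ['b'] = 11, ['c'] = 12, ['d'] = 13, ['e'] = 14, ['f'] = 15,
};

/* Hypothetical variant: decodes like htostr_alt, but returns a pointer to
   the first input character that was not consumed. */
const char *htostr_end(const char *hexstr, char *str) {
    const unsigned char *u = (const unsigned char *) hexstr;

    while (isxdigit(u[0]) && isxdigit(u[1])) {
        *str++ = (char) (hexdigit_val[u[0]] * 16u + hexdigit_val[u[1]]);
        u += 2;
    }
    *str = '\0';
    return (const char *) u;
}
```

A caller can then check whether the returned pointer refers to the terminating `'\0'`; if it does not, the input contained an unpaired digit or a non-hex character at that position.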

To avoid implementation-defined behavior when assigning the character:

void htostr_alt2(const char* hexstr, char* str) {
  const unsigned char *uhexstr = (const unsigned char *) hexstr; 
  unsigned char *ustr = (unsigned char *) str; // output must be writable, so no const

  while (isxdigit(uhexstr[0]) && isxdigit(uhexstr[1])) {
    *ustr++ = (unsigned char) (val[uhexstr[0]]*16u + val[uhexstr[1]]);
    uhexstr += 2;
  }
  *ustr = '\0';
}

The code works even when the string length is more than INT_MAX, accepts a const input string, stops on any non-hex-digit pair, and makes only one pass through the source string.

If you do not like the function isxdigit(), it is easy enough to code your own table: unsigned char my_isxdigit[256].
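
A minimal sketch of such a table, assuming 8-bit bytes so that 256 entries cover every unsigned char value. The helper `hex_pair_ok` is a hypothetical name added only to show how the table would be indexed.

```c
/* Nonzero for hex digits, zero for everything else (including '\0'). */
static const unsigned char my_isxdigit[256] = {
    ['0'] = 1, ['1'] = 1, ['2'] = 1, ['3'] = 1, ['4'] = 1,
    ['5'] = 1, ['6'] = 1, ['7'] = 1, ['8'] = 1, ['9'] = 1,
    ['A'] = 1, ['B'] = 1, ['C'] = 1, ['D'] = 1, ['E'] = 1, ['F'] = 1,
    ['a'] = 1, ['b'] = 1, ['c'] = 1, ['d'] = 1, ['e'] = 1, ['f'] = 1,
};

/* Hypothetical helper: true when s starts with two hex digits.
   Index through unsigned char so that negative char values cannot occur. */
int hex_pair_ok(const char *s) {
    const unsigned char *u = (const unsigned char *) s;
    return my_isxdigit[u[0]] && my_isxdigit[u[1]];
}
```

Because the entry for `'\0'` is zero, the second character is never read when the first one is the terminator.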

chux - Reinstate Monica
0

Assuming that you know the string format in advance and it's never more than 8 digits, then keep it simple. This is both efficient and readable:

#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>

int main()
{
  char hstr[9] = "61626364";
  uint32_t n = strtoul(hstr, 0, 16);
  char str[5] = 
  {
    (n >> 24) & 0xFFu,
    (n >> 16) & 0xFFu,
    (n >>  8) & 0xFFu,
    (n >>  0) & 0xFFu,
    '\0'
  };
  puts(str);
}

As for manually rolling your own hex-string-to-integer conversion (I don't really see why you would in this case), the most efficient code, at the cost of a little flash memory, is this:

const uint8_t LUT[128] =
{
  ['0'] =  0, ['1'] =  1, /* and so on... */ 
  ['A'] = 10, ['B'] = 11, /* and so on... */
};

...
uint8_t val = LUT[str[i]];
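
For illustration, the fragment above could be filled out roughly as follows. This is a sketch rather than the exact table intended above: the lowercase entries are an assumption based on the question's lowercase input, and `HEX_LUT` and `hex_pair_to_byte` are hypothetical names.

```c
#include <stdint.h>

static const uint8_t HEX_LUT[128] = {
    ['0'] = 0, ['1'] = 1, ['2'] = 2, ['3'] = 3, ['4'] = 4,
    ['5'] = 5, ['6'] = 6, ['7'] = 7, ['8'] = 8, ['9'] = 9,
    ['A'] = 10, ['B'] = 11, ['C'] = 12, ['D'] = 13, ['E'] = 14, ['F'] = 15,
    ['a'] = 10, ['b'] = 11, ['c'] = 12, ['d'] = 13, ['e'] = 14, ['f'] = 15,
};

/* Combines two hex digits into one byte. No validation is done: any
   non-hex character maps to 0. The index is masked to 7 bits so that it
   always stays inside the 128-entry table. */
uint8_t hex_pair_to_byte(const char *p) {
    uint8_t hi = HEX_LUT[(unsigned char) p[0] & 0x7F];
    uint8_t lo = HEX_LUT[(unsigned char) p[1] & 0x7F];
    return (uint8_t) ((hi << 4) | lo);
}
```

The table costs 128 bytes of read-only memory in exchange for replacing the range comparisons with two indexed loads per output byte.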
Lundin