28

In C, what is the most efficient way to convert a string of hex digits into a binary unsigned int or unsigned long?

For example, if I have 0xFFFFFFFE, I want an int with the base10 value 4294967294.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847

16 Answers16

41

You want strtol or strtoul. See also the Unix man page

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
Patrick
  • 90,362
  • 11
  • 51
  • 61
33

Edit: Now compatible with MSVC, C++ and non-GNU compilers (see end).

The question was "most efficient way." The OP doesn't specify platform, he could be compiling for a RISC based ATMEL chip with 256 bytes of flash storage for his code.

For the record, and for those (like me), who appreciate the difference between "the easiest way" and the "most efficient way", and who enjoy learning...

static const long hextable[] = {
   [0 ... 255] = -1, // bit aligned access into this table is considerably
   ['0'] = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, // faster for most modern processors,
   ['A'] = 10, 11, 12, 13, 14, 15,       // for the space conscious, reduce to
   ['a'] = 10, 11, 12, 13, 14, 15        // signed char.
};

/** 
 * @brief convert a hexidecimal string to a signed long
 * will not produce or process negative numbers except 
 * to signal error.
 * 
 * @param hex without decoration, case insensitive. 
 * 
 * @return -1 on error, or result (max (sizeof(long)*8)-1 bits)
 */
long hexdec(unsigned const char *hex) {
   long ret = 0; 
   while (*hex && ret >= 0) {
      ret = (ret << 4) | hextable[*hex++];
   }
   return ret; 
}

It requires no external libraries, and it should be blindingly fast. It handles uppercase, lowercase, invalid characters, odd-sized hex input (eg: 0xfff), and the maximum size is limited only by the compiler.

For non-GCC or C++ compilers or compilers that will not accept the fancy hextable declaration.

Replace the first statement with this (longer, but more conforming) version:

static const long hextable[] = { 
    -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
    -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
    -1,-1, 0,1,2,3,4,5,6,7,8,9,-1,-1,-1,-1,-1,-1,-1,10,11,12,13,14,15,-1,
    -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
    -1,-1,10,11,12,13,14,15,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
    -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
    -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
    -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
    -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
    -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
    -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1
};
Orwellophile
  • 13,235
  • 3
  • 69
  • 45
  • Am I right in thinking the `hextable` initialisation code is pseudo-code (if so it's worth pointing that out), or is this some esoteric array initialisation syntax I'm not familiar with? – John Carter Nov 07 '12 at 01:02
  • It is not compiling with Android ndk-build. – hB0 Nov 08 '13 at 22:53
  • @hB0 i will respond to that incredibly vague and pointless observation by replying in kind: it compiles fine on clang. there are 22 warnings, but that's to be expected. – Orwellophile Nov 17 '13 at 14:45
  • I used ndk-build tool in android ndk - http://developer.android.com/tools/sdk/ndk/index.html and it does not compile gives me error specifically on the array declaration. Though I love the code fragment but I could not use it, so had to use other good method (but inefficient). Cant give you exact compilation error now.. (already gave you +1 last time) – hB0 Nov 19 '13 at 10:12
  • @hB0 just comment out second line of code with "[0..255]" in it, and pray that you're never passed invalid input – Orwellophile Nov 20 '13 at 12:38
  • You could shrink the table by using `hex - '0'` as your index, if you can assume that inputs are valid. Or maybe `hex & (~0x20)` to convert lowercase to uppercase (and also mangle `'0'-'9'`). Maybe some combination of two simple ALU operations before indexing... Also `int` would be a better choice than `long`. There's no need for 64bit entries on x86-64. (Usually there's no need for more than 8bit entries: most CPUs have efficient zero-extending byte loads, including x86 and ARM. Lack of support for unaligned word loads doesn't imply bad support for byte loads). – Peter Cordes May 10 '16 at 20:36
  • The entire point of x86_64 is that it supports native 64 bit word sizes. A general-purpose hex to word routine should support the largest word-size that the compiler supports, whether this is running on an 16-bit CPU or not. On most compilers, long corresponds to the largest native-word size. The register-size matching of the table itself is very much processor-specific as to performance gains. 2048 byte table size is irrespective of the cache lines that actually get used for valid input, such that '0'..'9' results in two 64 byte cache lines, and depending on case, an additional cache line. – Orwellophile Jun 09 '16 at 06:25
  • @PeterCordes (continuation of above comment) Reducing this to unaligned bytes may or may not improve performance but again that is very much processor and use-case specific, and the code as-is provides decent general performance. Regards 'assuming inputs are valid': the vast majority of C/C++ CVEs come from sentiment exactly like yours. Assuming the input is valid is surely NOT the most efficient way. Further, the signed long allows for err or checking the result (<0 obviously means an error in the input). – Orwellophile Jun 09 '16 at 06:27
  • Of course the routing should return `long` (or `intmax_t`, to get a type that I hope is 64bit on x86-64 Windows, because `long` is still 32bit there). Using table entries of that size is a totally separate decision, and that's what I was talking about, not the function return type. You can go either way on padding the table with usually-unused entries, but gaps in your actual working set can lead to more TLB misses if other decisions like this put your data across too many pages. But anyway, there's no reason for the hot entries in the LUT to span more than one cache line. – Peter Cordes Jun 09 '16 at 06:40
  • You left out the critical word "**if** you can assume inputs are valid". e.g. if you've already range-checked to see if you've reached the end of the number, which is something a routine like this probably does anyway. You have a good point that some future modification will happen that runs the code with invalid inputs without checking first, though. Or that the LUT can be used *as* the `ishex()` check. (In which case the size of the whole table matters, not just 0-9 and a-f. `int8_t` would be a good choice, to still support -1.). You are claiming to aim for "the most efficient way"... – Peter Cordes Jun 09 '16 at 06:46
  • @PeterCordes Your many points are not without validity, esp. that `long` is always 32 bits. The `typedef` (which would require including `stdint.h`) for `intmax_t` is `long long`, and I will fall back on the classic defense that the OP asked specifically for "unsigned int or unsigned long" (I guess I failed on the unsigned part). As for "most efficient", I loosely chose to define that as "the fastest code with the smallest footprint." That does of course leave one with a degree of personal judgement, e.g. _do I use an array of char and cast later_. I would suggest continuing this on godbolt – Orwellophile Feb 14 '17 at 10:52
  • https://godbolt.org/g/LoeRYU ... changing from `long long` to `char` for the table creates one extra line of assembly `movsx rdx, BYTE PTR hextable[rdx]`... should I dig up an arduino and see if there's a measurable difference? (well, ok, tbf I would use the char version on an arduino because memory is at a premium) :) – Orwellophile Feb 14 '17 at 11:04
  • For the record, this is NOT the most efficient. It is the fastest only if you do a lot of conversions in a row, and disregard the huge amount of space this takes up. – Alex May 16 '17 at 17:09
  • @Alex Why would anybody **not** doing a great deal of conversion, give a hoot about finding the most efficient algorithm? – Orwellophile Sep 25 '17 at 10:31
25

Try this:

#include <stdio.h>
int main()
{
    char s[] = "fffffffe";
    int x;
    sscanf(s, "%x", &x);
    printf("%u\n", x);
}
Jared Burrows
  • 54,294
  • 25
  • 151
  • 185
Mark Harrison
  • 297,451
  • 125
  • 333
  • 465
8

For AVR Microcontrollers I wrote the following function, including relevant comments to make it easy to understand:

/**
 * hex2int
 * take a hex string and convert it to a 32bit number (max 8 hex digits)
 */
uint32_t hex2int(char *hex) {
    uint32_t val = 0;
    while (*hex) {
        // get current character then increment
        char byte = *hex++; 
        // transform hex character to the 4bit equivalent number, using the ascii table indexes
        if (byte >= '0' && byte <= '9') byte = byte - '0';
        else if (byte >= 'a' && byte <='f') byte = byte - 'a' + 10;
        else if (byte >= 'A' && byte <='F') byte = byte - 'A' + 10;    
        // shift 4 to make space for new digit, and add the 4 bits of the new digit 
        val = (val << 4) | (byte & 0xF);
    }
    return val;
}

Example:

char *z ="82ABC1EF";
uint32_t x = hex2int(z);
printf("Number is [%X]\n", x);

Will output: enter image description here

radhoo
  • 2,877
  • 26
  • 27
7

If you don't have the stdlib then you have to do it manually.

unsigned long hex2int(char *a, unsigned int len)
{
    int i;
    unsigned long val = 0;

    for(i=0;i<len;i++)
       if(a[i] <= 57)
        val += (a[i]-48)*(1<<(4*(len-1-i)));
       else
        val += (a[i]-55)*(1<<(4*(len-1-i)));

    return val;
}

Note: This code assumes uppercase A-F. It does not work if len is beyond your longest integer 32 or 64bits, and there is no error trapping for illegal hex characters.

Markus Safar
  • 6,324
  • 5
  • 28
  • 44
5

As if often happens, your question suffers from a serious terminological error/ambiguity. In common speech it usually doesn't matter, but in the context of this specific problem it is critically important.

You see, there's no such thing as "hex value" and "decimal value" (or "hex number" and "decimal number"). "Hex" and "decimal" are properties of representations of values. Meanwhile, values (or numbers) by themselves have no representation, so they can't be "hex" or "decimal". For example, 0xF and 15 in C syntax are two different representations of the same number.

I would guess that your question, the way it is stated, suggests that you need to convert ASCII hex representation of a value (i.e. a string) into a ASCII decimal representation of a value (another string). One way to do that is to use an integer representation as an intermediate one: first, convert ASCII hex representation to an integer of sufficient size (using functions from strto... group, like strtol), then convert the integer into the ASCII decimal representation (using sprintf).

If that's not what you need to do, then you have to clarify your question, since it is impossible to figure it out from the way your question is formulated.

AnT stands with Russia
  • 312,472
  • 42
  • 525
  • 765
  • I also read the question as hex string -> decimal string, but that doesn't match the other answers. I edited the question to match the accepted answer and most other answers. The string->string question is obscure, but makes me wonder if it can be done without going through a binary integer as an intermediate step (e.g. for numbers too big to fit in a `uint64_t`). add-with-carry down a string of decimal digits sucks a lot, though, so probably not. – Peter Cordes May 10 '16 at 20:28
3

As written before, the efficiency basically depends on what one is optimizing for.

When optiming for lines of code, or just working in environment without fully-equipped standard library one quick and dirty option could be:

// makes a number from two ascii hexa characters
int ahex2int(char a, char b){

    a = (a <= '9') ? a - '0' : (a & 0x7) + 9;
    b = (b <= '9') ? b - '0' : (b & 0x7) + 9;

    return (a << 4) + b;
}

... more in similar thread here: https://stackoverflow.com/a/58253380/5951263

Simon
  • 99
  • 1
  • 2
2

@Eric

Why is a code solution that works getting voted down? Sure, it's ugly and might not be the fastest way to do it, but it's more instructive that saying "strtol" or "sscanf". If you try it yourself you will learn something about how things happen under the hood.

I don't really think your solution should have been voted down, but my guess as to why it's happening is because it's less practical. The idea with voting is that the "best" answer will float to the top, and while your answer might be more instructive about what happens under the hood (or a way it might happen), it's definitely not the best way to parse hex numbers in a production system.

Again, I don't think there's anything wrong with your answer from an educational standpoint, and I certainly wouldn't (and didn't) vote it down. Don't get discouraged and stop posting just because some people didn't like one of your answers. It happens.

I doubt my answer makes you feel any better about yours being voted down, but I know it's especially not fun when you ask why something's being voted down and no one answers.

Community
  • 1
  • 1
Derek Park
  • 45,824
  • 15
  • 58
  • 76
2

@Eric

I was actually hoping to see a C wizard post something really cool, sort of like what I did but less verbose, while still doing it "manually".

Well, I'm no C guru, but here's what I came up with:

unsigned int parseHex(const char * str)
{
    unsigned int val = 0;
    char c;

    while(c = *str++)
    {
        val <<= 4;

        if (c >= '0' && c <= '9')
        {
            val += c & 0x0F;
            continue;
        }

        c &= 0xDF;
        if (c >= 'A' && c <= 'F')
        {
            val += (c & 0x07) + 9;
            continue;
        }

        errno = EINVAL;
        return 0;
    }

    return val;
}

I originally had more bitmasking going on instead of comparisons, but I seriously doubt bitmasking is any faster than comparison on modern hardware.

Derek Park
  • 45,824
  • 15
  • 58
  • 76
  • Four complaints: 1) It does not compile. 2) Id does not handle lower case 3) It does not work (A => 1). 4) Invalid characters are just ignored!. Did you test it? – Martin York Sep 26 '08 at 18:35
  • Did you read it? "I didn't actually compile this, so I could have made some pretty big mistakes." So no, I didn't test it. – Derek Park Sep 26 '08 at 18:46
  • There you go. I patched it up. For the record, it already handled lower case via the "c &= 0xDF" statement. It was broken in multiple other ways, though. – Derek Park Sep 26 '08 at 19:00
  • Fifth complaint: If you program in ANSI C (and are not guaranteed to have an ASCII-based execution character set), there is no guarantee that `'A' + 1 == 'B'` or that `('a' & 0xDF) == ('A' & 0xDF)`. – Roland Illig Jun 27 '10 at 16:39
2

For larger Hex strings like in the example I needed to use strtoul.

2

Hexadecimal to decimal. Don't run it on online compilers, because it won't work.

#include<stdio.h>
void main()
{
    unsigned int i;
    scanf("%x",&i);
    printf("%d",i);
}
Baz
  • 36,440
  • 11
  • 68
  • 94
1
#include "math.h"
#include "stdio.h"
///////////////////////////////////////////////////////////////
//  The bits arg represents the bit say:8,16,32...                                                                                                              
/////////////////////////////////////////////////////////////
volatile long Hex_To_Int(long Hex,char bits)
{
    long Hex_2_Int;
    char byte;
    Hex_2_Int=0;

    for(byte=0;byte<bits;byte++)
    {
        if(Hex&(0x0001<<byte))
            Hex_2_Int+=1*(pow(2,byte));
        else
            Hex_2_Int+=0*(pow(2,byte));
    }

    return Hex_2_Int;
}
///////////////////////////////////////////////////////////////
//                                                                                                                  
/////////////////////////////////////////////////////////////

void main (void)
{
    int Dec;   
    char Hex=0xFA;
    Dec= Hex_To_Int(Hex,8);  //convert an 8-bis hexadecimal value to a number in base 10
    printf("the number is %d",Dec);
}
Markus Safar
  • 6,324
  • 5
  • 28
  • 44
  • The code converts the hexadecemal to it decemal... no complex coding.. simple but works. – Sunday Efeh Aug 04 '12 at 13:56
  • 2
    My god, this is probably the worst implementation of hex to dec I've ever seen. `pow` seriously? Know that it is often implemented as `pow(a,b) = exp( b * log(a) )`. But even if not, converting integer to double is already a heavy operation, especially on modern processors. – Patrick Schlüter Nov 05 '12 at 22:32
  • Note that `Hex_To_Int` takes its input as a base-2 bitstring stored in a `long`. The hex-string to integer conversion happens at compile time! The "conversion" is just a very expensive no-op that could be written better as `return Hex`. – Peter Cordes May 10 '16 at 20:50
1

Why is a code solution that works getting voted down? Sure, it's ugly ...

Perhaps because as well as being ugly it isn't educational and doesn't work. Also, I suspect that like me, most people don't have the power to edit at present (and judging by the rank needed - never will).

The use of an array can be good for efficiency, but that's not mentioned in this code. It also takes no account of upper and lower case so it does not work for the example supplied in the question. FFFFFFFE

itj
  • 1,165
  • 1
  • 11
  • 16
0

Try this to Convert from Decimal to Hex

    #include<stdio.h>
    #include<conio.h>

    int main(void)
    {
      int count=0,digit,n,i=0;
      int hex[5];
      clrscr();
      printf("enter a number   ");
      scanf("%d",&n);

      if(n<10)
      {
          printf("%d",n);
      }

      switch(n)
      {
          case 10:
              printf("A");
            break;
          case 11:
              printf("B");
            break;
          case 12:
              printf("B");
            break;
          case 13:
              printf("C");
            break;
          case 14:
              printf("D");
            break;
          case 15:
              printf("E");
            break;
          case 16:
              printf("F");
            break;
          default:;
       }

       while(n>16)
       {
          digit=n%16;
          hex[i]=digit;
          i++;
          count++;
          n=n/16;
       }

       hex[i]=n;

       for(i=count;i>=0;i--)
       {
          switch(hex[i])
          {
             case 10:
                 printf("A");
               break;
             case 11:
                 printf("B");
               break;
             case 12:
                 printf("C");
               break;
             case  13:
                 printf("D");
               break;
             case 14:
                 printf("E");
               break;
             case 15:
                 printf("F");
               break;
             default:
                 printf("%d",hex[i]);
          }
    }

    getch();

    return 0;
}
Markus Safar
  • 6,324
  • 5
  • 28
  • 44
  • Consider changing the `void main()` to `int main(void)` –  Dec 24 '13 at 06:37
  • decimal->hex is easier: you can use a table lookup to convert from a 4bit integer to a hex digit, without a giant `switch`. `char hextable[] = { '0', '1', ..., 'A', 'B', ..., 'F' };` And you can use `putchar` instead of `printf`! Also, your first switch has a bug: `"B"` is there for 11 and 12, which is why 16-> `"F"`. /facepalm – Peter Cordes May 10 '16 at 20:44
-2

In C you can convert a hexadecimal number to decimal in many ways. One way is to cast the hexadecimal number to an integer. I personally found this to be simple and small.

Here is an sample code for converting a Hexadecimal number to a Decimal number with the help of casting.

#include <stdio.h>

int main(){
    unsigned char Hexadecimal = 0x6D;   //example hex number
    int Decimal = 0;    //decimal number initialized to 0


        Decimal = (int) Hexadecimal;  //conversion

    printf("The decimal number is %d\n", Decimal);  //output
    return 0;
}
-4

This currently only works with lower case but its super easy to make it work with both.

cout << "\nEnter a hexadecimal number: ";
cin >> hexNumber;
orighex = hexNumber;

strlength = hexNumber.length();

for (i=0;i<strlength;i++)
{
    hexa = hexNumber.substr(i,1);
    if ((hexa>="0") && (hexa<="9"))
    {
        //cout << "This is a numerical value.\n";
    }
    else
    {
        //cout << "This is a alpabetical value.\n";
        if (hexa=="a"){hexa="10";}
        else if (hexa=="b"){hexa="11";}
        else if (hexa=="c"){hexa="12";}
        else if (hexa=="d"){hexa="13";}
        else if (hexa=="e"){hexa="14";}
        else if (hexa=="f"){hexa="15";}
        else{cout << "INVALID ENTRY! ANSWER WONT BE CORRECT\n";}
    }
    //convert from string to integer

    hx = atoi(hexa.c_str());
    finalhex = finalhex + (hx*pow(16.0,strlength-i-1));
}
cout << "The hexadecimal number: " << orighex << " is " << finalhex << " in decimal.\n";
BenMorel
  • 34,448
  • 50
  • 182
  • 322