21

Say I have a string like:

string hex = "48656c6c6f";

Where every two characters are the hex representation of one character's ASCII value, e.g.:

0x48 0x65 0x6c 0x6c 0x6f = "Hello"

So how can I get "Hello" from "48656c6c6f" without having to create an ASCII lookup table? atoi() obviously won't work here.

NullUserException
  • Related, see [Convert hexadecimal string with leading “0x” to signed short in C++?](http://stackoverflow.com/q/1487440/608639) – jww May 07 '17 at 04:54

4 Answers

27
int len = hex.length();
std::string newString;
for(int i=0; i< len; i+=2)
{
    std::string byte = hex.substr(i,2);
    char chr = (char) (int)strtol(byte.c_str(), NULL, 16); // base 16, no "0x" prefix needed
    newString.push_back(chr);
}
SSpoke
James Curran
  • I'd go with this answer, as it won't depend on integer lengths – Xzhsh Sep 24 '10 at 20:18
  • Storing a length in an `int`. Now why would you do that? – sbi Sep 24 '10 at 20:24
  • @sbi: If I didn't, it would call string::length() every time through the loop. Since I know it's going to remain constant, no need going through the extra work. (Unless you are questioning my choice of int over say long -- because I couldn't see this as being practical on a string longer than that which would fit into an int length) – James Curran Sep 24 '10 at 20:30
  • @James Is `string::length()` O(1)? – NullUserException Sep 24 '10 at 20:38
  • @James: The length of a `std::string` is to be stored in `std::string::size_type`. The C lib uses `std::size_t` for this. – sbi Sep 24 '10 at 20:51
  • @NullUSerException: I'm not sure if that's a requirement or not, but regardless, O(1) is not O(0). – James Curran Sep 24 '10 at 20:56
  • This is *ten times* slower than it needs to be (see my answer). – zwol Sep 24 '10 at 23:30
  • I agree with sbi: use size_t for this. Your IDE/compiler should flag this as a warning anyway. –  Sep 25 '10 at 01:50
  • "Since I know it's going to remain constant" any serious compiler should know that, and LICM the length check out of the loop, since `hex` is only read from. Even Javascript JITs have been doing it since before this answer was posted. – Masklinn Dec 13 '22 at 12:15
  • @Masklinn - Actually, we call a member function on `hex`, so the compiler would have to know that `std::string::substr()` does not affect the length of the string. In fact, it would have to know that every call to the member function `length()` returns the same value. Now, since `string` is part of the standard library, the compiler *could* know these things, but it's not guaranteed, and technically, it's out of its scope to make those assumptions. – James Curran Feb 01 '23 at 20:42
26

Hex digits are very easy to convert to binary:

// C++98 guarantees that '0', '1', ... '9' are consecutive.
// It only guarantees that 'a' ... 'f' and 'A' ... 'F' are
// in increasing order, but the only two alternative encodings
// of the basic source character set that are still used by
// anyone today (ASCII and EBCDIC) make them consecutive.
unsigned char hexval(unsigned char c)
{
    if ('0' <= c && c <= '9')
        return c - '0';
    else if ('a' <= c && c <= 'f')
        return c - 'a' + 10;
    else if ('A' <= c && c <= 'F')
        return c - 'A' + 10;
    else abort();
}

So to do the whole string looks something like this:

void hex2ascii(const string& in, string& out)
{
    out.clear();
    out.reserve(in.length() / 2);
    for (string::const_iterator p = in.begin(); p != in.end(); p++)
    {
       unsigned char c = hexval(*p);
       p++;
       if (p == in.end()) break; // incomplete last digit - should report error
       c = (c << 4) + hexval(*p); // + takes precedence over <<
       out.push_back(c);
    }
}
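
As the comment in the loop notes, a truncated last digit should really be reported, and `hexval` calling `abort()` is harsh for untrusted input. Here is a minimal sketch of an error-reporting variant. The names `hexval_checked` and `try_hex2ascii` are made up for illustration and are not part of the code above, and it makes the same assumption that the letters 'a'..'f' and 'A'..'F' are consecutive.

```cpp
#include <string>

// Hypothetical helper: returns 0-15 for a hex digit, or -1 for anything else.
// Like hexval above, it assumes 'a'..'f' and 'A'..'F' are consecutive.
int hexval_checked(unsigned char c)
{
    if ('0' <= c && c <= '9') return c - '0';
    if ('a' <= c && c <= 'f') return c - 'a' + 10;
    if ('A' <= c && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical variant of hex2ascii: returns false (and leaves out empty)
// on an odd-length input or on any non-hex character.
bool try_hex2ascii(const std::string& in, std::string& out)
{
    out.clear();
    if (in.length() % 2 != 0) return false;
    out.reserve(in.length() / 2);
    for (std::string::size_type i = 0; i < in.length(); i += 2)
    {
        int hi = hexval_checked(static_cast<unsigned char>(in[i]));
        int lo = hexval_checked(static_cast<unsigned char>(in[i + 1]));
        if (hi < 0 || lo < 0) { out.clear(); return false; }
        out.push_back(static_cast<char>((hi << 4) | lo));
    }
    return true;
}
```

The caller checks the return value instead of relying on the process dying, which is probably what you want if the hex string comes from outside the program.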

You might reasonably ask why one would do it this way when there's strtol, and using it is significantly less code (as in James Curran's answer). Well, that approach is a full decimal order of magnitude slower, because it copies each two-byte chunk (possibly allocating heap memory to do so) and then invokes a general text-to-number conversion routine that cannot be written as efficiently as the specialized code above. Christian's approach (using istringstream) is five times slower than that. Here's a benchmark plot - you can tell the difference even with a tiny block of data to decode, and it becomes blatant as the input gets larger. (Note that both axes are on a log scale.)

Benchmark comparison plot
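
If you want to check numbers like these on your own machine, a rough timing harness could look like the sketch below. This is not the benchmark program behind the plot. The names `decode_by_hand`, `decode_with_strtol` and `time_us` are invented here, it needs C++11 for `<chrono>`, and it assumes the input is valid, even-length hex.

```cpp
#include <chrono>
#include <cstdlib>
#include <string>

// Hand-coded digit conversion in the style of hexval above.
// Assumption: the input is valid hex, so anything else maps to 0.
static unsigned char digit(unsigned char c)
{
    if ('0' <= c && c <= '9') return c - '0';
    if ('a' <= c && c <= 'f') return c - 'a' + 10;
    if ('A' <= c && c <= 'F') return c - 'A' + 10;
    return 0;
}

// Decodes by combining two hand-converted nibbles per output byte.
std::string decode_by_hand(const std::string& in)
{
    std::string out;
    out.reserve(in.length() / 2);
    for (std::string::size_type i = 0; i + 1 < in.length(); i += 2)
        out.push_back(static_cast<char>((digit(in[i]) << 4) | digit(in[i + 1])));
    return out;
}

// Decodes by copying each pair into a temporary string and calling strtol.
std::string decode_with_strtol(const std::string& in)
{
    std::string out;
    out.reserve(in.length() / 2);
    for (std::string::size_type i = 0; i + 1 < in.length(); i += 2)
    {
        std::string byte = in.substr(i, 2);
        out.push_back(static_cast<char>(std::strtol(byte.c_str(), nullptr, 16)));
    }
    return out;
}

// Hypothetical timing helper: runs decode(in) reps times and returns
// the elapsed wall-clock time in microseconds.
template <typename Decoder>
long long time_us(Decoder decode, const std::string& in, int reps)
{
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    std::string::size_type total = 0;
    for (int r = 0; r < reps; ++r)
        total += decode(in).size();
    std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
    volatile std::string::size_type keep = total; // keep the result observable
    (void)keep;
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Call it as, say, `time_us(decode_by_hand, input, 1000)` and `time_us(decode_with_strtol, input, 1000)` over a range of input sizes, with optimization enabled. The ratios you see will depend on your compiler and standard library.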

Is this premature optimization? Hell no. This is the kind of operation that gets shoved in a library routine, forgotten about, and then called thousands of times a second. It needs to scream. I worked on a project a few years back that made very heavy use of SHA1 checksums internally -- we got 10-20% speedups on common operations by storing them as raw bytes instead of hex, converting only when we had to show them to the user -- and that was with conversion functions that had already been tuned to death. One might honestly prefer brevity to performance here, depending on what the larger task is, but if so, why on earth are you coding in C++?

Also, from a pedagogical perspective, I think it's useful to show hand-coded examples for this kind of problem; it reveals more about what the computer has to do.

zwol
  • -1: ignored standard library facilities. I'd take off an extra point because they were mentioned in previous posts. – André Caron Sep 24 '10 at 21:24
  • I ignored them because they're ten times slower than doing it by hand. See edit. – zwol Sep 24 '10 at 23:13
  • +1 for focusing on performance, since this is likely to be something used frequently, like in a performance-critical loop. Also for realizing it is not the "correct" textbook answer, and recommending hiding it behind a library function. That's the best place for ugly code like this: behind a pretty interface. –  Sep 25 '10 at 01:53
  • +1 for the benchmark; though I think James and Christian's implementations are fine for where I am going to use this. – NullUserException Sep 27 '10 at 16:15
7
std::string str("48656c6c6f");
std::string res;
res.reserve(str.size() / 2);
for (std::string::size_type i = 0; i < str.size(); i += 2) // size_type avoids a signed/unsigned comparison
{
    std::istringstream iss(str.substr(i, 2));
    int temp;
    iss >> std::hex >> temp;
    res += static_cast<char>(temp);
}
std::cout << res;
Christian Ammer
  • Could pre-allocate, length is known in advance! – André Caron Sep 24 '10 at 21:23
  • I hate to say it, but this is five times slower than the accepted answer (which is itself ten times slower than my answer). – zwol Sep 24 '10 at 23:29
  • @Zack: I edited my answer and pre-allocate the size of the output string like Caron mentioned and you already did. I'm interested what now are the performance differences between my solution and yours? – Christian Ammer Sep 25 '10 at 06:15
  • My benchmark program preallocated the output string for all three cases; the only difference was the code inside the for-loop. – zwol Sep 25 '10 at 16:37
0

strtol should do the job if you add 0x to each hex digit pair.
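
For reference, here is a small sketch of how `strtol` treats the prefix. The wrapper name `parse_with_base` is just for illustration. With base 16 the "0x" prefix is optional, and it is only with base 0 (auto-detect) that the prefix is what selects hexadecimal.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical wrapper around strtol for a short digit string.
// base 16: an optional "0x"/"0X" prefix is accepted but not required.
// base 0:  the prefix decides the base, so "48" is read as decimal.
long parse_with_base(const std::string& text, int base)
{
    return std::strtol(text.c_str(), NULL, base);
}
```

So `parse_with_base("48", 16)` and `parse_with_base("0x48", 16)` both give 72, while `parse_with_base("48", 0)` gives 48.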

schnaader
  • You don't need the 0x, you just need to pass 16 as the third argument. But strtol is massive overkill for this job... – zwol Sep 24 '10 at 20:26
  • Ah, I see. Yeah, only wanted to give a quick response, this is of course not the best solution (especially for C++). – schnaader Sep 24 '10 at 20:31