Converting binary data to printable hex

Question

In this thread some one commented that the following code should only be used in 'toy' projects. Unfortunately he hasn't come back to say why it's not of production quality so I was hoping some one in the community may be able to either assure me the code is ok (because I quite like it) or identify what is wrong.

template< class T1, class T2>
void hexascii( T1& out, const T2& in )
{
    out.resize( in.size() * 2 );
    const char hexDigits[] = {'0', '1', '2', '3', '4', '5', '6', '7','8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
    T1::iterator outit = out.begin();
    for( T2::const_iterator it = in.begin(); it != in.end(); ++it )
    {
        *outit++ = hexDigits[*it >> 4];
        *outit++ = hexDigits[*it & 0xF];
    }
}

template<class T1, class T2>
void asciihex( T1& out, const T2& in )
{
    size_t size = in.size;
    assert( !(size % 2) );

    out.resize( size / 2 );
    T1::iterator outit = out.begin();
    for( T2::const_iterator it = in.begin(); it != in.end(); it += 2, ++outit )
    {
    *outit = ((( (*it > '9' ? *it - 0x07 : *it)  - 0x30) << 4) & 0x00f0) + 
                (((*(it+1) > '9' ? *(it+1) - 0x07 : *(it+1)) - 0x30) & 0x000f);
    }
}

Edit: Thanks for your help guys, you've made some big improvements. I've written functions in the two suggested styles from your answers. Some rough testing suggests the second method is marginally faster than the first, but IMO this is outweighed by the improved readability of the first.

template<class T1>
void asciihex2( T1& out, const std::string& in )
{
    dassert( sizeof(T1::value_type)==1 );
    size_t size = in.size();
assert( !(size % 2) );
    out.resize( size / 2 );
    T1::iterator outit = out.begin();
    for( size_t i = 0; i < in.size(); i += 2 )
    {
        int tmp;
        sscanf( in.c_str() + i, "%02X", &tmp );
        *outit++ = tmp;
    }
}

template<class T1>
void asciihex3( T1& out, const std::string& in )
{
    dassert( sizeof(T1::value_type)==1 );
    size_t size = in.size();
assert( !(size % 2) );
    out.resize( size / 2 );
    T1::iterator outit = out.begin();
const char hexDigits[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 
                          0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
                  0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F};
for( std::string::const_iterator it = in.begin(); it != in.end(); it += 2, ++outit )
    {
    *outit = (hexDigits[(*it - 0x30) & 0x1f] << 4) + 
              hexDigits[((*(it+1) - 0x30) & 0x1f)];
    }
}

Some of the assumptions surronding this code: 1: They are not intended as a generic, but are used in an anonymous name space to translate data for a specific class. 2: The templating is required as two separate container types are being used (one being std::vector, the other a similar byte array type container from a third party library. 3: The purpose is to be able to convert binary data of indeterminate length into strings and back again (0x1234abcd <-> "1234abcd") 4: assert traps errors in both debug and release modes 5: by the time these functions are called the size of the string will already have been checked, assert is used to terminate processing if something serious has gone wrong 6: It needs some commenting

Any other ideas appreciated.

Nothing against your question Patrick, but the title is well vague, unfortunately I can't think of anything more accurate and meaningful, but it'd be nice if someone had a good idea, after all the question could mean literally anything. — Robert Gould, Jan 05 '09 at 13:02
Shouldn't the first line of asciihex should be: size_t size = in.size(); — jrb, Jan 05 '09 at 13:07
I had the same thoughts about the question Robert, but also couldn't think of anything concise and meaningful. — Patrick, Jan 05 '09 at 13:13
Perhaps HexToBinary and BinaryToHex would be more meaningful names, the term ASCII here is misleading, particularly since the use of templates indicates possible unicode. Otherwise the hex is ASCII. — SmacL, Jan 05 '09 at 14:10
Please try to make a title that is specific to your question and now, your answer. No one will ever be able to accurately search for this — mmcdole, Jan 05 '09 at 23:10
Perhaps, Why shouldn't I use these functions to do hex conversion in C? — mmcdole, Jan 05 '09 at 23:16
@Patrick: assert can be disabled by defining NDEBUG, which many folks do for production code. — Bklyn, Jan 09 '09 at 22:02

score 8 · Accepted Answer · answered Jan 05 '09 at 12:32

8

It seems like a lot of templated code to achieve very little, given you have direct hex conversion in the standard C scanf and printf functions. why bother?

answered Jan 05 '09 at 12:32

SmacL

22,555
12
95
149

I was thinking the same thing. – Daniel Sloof Jan 05 '09 at 12:33
sscanf/printf are not type-safe. The stream << and >> operators are. – xtofl Jan 05 '09 at 12:37
1

True, but you could wrap sscanf and sprintf with very simple functions to achieve the type safety. The above templated code is way over the top for something this simple, and the use of templates incorrectly suggests the code is type friendly. – SmacL Jan 05 '09 at 12:51
I think this answer comes from misunderstanding the purpose of the functions (the mistake being in the question, not your reading of it - I've added a bit to the question). I can't see any 'nice' way of using printf for the purpose of changing a byte array into printable ascii and back. – Patrick Jan 05 '09 at 13:44
In fact I can't see how to turn 0x30 into 0x00 using sprintf at all. Do you agree or am I being dumb? – Patrick Jan 05 '09 at 13:48
check out the printf %x qualifier, in the following link, http://www.cplusplus.com/reference/clibrary/cstdio/printf.html, e.g. sprintf(str,"%02x",i). Similarly for sscanf http://www.cplusplus.com/reference/clibrary/cstdio/scanf.html, as in sscanf("%x",&i); – SmacL Jan 05 '09 at 14:05
Which turns 0x00 into 0x30 but how to do the reverse? – Patrick Jan 05 '09 at 14:25
given char s[3], int i = 15, sprintf(s,"%02x",i) sets s to "0f" now say s = "fe", sscanf(s,"%x",&i) sets i to 254. – SmacL Jan 05 '09 at 14:36
Printf is great but there are times when one might need functions like this in an environment where sscanf/printf are not desired (or perhaps not available): embedded, low-level debugging, etc. – jwfearn Jan 05 '09 at 15:53
@jwfearn True, but there are still better ways that the example above. Particuarly in embedded where resources are an issue, I would tend to go for a streamed output rather than resizing a buffer. – SmacL Jan 05 '09 at 16:20
@smacl: what if s = "12abdc34d5ba990345326234656". When I code this with scanf it just looks very wrong. – Patrick Jan 05 '09 at 16:56
@smacl: don't resize a buffer in embeded due to resources? I'm confused. A stream provides a neat API for a buffer but the stream still holds the data in a buffer. – Patrick Jan 05 '09 at 16:59
@smacl: ignore the scanf looks wrong comment. I just needed to think about it for a bit – Patrick Jan 05 '09 at 17:11

score 6 · Answer 2 · answered Jan 05 '09 at 12:35

6

My main comment about it is that it's very difficult to read.

Especially:

*outit = ((( (*it > '9' ? *it - 0x07 : *it)  - 0x30) << 4) & 0x00f0) + 
            (((*(it+1) > '9' ? *(it+1) - 0x07 : *(it+1)) - 0x30) & 0x000f)

It would take my brain a little while to grok that, and annoy me if I inherited the code.

answered Jan 05 '09 at 12:35

unknownuser

790
9
16

Likewise, confusing for sure. – SmacL Jan 05 '09 at 12:56
"A little while"? Either genius or understatement. – Anthony Jan 05 '09 at 13:01
That line defintely needs a comment but I can't see anyway of simplifying it once this approach to conversion is chosen. Splitting it over multiple lines... – Patrick Jan 05 '09 at 13:18
Agreed, it's not clear how to improve it. Possible moving some of the checks into submethods. Still, it makes my eyes bug out. – unknownuser Jan 05 '09 at 13:26
I'd certainly replace "(*it > '9' ? *it - 0x07 : *it) - 0x30)" with an inline function 'getHexValue', and call that twice. I'm also unsure of the value of the masks: if the input string is garbage, who cares what value is output? If the input string is valid, the masks have no effect. – Steve Jessop Jan 06 '09 at 14:10
The result is "*outit = (getHexValue(it) << 4) + getHexValue(it+1);", which I'd settle for. – Steve Jessop Jan 06 '09 at 14:16

Stephan Eggermont · Answer 3 · 2009-01-05T14:18:43.140

4

What is it supposed to do? There is no well-known accepted meaning of hexascii or asciihex, so the names should change.

[edit] Converting from binary to hex notation should often not be called ascii..., as ascii is a 7-bit format.

edited Jan 05 '09 at 14:18

answered Jan 05 '09 at 12:41

Stephan Eggermont

15,847
1
38
65

Good call, I only just noticed that now. At a glance, I assumed it was a function for converting between various types of integers and hex. Re-reading it appears to be for converting between blocks of binary (in bytes) and hex. – SmacL Jan 05 '09 at 12:59
Edited the question, change a binary array into printable format and back again. – Patrick Jan 05 '09 at 13:31

score 3 · Answer 4 · edited May 23 '17 at 11:48

3

I don't really object against it. It's generic (within limits), it uses consts, references where needed, etc... It lacks a bit of documentation, and the asciihex *outit assignment is not quite clear at first sight.

resize initializes the output's elements unnecessary (use reserve instead).

Maybe the genericity is somewhat too flexible: you can feed the algorithms with any datatype you like, while you should only give it hex numbers (not e.g. a vector of doubles)

And indeed, it may be a bit overkill, given the presence of good library functions.

edited May 23 '17 at 11:48

Community

1
1

answered Jan 05 '09 at 12:39

xtofl

40,723
12
105
192

The code will not work if you use reserve. If out.size() is 0 before you call reserve, it will still be 0 after you call reserve, so the loops wouldn't execute. See http://www.gotw.ca/gotw/074.htm – Joel Jan 05 '09 at 17:59
Indeed, you would need additional adaptations. My point was that objects got constructed unnecessarily. – xtofl Jan 07 '09 at 09:55

score 3 · Answer 5 · answered Jan 05 '09 at 12:47

3

What's wrong with

*outit = hexDigits[*it]

Why can't these two functions share a common list of hexDigits and eliminate the complex (and slow) calculation of an ASCII character?

answered Jan 05 '09 at 12:47

S.Lott

384,516
81
508
779

How do you link 0x30 to 0x00? You could use a map I suppose but seems like overkill. – Patrick Jan 05 '09 at 13:28
The map is a trivial array lookup. There will only be a few bytes (256 in the crazy worst case; 128 is more realistic). And it will perform instantly using an add and a multiply and nothing more. – S.Lott Jan 05 '09 at 13:50

Thorsten79 · Answer 6 · 2009-01-05T14:17:45.717

3

Code has assert statements instead of proper handling of an error condition (and if your assert is turned off, the code may blow up)
for loop has dangerous double-increase of iterator (it+=2). Especially in case your assert did not fire. What happens when your iterator is already at the end and you ++ it?
Code is templated, but what you're doing is simply converting characters to numbers or the other way round. It's cargo cult programming. You hope that the blessings of template programming will come upon you by using templates. You even tagged this as a template question although the template aspect is completely irrelevant in your functions.
the *outit= line is too complicated.
code reinvents the wheel. In a big way.

edited Jan 05 '09 at 14:17

answered Jan 05 '09 at 14:05

Thorsten79

10,038
6
38
54

Thanks Thorsten, I was hoping you'd see this. On the cargo cult point, the code allows me to process 2 containers from different libraries without having to write 2 functions which differ only in their parameters, one of the purposes of templates. I didn't consider all the other containers :(. – Patrick Jan 05 '09 at 14:44
I value your thoughts about being interoperable between libraries. But why are you fiddling with character constants and obscure ASCII values? It's code "halfway between the gutter and the stars". It tries to be elegant but fails to deliver on that promise. You could easily use C++ strings here. – Thorsten79 Jan 05 '09 at 15:47

Stephen Doyle · Answer 7 · 2009-01-05T16:40:06.927

Some problems that I see:

This will work great if it is only used for an input container that stores 8 bit types - e.g. char or unsigned char. For example, the following code will fail if used with a 32 bit type whose value after the right shift is greater than 15 - recommend that you always use a mask to ensure that lookup index is always within range.

*outit++ = hexDigits[*it >> 4];

What is the expected behavior if you pass in a container containing unsigned longs - for this to be a generic class it should probably be able to handle the conversion of 32 bit numbers to hext strings also.

This only works when the input is a container - what if I just want to convert a single byte? A suggestion here is to refactor the code into a core function that can covert a single byte (hex=>ascii and ascii=>hex) and then provide additional functions to use this core function for coverting containers of bytes etc.

In asciihex(), bad things will happen if the size of the input container is not divisible by 2. The use of:

it != in.end(); it += 2

is dangerous since if the container size is not divisible by 2 then the increment by two will advance the iterator past the end of the container and the comparison against end() will never work. This is somewhat protected against via the assert call but assert can be compiled out (e.g. it is often compiled out in release builds) so it would be much better to make this an if statement.

The assert on the second line of asciihex() checks that the input size is divisible by 2, so +=2 is safe in this case - I agree though that it at least looks dangerous, and I think I'd code it a bit differently myself. — jrb, Jan 05 '09 at 13:11
@jrbushell, assert will only be included in debug versions. It has no effect on release code, and is an aid to testing rather than run-time checking — SmacL, Jan 05 '09 at 13:20
assert doesn't only check in debug code in our build environment(though you can configure it to do so) — Patrick, Jan 05 '09 at 13:27

score 1 · Answer 8 · answered Jan 05 '09 at 16:07

Problems I spot:

hexascii does not check if sizeof(T2::value_type)==1

hexascii dereferences it twice, asciihex even more. There's no reason for this, as you can store the result. This means you can't use an istream_iterator.

asciihex needs a random iterator as input, because (it+1) and (it+=2) are used. The algorithm could work on a forward iterator if you use only (++it).

(*it > '9' ? *it - 0x07 : *it) - 0x30 can be simplified to *it - (*it > '9' ? 0x37 : 0x30) so there is only one unconditional subtraction left. Still, an array lookup would be more efficient. Subtract 0x30. '0' will become 0;'A' will become 0x11 and 'a' will become 0x31. Mask with 0x1f to make it case-insensitive, and you can do the resulting lookup in a char[0x20] without overflow risks. Non-hex chars will just give you weird values.

score 0 · Answer 9 · answered Jan 05 '09 at 16:03

0

The reason I would consider it toy code is there is no error checking.

I could pass it two vector and it would happily try and do something and make a complete mess generating random gibberish.

answered Jan 05 '09 at 16:03

Martin York

257,169
86
333
562

Converting binary data to printable hex

9 Answers9

Linked