I want to convert the characters in the hex string

"0b7c28c9b7290c98d7438e70b3d3f7c848fbd7d1dc194ff83f4f7cc9b1378e98" 

to uint8_t msg[], and I do not understand how to do it.

It seems simple, but I have been unable to figure it out. I want to convert each character to a uint8_t hex value. For example, if I have

string result = "0123456789abcdef";

How do I convert the string to:

uint8_t msg[] = "0123456789abcdef";
David C. Rankin
vbujym
  • Convert to what? How exactly should the conversion be done? strtol is for a single number, but there exists no data type in C++ that can hold a large number like the one in your string. Should the conversion be on byte-per-byte basis or what? Please clarify. – Lundin Apr 01 '19 at 12:55
  • Your question may be a duplicate of the following, where the answer is perhaps not the most efficient or C++-style: https://stackoverflow.com/questions/17261798/converting-a-hex-string-to-a-byte-array – Inon Peled Apr 01 '19 at 12:56
  • The code seems to work ok: https://ideone.com/GpkqUT. What is your issue? – 001 Apr 01 '19 at 12:57
  • function accepts ...const uint8_t msg []... I only have a string hex symbol – vbujym Apr 01 '19 at 12:58
  • Possible duplicate of [Converting a hex string to a byte array](https://stackoverflow.com/questions/17261798/converting-a-hex-string-to-a-byte-array) – ShadowRanger Apr 01 '19 at 12:59
  • 1) I remember doing this with combination of `std::string::substr` (as you are doing), and `std::istringstream`, but your solution should work fine as well. 2) "_function accepts ...const uint8_t msg []... I only have a string hex symbol_" Which function? Yours accepts `const std::string&`. – Algirdas Preidžius Apr 01 '19 at 12:59
  • Is it the [`std::string::c_str`](https://en.cppreference.com/w/cpp/string/basic_string/c_str) function you're looking for? – Some programmer dude Apr 01 '19 at 13:05
  • Yes, I think you're overdoing this: `c_str` plus a cast to `const uint8_t*`. It's a one-liner, no separate function needed. – john Apr 01 '19 at 13:06
  • I do not understand what I am explaining wrong there is a static void getHash function (const uint8_t msg [], size_t len, uint8_t hashResult [HASH_LEN]); which takes an array of unsigned characters, I can't just pass a string there! – vbujym Apr 01 '19 at 13:06
  • You can pass the contents of a string, the actual character array that the string wraps. – john Apr 01 '19 at 13:07
  • Your question doesn't really say what you're doing, what the actual problem is. You show us a solution that doesn't really solve your problem, so of course we attempt to guess what you want based on that information. You also show some unrelated equation that doesn't tell us anything. You don't tell us anything about the function you want to call, or what it *really* want (a *string* or an array of *bytes*, which are two different things). – Some programmer dude Apr 01 '19 at 13:07
  • @Someprogrammerdude result.c_str(): invalid conversion from 'const char*' to 'const uint8_t*' {aka 'const unsigned char*'} [-fpermissive] – vbujym Apr 01 '19 at 13:08
  • https://en.cppreference.com/w/cpp/container/vector/data – 001 Apr 01 '19 at 13:08
  • @vbujym So add a cast. In this case it seems to be what is required. – john Apr 01 '19 at 13:08
  • And please read about [how to ask good questions](http://stackoverflow.com/help/how-to-ask), as well as [this question checklist](https://codeblog.jonskeet.uk/2012/11/24/stack-overflow-question-checklist/). Lastly learn how to create a [mcve]. – Some programmer dude Apr 01 '19 at 13:09
  • @vbujym The fundamental question is does this function take character data, or integer data? I.e. how are the bytes you pass to the function going to be interpreted? You haven't actually said (maybe you don't know). – john Apr 01 '19 at 13:10
  • func: getHash(const uint8_t msg[], size_t len, uint8_t hashResult[HASH_LEN]) string result != uint8_t msg[] !!!!!!!! – vbujym Apr 01 '19 at 13:16
  • @john maybe I still have to dance with big and little endian – vbujym Apr 01 '19 at 13:18
  • But that doesn't help, especially in a comment. We need to know if the function expects a byte-array or a string. Right now it could be both, and even you seem to be wanting to pass a string to it. What documentation about the function do you have? And endianness is for multi-byte values, not for strings or single bytes. – Some programmer dude Apr 01 '19 at 13:18
  • And what does "32b - (64 + 1b) - 32b - 20b" have to do with it all? – Some programmer dude Apr 01 '19 at 13:20
  • @vbujym Funny, because `data` and `c_str` are the same. – john Apr 01 '19 at 13:37
  • @john In the comment above, I gave the output of the compiler error these methods are the same for strings and characters, but not for vector – vbujym Apr 01 '19 at 13:42
  • @johnny-mopp thanks! method data() helped me! thank you very much! – vbujym Apr 01 '19 at 13:45
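
The comments above go back and forth on whether the function should receive the text of the hex string or the bytes that the text encodes. As a rough illustration of the difference, here is a minimal sketch; the helper names `as_text_bytes` and `decode_hex` are hypothetical, and `decode_hex` assumes valid hex input of even length.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: copies the character codes unchanged, which is what
// a cast of c_str()/data() to const uint8_t* would expose to the callee.
std::vector<std::uint8_t> as_text_bytes(const std::string& s) {
    return std::vector<std::uint8_t>(s.begin(), s.end());
}

// Hypothetical helper: decodes each pair of hex digits into one byte.
// Assumes valid hex input of even length; there is no error handling.
std::vector<std::uint8_t> decode_hex(const std::string& hex) {
    std::vector<std::uint8_t> out;
    out.reserve(hex.size() / 2);
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2)
        out.push_back(static_cast<std::uint8_t>(
            std::stoul(hex.substr(i, 2), nullptr, 16)));
    return out;
}
```

For the input "0b7c" on an ASCII system, `as_text_bytes` yields four bytes (0x30 0x62 0x37 0x63), while `decode_hex` yields two (0x0b 0x7c). A hash of one will not match a hash of the other, which would explain the confusion in the thread.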

2 Answers

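This function does it (thanks to "Converting a hex string to a byte array", https://stackoverflow.com/questions/17261798/converting-a-hex-string-to-a-byte-array):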

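#include <cstdint>
#include <cstdlib>
#include <string>
#include <vector>

using namespace std;  // assumed by the unqualified names below

// Convert each pair of hex digits in `hex` into one byte.
vector<uint8_t> HexToBytes(const string& hex) {
  vector<uint8_t> bytes;
  for (size_t i = 0; i < hex.length(); i += 2) {
    string byteString = hex.substr(i, 2);  // two hex digits = one byte
    uint8_t byte = static_cast<uint8_t>(strtoul(byteString.c_str(), nullptr, 16));
    bytes.push_back(byte);
  }
  return bytes;
}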

Using the above function, we get a vector of bytes. Calling its data() method gives the pointer to pass as the uint8_t array, and size() gives the length.
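
To make the call concrete, here is a minimal sketch. The `getHash` signature is the one quoted in the comments, but its body below is only a stand-in (an XOR fold, not a real hash), `HASH_LEN = 32` is an assumed value, and `hashHexString` is a hypothetical wrapper name.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <vector>

constexpr std::size_t HASH_LEN = 32;  // assumed value, not given in the thread

// Stand-in with the signature quoted in the comments. NOT a real hash:
// it only XOR-folds the input so that the call can be demonstrated.
void getHash(const std::uint8_t msg[], std::size_t len,
             std::uint8_t hashResult[HASH_LEN]) {
    for (std::size_t i = 0; i < HASH_LEN; ++i) hashResult[i] = 0;
    for (std::size_t i = 0; i < len; ++i) hashResult[i % HASH_LEN] ^= msg[i];
}

// Same conversion as HexToBytes above, repeated to keep the sketch self-contained.
std::vector<std::uint8_t> HexToBytes(const std::string& hex) {
    std::vector<std::uint8_t> bytes;
    for (std::size_t i = 0; i < hex.length(); i += 2)
        bytes.push_back(static_cast<std::uint8_t>(
            std::strtoul(hex.substr(i, 2).c_str(), nullptr, 16)));
    return bytes;
}

// Hypothetical wrapper: decodes the hex text, then passes the raw bytes
// and their count to the function that expects const uint8_t msg[].
void hashHexString(const std::string& hex, std::uint8_t out[HASH_LEN]) {
    const std::vector<std::uint8_t> bytes = HexToBytes(hex);
    getHash(bytes.data(), bytes.size(), out);
}
```

The vector must stay alive for the duration of the call, since data() points into storage that the vector owns.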

I want to thank the community; everything is now working correctly. Thanks for the comments, I was already desperate that I could not do such a simple thing. Special thanks to @johnny-mopp

vbujym

Edit - Updated to read bytes, not characters

Rather than using .substr() and calling C strtol and casting to uint8_t, you can simply use an istringstream along with std::setbase(16) to read the bytes as unsigned values directly into your vector<uint8_t> msg. See std::setbase.

For instance, you can create an istringstream from the string containing the hex characters, read two characters at a time, convert each pair into a temporary unsigned, and push that value back into your vector of uint8_t, e.g.

    std::string result ("0123456789abcdef");    /* input hex string */
    std::string s2;                             /* string for 2-chars */
    std::istringstream ss (result);             /* stringstream of result */
    std::vector<uint8_t> msg;                   /* vector of uint8_t */

    while ((ss >> std::setw(2) >> s2)) {    /* read 2-char at a time */
        unsigned u;                         /* tmp unsigned value */
        std::istringstream ss2 (s2);        /* create 2-char stringstream */
        ss2 >> std::setbase(16) >> u;       /* convert hex to unsigned */
        msg.push_back((uint8_t)u);          /* add value as uint8_t */
    }

In that way, each 2 characters in result, read using std::setw(2), are used to create a 2-character stringstream that is then converted to an unsigned value using std::setbase(16). A complete example would be:

#include <cstdint>
#include <iostream>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

int main (void) {

    std::string result ("0123456789abcdef");    /* input hex string */
    std::string s2;                             /* string for 2-chars */
    std::istringstream ss (result);             /* stringstream of result */
    std::vector<uint8_t> msg;                   /* vector of uint8_t */

    while ((ss >> std::setw(2) >> s2)) {    /* read 2-char at a time */
        unsigned u;                         /* tmp unsigned value */
        std::istringstream ss2 (s2);        /* create 2-char stringstream */
        ss2 >> std::setbase(16) >> u;       /* convert hex to unsigned */
        msg.push_back((uint8_t)u);          /* add value as uint8_t */
    }

    std::cout << "string: " << result << "\nmsg: \n";
    for (auto& h : msg) /* for each element of msg, output hex value */
        std::cout << "\t" << std::setfill('0') << std::hex << std::setw(2)
                    << (uint32_t)h << '\n';
}

(Note the cast in the output: it explicitly tells cout to treat the uint8_t value as an unsigned number rather than as a character, which is how a uint8_t is printed by default.)
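
The effect of that cast can be seen in isolation with a small sketch. The function names `print_raw` and `print_as_number` are hypothetical, and the sketch assumes the usual case where uint8_t is an alias for unsigned char.

```cpp
#include <cstdint>
#include <ios>
#include <sstream>
#include <string>

// Streams the value directly. With uint8_t as an alias for unsigned char,
// the stream writes it as a single character.
std::string print_raw(std::uint8_t v) {
    std::ostringstream os;
    os << v;
    return os.str();
}

// Converts to unsigned first, so the stream formats it as a (hex) number.
std::string print_as_number(std::uint8_t v) {
    std::ostringstream os;
    os << std::hex << static_cast<unsigned>(v);
    return os.str();
}
```

On an ASCII system, `print_raw(0x41)` gives "A", while `print_as_number(0x41)` gives "41".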

Example Use/Output

$ ./bin/hexstr2uint8_t
string: 0123456789abcdef
msg:
        01
        23
        45
        67
        89
        ab
        cd
        ef

(note there are 8 uint8_t ("byte") values stored this time instead of 16 character values)

It's just an alternative using the C++ iostream features, which avoids the substr() copies and the direct call to strtol (which in your case should probably be strtoul to begin with).

Manual Hex Conversion

In your last comment you indicate that using iostream and stringstream for the conversion is slow. You can attempt to optimize a bit by eliminating the stringstream and using a string::iterator to step through the string manually converting each character and forming each uint8_t byte as you go (protecting against a final nibble or 1/2-byte), e.g.

#include <cstdint>
#include <iostream>
#include <iomanip>
#include <string>
#include <vector>

/* simple manual conversion of hexchar to value */
uint8_t c2hex (const char c)
{
    uint8_t u = 0;

    if ('0' <= c && c <= '9')
        u = c - '0';                /* '0'-'9' -> 0-9 */
    else if ('a' <= c && c <= 'f')
        u = c - 'a' + 10;           /* 'a'-'f' -> 10-15 */
    else if ('A' <= c && c <= 'F')
        u = c - 'A' + 10;           /* 'A'-'F' -> 10-15 */
    else
        std::cerr << "error: invalid hex char '" << c << "'\n";

    return u;
}

int main (void) {

    std::string s ("0123456789abcdef");
    std::vector<uint8_t> msg;

    for (std::string::iterator n = s.begin(); n != s.end(); n += 2) {
        uint8_t u = c2hex (*n);             /* save high-nibble */
        if (n + 1 == s.end()) {             /* lone final nibble: */
            msg.push_back(u);               /* store it and stop, so n */
            break;                          /* never steps past end() */
        }
        u = (u << 4) | c2hex (n[1]);        /* shift high left 4 & or */
        msg.push_back(u);                   /* store byte in msg */
    }

    std::cout << "string: " << s << "\nmsg:\n";
    for (auto& h : msg)
        std::cout << "\t" << std::setfill('0') << std::hex 
                    << std::setw(2) << (unsigned)h << '\n';
}

(output is the same as above)

If you can guarantee there will always be an even number of characters in your string (bytes only and no 1/2-byte as the final-odd character), you can further optimize by removing the conditional and simply using:

        uint8_t u = c2hex (n[1]) | (c2hex (*n) << 4);
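
If the even-length guarantee cannot be taken on trust, it can be checked once up front instead of inside the loop. The following is a minimal sketch of that idea, not part of the code above: the names `hex_digit_value` and `hex_to_bytes_checked` are hypothetical, and throwing std::invalid_argument is just one possible way to report bad input.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Returns the value 0-15 of one hex digit, or -1 if c is not a hex digit.
int hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes a hex string into bytes; throws on odd length or non-hex input.
std::vector<std::uint8_t> hex_to_bytes_checked(const std::string& hex) {
    if (hex.size() % 2 != 0)
        throw std::invalid_argument("hex string has an odd number of digits");
    std::vector<std::uint8_t> bytes;
    bytes.reserve(hex.size() / 2);          // one byte per two digits
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        const int hi = hex_digit_value(hex[i]);
        const int lo = hex_digit_value(hex[i + 1]);
        if (hi < 0 || lo < 0)
            throw std::invalid_argument("hex string has a non-hex character");
        bytes.push_back(static_cast<std::uint8_t>((hi << 4) | lo));
    }
    return bytes;
}
```

With the length validated before the loop, the loop body can combine both nibbles unconditionally, as in the one-line form.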

Make sure you are compiling with full optimization, e.g. -O3 (or -Ofast gcc version >= 4.6) on gcc/clang and /Ox with VS.

Give that a try and compare performance. You can additionally dump the differing versions to assembly and see if there are any additional hints there.
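
One simple way to compare the versions is to time many repetitions of each with std::chrono. This is only a rough sketch: `time_ms` is a hypothetical helper, and with optimization enabled the compiler may remove work whose result is never used, so the result of each conversion should be consumed somehow (for example, by summing the sizes of the returned vectors).

```cpp
#include <chrono>

// Runs fn() `reps` times and returns the elapsed wall time in milliseconds.
template <typename Fn>
double time_ms(Fn fn, int reps) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i)
        fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

It could be called with a lambda that wraps one conversion, once per version, using the same input string and repetition count for each.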

David C. Rankin
  • it looks better, but unfortunately it’s not the same thing. I need this function for hashing, and if you use your option, you get a different result my function gives the correct, I suppose it is in byte order, but I can not understand what is happening. i use this simple library https://github.com/okdshin/PicoSHA2 metod picosha2::hash256_hex_string(hash, result) string s = "0250863ad64a87ae8a2fe83c1af1a8403cb53f53e486d8511dad8a04887e5b2352" vector hash = HexToBytes(s); //my function result = 0b7c28c9b7290c98d7438e70b3d3f7c848fbd7d1dc194ff83f4f7cc9b1378e98 //it true – vbujym Apr 01 '19 at 18:50
  • no, your function gives characters, and mine gives bytes, that’s all the difference. I just understand that I do not understand anything at all – vbujym Apr 01 '19 at 19:57
  • OK, now I understand. It was confusing because every "byte", (e.g. `"1f"` is two "nibbles" or two ASCII characters). It is just as easy to change it to read bytes and not characters. All you do is change `std::setw(1)` to `std::setw(2)`. Sorry for the confusion. – David C. Rankin Apr 01 '19 at 20:12
  • You could help me with this, because I’m definitely already completely confused.I will be very grateful and grateful! setw(2) ? no, it's simple not work ! – vbujym Apr 01 '19 at 20:13
  • Sure, I just edited my answer. Each hex digit `0, 1, 2, 3 ... c, d, e, f` is a value which can be represented by 4-bits (called a "nibble"). There are 2-nibbles per byte. However, everything you type is a `character`, so the ASCII characters `0, 1, 2, 3 ... c, d, e, f` are represented by 7-bit values ranging from `32 - 126`. To store as `uint8_t` you must convert each ASCII character byte into a hex digit. Then since you can fit two hex-digits in every byte, you see each hex byte as `01, 23, .. ef`. (which does make it confusing) See [ASCIITable.com](http://www.asciitable.com/) – David C. Rankin Apr 01 '19 at 20:21
  • Thank you for the help and time spent, this function works correctly! but unfortunately it is not very fast. But thank you! In general, I am amazed that in C ++ there is no such function in the standard library. Working with bytes is the basis of the basics. Even in python there is a function bytes.fromhex()... – vbujym Apr 02 '19 at 18:54
  • If you want it fast, write it in C. It's actually easier and will be much faster. – David C. Rankin Apr 02 '19 at 21:41
  • @vbujym - see if the final addition with manual conversion doesn't speed things up a bit. Make sure you are compiling with full optimization, e.g. `-O3` (or `-Ofast`) on gcc/clang and `/Ox` with VS. – David C. Rankin Apr 02 '19 at 22:33