0

I have a string vector that holds some values. The values are supposed to be hex bytes, but they are stored as strings inside the vector. The bytes were read from a text file that looks something like this:

(contents of the text file)

<jpeg1>
0xFF,0xD8,0xFF,0xE0,0x00,0x10,0x4A,0x46,0x49,0x46,0x00,0x01,0x01,0x01,0x00,0x60
</jpeg1>

So far, my code reads the lines between the <jpeg1> tag and the </jpeg1> tag, then splits them using the comma ',' as a delimiter and stores the bytes in the string vector.

After splitting the string, the vector currently stores the values like this:

vector<string> myString = {"0xFF", "0xD8", "0xFF", "0xE0", "0x00", "0x10", "0x4A", "0x46", "0x49", "0x46", "0x00", "0x01", "0x01", "0x01", "0x00", "0x60"};

        and if i print this i get the following:
            0: 0xFF
            1: 0xD8
            2: 0xFF
            3: 0xE0
            4: 0x00
            5: 0x10
            6: 0x4A
            7: 0x46
            8: 0x49
            9: 0x46

What I want is to store these bytes in an unsigned char array, so that each element is treated as a hex byte and not as a string value.

Preferably something like this :

     unsigned char myHexArray[] = {0xFF,0xD8,0xFF,0xE0,0x00,0x10,0x4A,0x46,0x49,0x46,0x00,0x01,0x01,0x01,0x00,0x60};

        if i print this i get:
            0:  
            1: ╪
            2:  
            3: α
            4:
            5: 
            6: J
            7: F
            8: I
            9: F

Solved!
Thanks for your help guys. So far ranban282's solution has worked for me; I'll try the solutions provided by other users as well.

Vadim Kotov
  • Do you need the vector of strings in the first place? – n. m. could be an AI Apr 18 '17 at 06:21
  • Asked like this, it's a duplicate of http://stackoverflow.com/questions/1070497/c-convert-hex-string-to-signed-integer . – atlaste Apr 18 '17 at 06:21
  • you might even extract the textual source from between the tags and include it in the C++ source ... – Hagen von Eitzen Apr 18 '17 at 06:22
  • @n.m no its not necessary. Im using vectors because the function that im using (copied from stackoverflow) to split the string uses vectors. – erik.martin Apr 18 '17 at 06:27
  • Eventually what is required is, read bytes from the textfile and store them into a unsigned Char Array of some sort. :) – erik.martin Apr 18 '17 at 06:28
  • You have copied the wrong thing then. You want to read a comma-delimited sequence of integers from a stream. Don't try to learn C++ from examples on stackoverflow, it's a way to nowhere. – n. m. could be an AI Apr 18 '17 at 06:41
  • @n.m , was short on time, had to find a solution. but i couldnt agree more, its a way to nowhere without really having an understanding of the language itself. thanks! – erik.martin Apr 18 '17 at 07:57

4 Answers

2

I wouldn't even go through the std::vector<std::string> stage, you don't need it and it wastes a lot of allocations for no good reason; just parse the string to bytes "online".

If you already have an istream for your data, you can parse straight from it, although in my experience the performance of this approach is terrible.

// is is some derived class of std::istream
std::vector<unsigned char> ret;
while(is) {
    int val = 0;
    is>>std::hex>>val;
    if(!is) {
        break; // failed conversion; remember to clean up the stream
               // if you need it later!
    }
    ret.push_back(val);
    if(is.get()!=',') break; // stop at end of input or on an unexpected separator
}
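As a minimal sketch, the stream-based loop can be wrapped in a function and tried on an in-memory string through std::istringstream. The function name read_hex_bytes is hypothetical, and the sketch assumes the values are separated by single commas with no surrounding whitespace.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads comma-separated hex values ("0xFF,0xD8,...")
// from any std::istream until the input ends or the format breaks.
std::vector<unsigned char> read_hex_bytes(std::istream& is) {
    std::vector<unsigned char> ret;
    unsigned val = 0;
    // std::hex makes the extraction accept hexadecimal digits,
    // with or without a leading "0x".
    while (is >> std::hex >> val) {
        ret.push_back(static_cast<unsigned char>(val));
        // Consume the separator; anything else (including end of input) stops the loop.
        if (is.get() != ',') break;
    }
    return ret;
}
```

With a string in hand, it can be called as `std::istringstream in(text); auto bytes = read_hex_bytes(in);`.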

If instead you have it in a string, as often happens when extracting data from an XML file, you can parse it either using istringstream and the code above (one extra string copy, and generally quite slow), or parse it straight from the string using e.g. sscanf with %i; say that your string is in a const char *sz:

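std::vector<unsigned char> ret;
for(; *sz; ++sz) { // ++sz skips the comma after each value
    int read = 0;
    int val = 0;
    // sscanf returns 1 on success, 0 on a mismatch and EOF on empty input
    if(sscanf(sz, " %i %n", &val, &read)!=1) break; // format error
    ret.push_back(static_cast<unsigned char>(val));
    sz += read;
    if(*sz != ',') break; // end of string, or format error if *sz is not '\0'
}
// now ret contains the decoded string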

If you are sure that the strings are always hexadecimal, regardless of the 0x prefix, and that no whitespace is present, strtol is a bit more efficient and IMO nicer to use:

std::vector<unsigned char> ret;
for( ;*sz;++sz) {
    char *endp;
    long val = strtol(sz, &endp, 16);
    if(endp==sz) break; // format error
    sz = endp;
    ret.push_back(val);
    if(*sz!=',') break; // end of string, or format error if *sz is not '\0'
}

If C++17 is available, you can use std::from_chars instead of strtol to cut out the locale bullshit, which can break your parsing function (although that's more typical for floating point parsing) and slow it down for no good reason.
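A minimal sketch of the std::from_chars variant, assuming a C++17 compiler and input without whitespace. The function name parse_hex_bytes is hypothetical. Unlike strtol, std::from_chars does not accept a "0x" prefix and does not skip whitespace, so the prefix is skipped by hand here.

```cpp
#include <charconv>
#include <cstring>
#include <system_error>
#include <vector>

// Hypothetical helper: parses "0xFF,0xD8,..." into bytes.
// Returns false on a format error; `out` then holds the bytes parsed so far.
bool parse_hex_bytes(const char* sz, std::vector<unsigned char>& out) {
    const char* end = sz + std::strlen(sz);
    while (sz != end) {
        // Skip an optional "0x" / "0X" prefix, which from_chars would reject.
        if (end - sz >= 2 && sz[0] == '0' && (sz[1] == 'x' || sz[1] == 'X')) sz += 2;
        unsigned val = 0;
        auto res = std::from_chars(sz, end, val, 16);
        // Fail on missing digits, overflow, or a value that does not fit in a byte.
        if (res.ec != std::errc() || val > 0xFF) return false;
        out.push_back(static_cast<unsigned char>(val));
        sz = res.ptr;
        if (sz == end) break;         // clean end of input
        if (*sz != ',') return false; // unexpected separator
        ++sz;                         // skip the comma
    }
    return true;
}
```

Returning a bool instead of silently breaking out of the loop is a design choice of this sketch; it lets the caller tell a clean end of input from a format error.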

OTOH, if the performance is critical but from_chars is not available (or if it's available but you measured that it's slow), it may be advantageous to hand roll the whole parser.

auto conv_digit = [](char c) -> int {
    if(c>='0' && c<='9') return c-'0';
    // notice: technically not guaranteed to work;
    // in practice it'll work on anything that doesn't use EBCDIC
    if(c>='A' && c<='F') return c-'A'+10;
    if(c>='a' && c<='f') return c-'a'+10;
    return -1;
};
std::vector<unsigned char> ret;
for(; *sz; ++sz) {
    while(*sz == ' ') ++sz;
    if(*sz!='0' || (sz[1]!='x' && sz[1]!='X')) break; // format error: missing 0x / 0X prefix
    sz+=2;
    int val = 0;
    int digit = -1;
    const char *sz_before = sz;
    while((digit = conv_digit(*sz)) >= 0) {
        val=val*16+digit; // or, if you prefer: val = val<<4 | digit;
        ++sz;
    }
    if(sz==sz_before) break; // format error
    ret.push_back(val);
    while(*sz == ' ') ++sz;
    if(*sz!=',') break; // end of string, or format error if *sz is not '\0'
}
Matteo Italia
  • @n.m.: what would be the C++ idiomatic way to handle a parsing problem? `std::istringstream`? `boost::spirit`? `std::locale::use_face::some_other_ridicolous_function_name_that_ultimately_calls_sscanf`? Don't make me laugh... If the "C++ way" is a regression over the C way let's keep the old one. – Matteo Italia Apr 18 '17 at 06:55
  • (the only better way I see to handle this task is actually `strtol`, although it doesn't have the "whatever base for free" benefit as `%i` or, if speed is *really* important and we can cut on the locale bullshit, a hand-rolled parser) – Matteo Italia Apr 18 '17 at 06:57
  • Why is `ret` of type `std::string` as opposed to using `std::vector`? – Jonas Apr 18 '17 at 07:06
  • `std::istringstream` would be the easy C++ way, `boost::spirit` would probably get you more error checks for free, I have no idea why bring facets and locales to the picture. – n. m. could be an AI Apr 18 '17 at 07:26
  • Anyway the answer has much more code now, so the comment is not that relevant. – n. m. could be an AI Apr 18 '17 at 07:37
1

If you're using C++11, you can use the stoi function.

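vector<string> myString = {"0xFF", "0xD8", "0xFF", "0xE0", "0x00", "0x10", "0x4A", "0x46", "0x49", "0x46", "0x00", "0x01", "0x01", "0x01", "0x00", "0x60"};
unsigned char* myHexArray = new unsigned char[myString.size()];
for (unsigned i = 0; i < myString.size(); i++)
{
    // base 0 lets stoi detect the "0x" prefix and parse the value as hexadecimal
    myHexArray[i] = stoi(myString[i], nullptr, 0);
}
for (unsigned i = 0; i < myString.size(); i++)
{
    cout << myHexArray[i] << endl; // prints each byte as a character
}
delete[] myHexArray; // release the array once you are done with it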

The function stoi() was introduced in C++11 and is declared in <string>. To compile with gcc, pass the flag -std=c++11.

In case you're using an older version of C++, you can use strtol instead of stoi. Note that you need to convert the string to a C string first, using c_str().

myHexArray[i]=strtol(myString[i].c_str(),NULL,0);

ranban282
  • What's this unsigned char* nonsense? What's wrong with a vector of bytes? – Richard Hodges Apr 18 '17 at 06:48
  • @ranban, im using codeblocks with mingw 4.9.2, the compiler is already set to use c++11. Im getting "Stoi" was not declared in the scope. using std::stoi gives the same error "stoi is not a member of std" – erik.martin Apr 18 '17 at 07:12
  • I read that stoi is not a member of the std namespace in the minGW, which codeblocks uses. How did you get this solution working? Did you use strtol? – ranban282 Apr 18 '17 at 08:19
  • well, apparently codeblocks has a bug. I searched online and someone suggested that i use the TDM-GCC-Mingw compiler..I downloaded and installed it from here: https://sourceforge.net/projects/tdm-gcc/ and then used this as a compiler for codeblocks. It works now :) – erik.martin Apr 18 '17 at 08:28
  • Great, please upvote the answer if you found it helpful. – ranban282 Apr 18 '17 at 08:43
  • I tried doing that but unfortunately i don't have enough reputation to do that at the moment! – erik.martin Apr 18 '17 at 08:55
1

You can use std::stoul on each of your values and build your array using another std::vector like this:

std::vector<std::string> vs {"0xFF", "0xD8", "0xFF" ...};

std::vector<unsigned char> vc;
vc.reserve(vs.size());

for(auto const& s: vs)
    vc.push_back((unsigned char) std::stoul(s, 0, 0));

Now you can access your array with:

vc.data(); // <-- pointer to unsigned char array
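As a self-contained sketch of this approach, the conversion can be wrapped in a function, together with a formatter that shows the bytes as hex digits. Streaming an unsigned char directly prints it as a character, which is why the formatter casts each byte to unsigned first. The names to_bytes and to_hex are hypothetical.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: converts strings such as "0xFF" into raw bytes.
std::vector<unsigned char> to_bytes(const std::vector<std::string>& vs) {
    std::vector<unsigned char> vc;
    vc.reserve(vs.size());
    for (auto const& s : vs) {
        // Base 0 auto-detects the "0x" prefix; std::stoul throws
        // std::invalid_argument if no number can be parsed.
        vc.push_back(static_cast<unsigned char>(std::stoul(s, nullptr, 0)));
    }
    return vc;
}

// Hypothetical helper: formats bytes as two-digit lowercase hex, space separated.
std::string to_hex(const std::vector<unsigned char>& bytes) {
    std::ostringstream os;
    os << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (i != 0) os << ' ';
        // The cast makes the stream print a number instead of a character.
        os << std::setw(2) << static_cast<unsigned>(bytes[i]);
    }
    return os.str();
}
```

For example, `to_hex(to_bytes({"0xFF", "0x4A"}))` yields the string "ff 4a".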
Galik
0

Here's a complete solution including a test and a rudimentary parser (for simplicity, it assumes that the xml tags are on their own lines).

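#include <cstdint>   // std::uint8_t, std::uint32_t
#include <string>
#include <sstream>
#include <stdexcept> // std::runtime_error
#include <regex>
#include <iostream>
#include <iomanip>
#include <iterator>
#include <vector>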

const char test_data[] =
R"__(<jpeg1>
0xFF,0xD8,0xFF,0xE0,0x00,0x10,0x4A,0x46,0x49,0x46,0x00,0x01,0x01,0x01,0x00,0x60,
0x12,0x34,0x56,0x78,0x9a,0xbc,0xde,0xf0
</jpeg1>)__";


struct Jpeg
{
    std::string name;
    std::vector<std::uint8_t> data;
};

std::ostream& operator<<(std::ostream& os, const Jpeg& j)
{
    os << j.name << " : ";
    const char* sep = " ";
    os << '[';
    for (auto b : j.data) {
        os << sep << std::hex << std::setfill('0') << std::setw(2) << std::uint32_t(b);
        sep = ", ";
    }
    return os << " ]";

}

template<class OutIter>
OutIter read_bytes(OutIter dest, std::istream& source)
{
    std::string buffer;
    while (std::getline(source, buffer, ','))
    {
        *dest++ = static_cast<std::uint8_t>(std::stoul(buffer, 0, 16));
    }
    return dest;
}

Jpeg read_jpeg(std::istream& is)
{
    auto result = Jpeg {};
    static const auto begin_tag = std::regex("<jpeg(.*)>");
    static const auto end_tag = std::regex("</jpeg(.*)>");
    std::string line, hex_buffer;
    if(not std::getline(is, line)) throw std::runtime_error("end of file");
    std::smatch match;
    if (not std::regex_match(line, match, begin_tag)) throw std::runtime_error("not a <jpeg_>");
    result.name = match[1];

    while (std::getline(is, line))
    {
        if (std::regex_match(line, match, end_tag)) { break; }
        std::istringstream hexes { line };
        read_bytes(std::back_inserter(result.data), hexes);
    }


    return result;
}

int main()
{
    std::istringstream input_stream(test_data);
    auto jpeg = read_jpeg(input_stream);

    std::cout << jpeg << std::endl;
}

expected output:

1 : [ ff, d8, ff, e0, 00, 10, 4a, 46, 49, 46, 00, 01, 01, 01, 00, 60, 12, 34, 56, 78, 9a, bc, de, f0 ]
Richard Hodges