
In C++ I can initialize a vector using

std::vector<uint8_t> data = {0x01, 0x02, 0x03};

For convenience (I have Python byte strings that naturally output as a hex dump), I would like to initialize from a non-delimited hex value of the form:

std::vector<uint8_t> data = 0x229597354972973aabbe7;

Is there a variant of this that is valid C++?

Steve
  • `0x229597354972973aabbe7` is likely to overflow. – Evg Jul 31 '20 at 19:22
  • simple answer: no. There is no constructor for `std::vector` that accepts a long hex value. You will have to write a parser yourself or look for a BigInt library that does it for you – JHBonarius Jul 31 '20 at 19:22
  • You might be able to create a [user defined literal](https://en.cppreference.com/w/cpp/language/user_literal) to convert the hex literal into the desired vector. – 1201ProgramAlarm Jul 31 '20 at 19:27
  • @1201ProgramAlarm This is interesting and code for `auto v = 0xaabb0305_hexvec;` was not hard to make with `std::vector operator"" _hexvec(const char*str)` – Steve Jul 31 '20 at 21:22
  • A function to convert (from your hex notation to a vector of uint8_t) in c++ is less than 50 lines. – 2785528 Aug 01 '20 at 22:16

2 Answers


Combining comments from Evg, JHBonarius and 1201ProgramAlarm:

The answer is that there is no direct way to put a long hex value into a vector; however, user-defined literals provide a clever notational improvement.

First, writing 0x229597354972973aabbe7 anywhere in the code fails because the value does not fit in any integer type. An unsuffixed hex literal gets the first of int, unsigned int, long, unsigned long, long long and unsigned long long that can represent its value, and this 21-digit value needs 82 bits, more than the 64 bits the widest of those types holds on common compilers. MSVC reports E0023 "integer constant is too large". Limiting yourself to hex sequences of at most 16 digits, or exploring larger data types, may be possible, but that would ruin any desire for simplicity.

Manual conversion is necessary, but user-defined literals may provide a slightly more elegant notation. For example, we can enable conversion of a hex sequence to a vector with

std::vector<uint8_t> val1 = 0x229597354972973aabbe7_hexvec;
std::vector<uint8_t> val2 = "229597354972973aabbe7"_hexvec;

using the following code:

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <iostream>
#include <string>
#include <vector>


// Quick utility function to view results:
std::ostream & operator << (std::ostream & os, const std::vector<uint8_t> & v)
{
    for (const auto & t : v)
        os << std::hex << static_cast<int>(t) << " ";

    return os << std::dec;   // restore decimal output for later insertions
}

std::vector<uint8_t> convertHexToVec(const char * str, size_t len)
{
    // Converts strings of the form "FFAA54", "0x11234" or "0X000" to a vector of bytes.
    // Characters are not validated: std::stoul throws std::invalid_argument only if a
    // group starts with a non-hex character.

    // Skip a leading 0x or 0X (checked without reading past the end of short strings):
    size_t offset = (len >= 2 && str[0] == '0' && (str[1] == 'x' || str[1] == 'X')) ? 2 : 0;

    // Round up the number of bytes so that ff_hexvec, fff_hexvec and 0xfff_hexvec all work:
    std::vector<uint8_t> result((len - offset + 1) / 2);

    // Walk from right to left in groups of two; the leftmost group is a single digit
    // when the digit count is odd, because 0xfff_hexvec is valid.
    // Indices are used instead of pointers so nothing ever points before str.
    size_t end = len;                       // one past the last digit of the current group
    for (size_t ind = result.size(); ind-- > 0; ) {
        size_t begin = (end - offset >= 2) ? end - 2 : offset;
        std::string s(str + begin, end - begin);
        result[ind] = static_cast<uint8_t>(std::stoul(s, nullptr, 16));
        end = begin;
    }

    return result;
}

std::vector<uint8_t> operator"" _hexvec(const char * str, std::size_t len)
{
    // Handles the string form "0xFFAABB"_hexvec or "12441AA"_hexvec
    return convertHexToVec(str, len);
}

std::vector<uint8_t> operator"" _hexvec(const char * str)
{
    // Raw literal operator: handles the integer form 0xFFaaBB_hexvec and 0Xf_hexvec.
    // The digits arrive as text, so the literal never has to fit in an integer type.
    return convertHexToVec(str, std::strlen(str));
}

int main()
{
    std::vector<uint8_t> val1 = 0x229597354972973aabbe7_hexvec;
    std::vector<uint8_t> val2 = "229597354972973aabbe7"_hexvec;

    std::cout << val1 << "\n";   // 2 29 59 73 54 97 29 73 aa bb e7
    std::cout << val2 << "\n";

    return 0;
}

The coder must decide whether this is preferable to calling a more traditional conversion function directly, e.g. convertHexToVec("0x41243124FF", 12).

Steve
  • `unsuffixed literals are assumed to be of type int` that's completely wrong. *The type of an integer constant is the first of the corresponding list in which its value can be represented* [Type of integer literals not int by default?](https://stackoverflow.com/a/8108715/995714) – phuclv Aug 04 '20 at 02:55

Is there a variant of this that is valid c++?

I think not.


The following code is valid C++, and uses a more "traditional hex conversion" process.

  • Confirm and remove the leading '0x', then confirm that all remaining characters are hex digits.

  • modifyFor_SDFE() - 'Space Delimited Formatted Extraction'

This function inserts a space between each pair of characters that describes one byte.

It also adds a space at the front and back of the modified string. The new string is used to create and initialize a std::stringstream (ss1).

  • By inserting the spaces, the normal stream "formatted extraction" works cleanly

The code extracts the hex values one at a time, pushes each into the vector, and stops when the stream reaches end-of-file (ss1.eof()) after the last byte. The vector grows automatically as needed, so it cannot overflow.

Note that the '0x' prefix is not needed, because the stream is set to hex mode (std::hex).

Note that the overflow concern (expressed above as "0x22...be7 is likely to overflow.") has simply been side-stepped by reading only one byte at a time. This should also make it convenient to use much bigger hex strings in the future.


#include <iostream>
using std::cout, std::cerr, std::endl, std::hex,
      std::dec, std::cin, std::flush; // c++17

#include <iomanip>
using std::setw, std::setfill;

#include <string>
using std::string;

#include <sstream>
using std::stringstream, std::ostringstream;

#include <cstdint>   // uint8_t, needed before the typedef below
#include <cctype>    // std::toupper
#include <cassert>

#include <vector>
using std::vector;
typedef vector<uint8_t>  UI8Vec_t;


class F889_t // Functor ctor and dtor use compiler provided defaults
{
  bool    verbose;

public:
  int operator()(int argc, char* argv[])     // functor entry
    {
      verbose = ( (argc > 1) ? ('V' == std::toupper(static_cast<unsigned char>(argv[1][0]))) : false );
      return exec(argc, argv);
    }
  // 2 lines

private:

  int exec(int , char** )
    {
      UI8Vec_t   resultVec;                            // output

      // example1 input
      // string data1 = "0x229597354972973aabbe7";     // 23 chars, hex string
      // to_ui8_vec(resultVec, data1);
      // cout << (verbose ? "" : "\n") << "  vector result       "
      //      << show(resultVec);  // show results

      // example2 input   46 chars (no size limit)
      string data = "0x330508465083084bBCcf87eBBaa379279543795922fF";

      to_ui8_vec (resultVec, data);

      cout << (verbose ? "  vector elements      " : "\n  ")
           << show(resultVec) << endl; // show results

      if(verbose) { cout << "\n  F889_t::exec()  (verbose)  ("
                         <<  __cplusplus  << ")" << endl; }

      return 0;
    } // int exec(int, char**)
  // 7 lines

  void to_ui8_vec(UI8Vec_t& retVal,         // output (pass by reference)
                  string    sData)          //  input (pass by value)
    {
      if(verbose) { cout << "\n  input data        '" << sData
         << "'                       (" << sData.size() << " chars)" << endl;}
      { // misc format checks:
        size_t szOrig = sData.size();
        {
          // confirm leading hex indicator exists
          assert(sData.substr(0,2) == string("0x"));
          sData.erase(0,2);                 // discard leading "0x"
        }
        size_t sz = sData.size();
        assert(sz == (szOrig - 2)); // paranoia
        // to test that this will detect any typos in data:
        //    temporarily append or insert an invalid char, i.e. sData += 'q';
        assert(sData.find_first_not_of("0123456789abcdefABCDEF") == std::string::npos);
      }

      modifyFor_SDFE (sData); // SDFE - 'Space Delimited Formatted Extraction'

      stringstream ss1(sData); // create / initialize stream with SDFE

      if(verbose) { cout << "  SDFE  data         '" << ss1.str() // echo init
                         << "' (" << sData.size() << " chars)" << endl; }

      extract_values_from_SDFE_push_back_into_vector(retVal, ss1);

    } // void to_ui8_vec (vector<uint8_t>&, string)
  // 13 lines

  // modify s (of any size) for 'Space Delimited Formatted Extraction'
  void modifyFor_SDFE (string& s)
    {
      size_t indx = s.size();
      while (indx > 2)
      {
        indx -= 2;
        s.insert (indx, 1, ' ');  // indx, count, delimiter
      }
      s.insert(0, 1, ' '); // delimiter at front of s
      s += ' ';            // delimiter at tail of s
    } // void modifyFor_SDFE (string&)
  // 6 lines

  void extract_values_from_SDFE_push_back_into_vector(UI8Vec_t&      retVal,
                                                      stringstream&  ss1)
    {
      do {
        unsigned int  n = 0;

        ss1 >> hex >> n;  // use SDFE, hex mode - extract one field at a time

        if(!ss1.good())   // check ss1 state
        {
          if(ss1.eof()) break; // quietly exit, this is a normal stream exit
          // else make some noise before exit loop
          cerr << "\n  err: data input line invalid [" << ss1.str() << ']' << endl; break;
        }

        retVal.push_back(static_cast<uint8_t>(n & 0xff)); // append to vector

      } while(true);
    } // void extract_values_from_SDFE_push_back_into_vector(UI8Vec_t&, stringstream&)
  // 6 lines

  string show(const UI8Vec_t& ui8Vec)
    {
      ostringstream ss;  // a stringstream built from "\n  " would overwrite that text on the first insertion
      for (size_t i = 0; i < ui8Vec.size(); ++i) {
        ss << setfill('0') << setw(2) << hex 
           << static_cast<int>(ui8Vec[i]) << ' '; }
      if(verbose) { ss << "  (" << dec << ui8Vec.size() << " elements)"; }
      return ss.str();
    }
  // 5 lines

}; // class F889_t

int main(int argc, char* argv[]) { return F889_t()(argc, argv); }

Typical output when invoked with 'verbose' as the command-line argument:

$ ./dumy889 verbose

  input data        '0x330508465083084bBCcf87eBBaa379279543795922fF'                       (46 chars)
  SDFE  data         ' 33 05 08 46 50 83 08 4b BC cf 87 eB Ba a3 79 27 95 43 79 59 22 fF ' (67 chars)
  vector elements      33 05 08 46 50 83 08 4b bc cf 87 eb ba a3 79 27 95 43 79 59 22 ff   (22 elements)

When invoked with no parameters

$ ./dumy889 

  33 05 08 46 50 83 08 4b bc cf 87 eb ba a3 79 27 95 43 79 59 22 ff 

The line counts do not include empty lines, nor lines that are only a comment or only a brace. You may count the lines as you wish.

2785528