2

I am trying to convert a comma separated string to vector of const char*. With the following code, by expected output is

ABC_
DEF
HIJ

but I get

HIJ
DEF
HIJ

Where am I going wrong?

Code:

#include <iostream>
#include <boost/tokenizer.hpp>
#include <vector>
#include <string>
using namespace std;

int main()
{
   string s("ABC_,DEF,HIJ");
   typedef boost::char_separator<char> char_separator;
   typedef boost::tokenizer<char_separator> tokenizer;

   char_separator comma(",");
   tokenizer token(s, comma);
   tokenizer::iterator it;

   vector<const char*> cStrings;

   for(it = token.begin(); it != token.end(); it++)
   {
      //cout << (*it).c_str() << endl;
      cStrings.push_back((*it).c_str());
   }

   std::vector<const char*>::iterator iv;
   for(iv = cStrings.begin(); iv != cStrings.end(); iv++)
   {
      cout << *iv << endl;
   }
   return 0;
}

http://ideone.com/3tvnUs

EDIT: Solution with help of answers below: (PaulMcKenzie offers a neater solution using lists.)

#include <iostream>
#include <boost/tokenizer.hpp>
#include <vector>
#include <string>
using namespace std;

char* createCopy(std::string s, std::size_t bufferSize)
{
   char* value = new char[bufferSize];
   memcpy(value, s.c_str(), (bufferSize - 1));
   value[bufferSize - 1] = 0;
   return value;
}

int main()
{
   string s("ABC_,DEF,HIJ");
   typedef boost::char_separator<char> char_separator;
   typedef boost::tokenizer<char_separator> tokenizer;

   char_separator comma(",");
   tokenizer token(s, comma);
   tokenizer::iterator it;

   vector<const char*> cStrings;

   for(it = token.begin(); it != token.end(); it++)
   {
      //cout << it->c_str() << endl;
      cStrings.push_back(createCopy(it->c_str(),
                                      (it->length() + 1)));
   }

   std::vector<const char*>::iterator iv;
   for(iv = cStrings.begin(); iv != cStrings.end(); iv++)
   {
      cout << *iv << endl;
   }

   //delete allocations by new
   //...
   return 0;
}
armundle
  • 1,149
  • 2
  • 15
  • 28
  • Why not an array of `std::string`? – jxh May 11 '15 at 22:05
  • I need to later pass this on to a library which can only accept a `const char**`; I pass this vector as `&vector[0]`. – armundle May 11 '15 at 22:06
  • be aware that your signature of createCopy requires creating a copy of string twice, as it takes a copy of string. You probably want this function to take `const char*` as an argument. – Tomasz Lewowski May 12 '15 at 19:28

2 Answers2

5

Here's the thing: boost::tokenizer::iterator doesn't return you ownership of a copy of the string, but a refernce to an internal copy.

For example, after running your code I get:

HIJ
HIJ
HIJ

the solution is to replace cStrings.push_back((*it).c_str()) with one of the following:

    char* c = new char[it->length() + 1];
    c[it->length()] = 0;
    cStrings.push_back(c);
    std::strncpy(c, it->c_str(), it->length());

doesn't look pretty, but you probably won't get faster than that (at least if you want to use boost::tokenizer.

other option is to totally replace boost::tokenizer with e.g. strtok - an example can be found here: C split a char array into different variables

you can also use boost::algorithm::string::split, but then you might need to remap string to const char* later on.

Community
  • 1
  • 1
Tomasz Lewowski
  • 1,935
  • 21
  • 29
1

Here is a method that doesn't require dynamic allocation, and at the same time give you the std::vector that you're looking for. The trick is to create your arguments in "permanent" storage, and then just set the vector of pointers to this storage.

The code below uses std::list for the permanent storage. The reason why is that we can guarantee that the iterators to the items in the std::list do not become invalidated when we add items to the list container. This is a necessary requirement when building the final vector of const char *.

#include <iostream>
#include <boost/tokenizer.hpp>
#include <vector>
#include <string>
#include <list>

typedef std::vector<char> CharArray;
typedef std::list<CharArray> StringList;

using namespace std;

int main()
{
   StringList sList;

   string s("ABC_,DEF,HIJ");
   typedef boost::char_separator<char> char_separator;
   typedef boost::tokenizer<char_separator> tokenizer;

   char_separator comma(",");
   tokenizer token(s, comma);
   tokenizer::iterator it;

   vector<const char*> cStrings;

   for(it = token.begin(); it != token.end(); ++it)
   {
        // create an array of char and place on list
        sList.push_back(CharArray(it->begin(), it->end()));

        // null terminate this entry
        sList.back().push_back(0);

        // add the pointer to this entry to the vector of const char *.
        cStrings.push_back(&sList.back()[0]);
   }

   std::vector<const char*>::iterator iv;
   for(iv = cStrings.begin(); iv != cStrings.end(); iv++)
   {
      cout << *iv << endl;
   }
}

Note that we don't have to dynamically allocate memory here. The only thing you need to ensure is that the StringList doesn't go out of scope, since this is where your arguments are found.

PaulMcKenzie
  • 34,698
  • 4
  • 24
  • 45
  • CharArray is likely to dynamically allocate memory, although you're right that small vector optimization may go in handy here. – Tomasz Lewowski May 12 '15 at 19:30
  • Yes, CharArray internally handles the memory, but that is the difference between doing that, and having to manage the memory yourself. The code above is guaranteed not to leak. – PaulMcKenzie May 12 '15 at 20:25
  • that's true, but at the expense of bookkeeping two lists. It's in fact equal to copying to `std::vector` and then taking `c_str`. It doesn't need to be a problem of course. – Tomasz Lewowski May 13 '15 at 19:45