
I'm using boost::tokenizer to tokenize a string in C++, then I want to pass it to execv.

Consider the following code snippet (compilable):

#include <iostream>
#include <cstdlib>
#include <vector>
#include <boost/tokenizer.hpp>

// I will put every token into this vector
std::vector<const char*> argc;
// this is the command I want to parse
std::string command = "/bin/ls -la -R";


void test_tokenizer() {
  // tokenizer is needed because arguments can be in quotes
  boost::tokenizer<boost::escaped_list_separator<char> > scriptArguments(
              command,
              boost::escaped_list_separator<char>("\\", " ", "\""));
  boost::tokenizer<boost::escaped_list_separator<char> >::iterator argument;
  for(argument = scriptArguments.begin(); 
    argument!=scriptArguments.end(); 
    ++argument) {

    argc.push_back(argument->c_str());
    std::cout << argument->c_str() << std::endl;
  }

  argc.push_back(NULL);
}

void test_raw() {
  argc.push_back("/bin/ls");
  argc.push_back("-l");
  argc.push_back("-R");

  argc.push_back(NULL);
}

int main() {
  // this works OK
  /*test_raw();
  execv(argc[0], (char* const*)&argc[0]);
  std::cerr << "execv failed";
  _exit(1);
  */

  // this is not working
  test_tokenizer();
  execv(argc[0], (char* const*)&argc[0]);
  std::cerr << "execv failed";
  _exit(2);
}

When I run this program with test_tokenizer(), it prints 'execv failed' (although it prints the arguments correctly before that).

However if I change test_tokenizer to test_raw it runs fine.

There must be an easy solution, but I haven't found it.

PS: I also dropped this into an online compiler with Boost support here.

Daniel
  • Nitpicks: The use of `argc` is confusing, as it's commonly used for the first argument to `main()`. The use of `_exit()` is non-portable; prefer the (standard and functionally identical) `exit()`. The use of `execv()` should `#include <unistd.h>`. The construct `(char* const*)&argc[0]` is wrong (well, questionable at least) on too many levels to explain in one comment. And yes, avoid C strings (`char *`) in C++ programs, they just give you headaches. ;-) – DevSolar Sep 17 '19 at 21:34

1 Answer


boost::tokenizer saves the token by value (and by default as std::string) in the token iterator.

Therefore the character array that argument->c_str() points to may be modified or invalidated whenever the iterator is incremented, and its lifetime ends with that of argument at the latest.

Consequently, argc ends up holding dangling pointers, and your program has undefined behavior when you try to use it.

If you want to keep using boost::tokenizer, I would suggest keeping the tokens in a std::vector<std::string> and transforming them into a pointer array afterwards.

walnut
  • Thanks. I would happily drop the tokenizer if there is something more suitable out there. – Daniel Sep 18 '19 at 05:28
  • Anyway storing them in a vector is nice, but then I'll have to construct an array of 'const char*' for execv. Here is the related stackoverflow topic: https://stackoverflow.com/questions/5797837/how-to-pass-a-vector-of-strings-to-execv, so I would prefer something more 'recent' or modern. – Daniel Sep 18 '19 at 07:17
  • @Daniel: The issue here is the transition between what is "proper" and convenient in C++ and what is "proper" for `execv()` (which is a C API, not a C++ one). The best way forward would depend on the *real* use case... – DevSolar Sep 18 '19 at 10:58
  • @DevSolar: yes. you are absolutely right. Let's say I have a string variable *command*. What I want is to execute this *command*. It may contain arguments, even within quotes. Language is C++, OS Linux. – Daniel Sep 18 '19 at 19:25
  • @Daniel If it is basically an arbitrary shell command that you want to execute, you could simply `execl` a shell with the `-c` option and pass the whole `command` as a single argument. That way you are delegating the issue to the shell. (Prefix `exec` to the `command` if you don't want to create a child process in the shell.) Do not do that if the `command` comes from an untrusted source that shouldn't be able to execute arbitrary code. – walnut Sep 18 '19 at 20:21
  • As an alternative where a `std::vector` is required, you can use `boost::split` instead of `tokenizer`. – Riot Feb 05 '23 at 17:10
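
The `sh -c` approach from walnut's comment above could look roughly like the sketch below. It assumes a POSIX system with a shell at /bin/sh; the function name run_shell_command is made up for illustration. Unlike the code in the question, it forks first so that the caller survives and can read the exit status; drop the fork and wait if replacing the current process is what you want.

```cpp
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs `command` through /bin/sh and returns its exit
// status, or -1 if the child could not be created or did not exit normally.
// The shell does all the tokenizing and quote handling.
// Never pass a command from an untrusted source: it can run arbitrary code.
int run_shell_command(const std::string& command) {
    pid_t pid = fork();
    if (pid < 0) return -1;  // fork failed

    if (pid == 0) {
        // Child: replace the process image with the shell.
        execl("/bin/sh", "sh", "-c", command.c_str(),
              static_cast<char*>(nullptr));
        _exit(127);  // only reached if execl itself failed
    }

    // Parent: wait for the child and translate its status.
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With this, run_shell_command("/bin/ls -la -R") would hand the whole string to the shell, so no tokenizer and no pointer array are needed on the C++ side.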