9

Possible Duplicate:
How do I tokenize a string in C++?

Hello, I was wondering how I would tokenize a std::string with strtok.

string line = "hello, world, bye";    
char * pch = strtok(line.c_str(),",");

I get the following error

error: invalid conversion from ‘const char*’ to ‘char*’
error: initializing argument 1 of ‘char* strtok(char*, const char*)’

I'm looking for a quick and easy approach to this, as I don't think it should take much time.

Daniel Del Core

4 Answers

18

I always use getline for such tasks.

istringstream is(line);
string part;
while (getline(is, part, ','))
  cout << part << endl;
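For reference, the same loop can be wrapped into a small function that collects the pieces instead of printing them. This is only a sketch: the name split_getline is made up for illustration, and it does no trimming, so the space after each comma stays at the front of the following token.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (name is an assumption, not from the answer):
// split `line` on a single delimiter character using std::getline.
// Tokens are returned unchanged, so whitespace next to the delimiter is kept.
std::vector<std::string> split_getline(const std::string& line, char delim) {
    std::vector<std::string> parts;
    std::istringstream is(line);
    std::string part;
    // getline reads up to (and discards) the next delimiter on each call.
    while (std::getline(is, part, delim)) {
        parts.push_back(part);
    }
    return parts;
}
```

Calling split_getline("hello, world, bye", ',') should give the three pieces "hello", " world" and " bye"; strip the leading spaces afterwards if you need them gone.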
PiotrNycz
  • 23,099
  • 7
  • 66
  • 112
11
std::string::size_type pos = line.find_first_of(',');
std::string token = line.substr(0, pos);

To find the next token, call find_first_of again, starting the search at pos + 1; the next token then runs from that start position up to the new pos.
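A rough sketch of that repetition, keeping two positions (the start of the current token and the position of the next comma). The function name split_find and the variable start are assumptions added for illustration, not part of the answer above.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split `line` on `delim` by repeatedly calling
// find_first_of and remembering where the current token starts.
std::vector<std::string> split_find(const std::string& line, char delim) {
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    std::string::size_type pos = line.find_first_of(delim, start);
    while (pos != std::string::npos) {
        // The token is the range [start, pos).
        tokens.push_back(line.substr(start, pos - start));
        start = pos + 1;  // skip over the delimiter
        pos = line.find_first_of(delim, start);
    }
    // Whatever follows the last delimiter (possibly empty) is the final token.
    tokens.push_back(line.substr(start));
    return tokens;
}
```

Unlike strtok, this version keeps empty tokens, so "a,,b" yields three pieces with an empty one in the middle.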

Mousa
Pete Becker
  • With this, there would have to be another variable to keep track of pos1 and pos2. Otherwise, you would be using substring from 0 to what the new pos is, instead of pos1 to pos2. – anthony Mar 07 '21 at 22:11
4

You can use strtok by doing &*line.begin() to get a non-const pointer to the char buffer. In C++, though, I usually prefer to use boost::algorithm::split.

spencercw
  • I think by discarding the const on the internal pointer of the string, you allow strtok to modify the string's internal pointer - very dirty. – Matt Sep 27 '12 at 17:55
  • This is a terrible idea. It will put the `std::string` into an undefined state. You are not supposed to modify a `std::string` using C string functions. – japreiss Sep 27 '12 at 17:55
  • @japreiss How can it possibly go wrong? There's nothing wrong with modifying the characters in a string through its iterators, and C++ strings are [always contiguous in practice](http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#530), and are guaranteed to be contiguous and null-terminated in C++11. – spencercw Sep 27 '12 at 18:00
  • @spencercw: There is no guarantee that the string's internal representation is zero-terminated; and it might use copy-on-write semantics, in which case subverting `const` could change other copies of the string. It might (or might not) be possible to demonstrate that what you're doing is well-defined for any conformant implementation, but even if you can, I wouldn't like to test the edge cases of conformance like that. – Mike Seymour Sep 27 '12 at 18:04
  • @MikeSeymour The internal buffer is guaranteed to be null-terminated in C++11 ([see this answer](http://stackoverflow.com/a/7554172/766580)). You raise an interesting point with copy-on-write though. I would guess that in such an implementation dereferencing the iterator would trigger the copy, or some sort of memory guard would trigger the copy anyway when `strtok` writes to the buffer. Are there any implementations that actually do CoW? – spencercw Sep 27 '12 at 18:11
  • @spencercw: No, C++11 doesn't guarantee that the buffer is zero-terminated; just that the characters of the string are stored contiguously, `s[s.size()] == 0`, and `s.data()` and `s.c_str()` return `const` pointers to zero-terminated arrays. In practice that means that any sane implementation will use a zero-terminated contiguous buffer, but it's not guaranteed. GCC uses CoW; they've probably got all the awkward details of access via pointer-to-dereferenced-iterator right, but personally I'd rather not rely on that. – Mike Seymour Sep 27 '12 at 22:55
1

strtok is a rather quirky, evil function that modifies its argument. This means that you can't use it directly on the contents of a std::string, since (at least before C++17 added a non-const data()) there's no way to get a pointer to a mutable, zero-terminated character array from that class.

You could work on a copy of the string's data:

std::vector<char> buffer(line.c_str(), line.c_str()+line.size()+1);
char * pch = strtok(&buffer[0], ",");

or, for more of a C++ idiom, you could use a string-stream:

std::stringstream ss(line);
std::string token;
std::getline(ss, token, ',');

or find the comma more directly:

std::string token(line, 0, line.find(','));
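If you really do want strtok, the copy-into-a-buffer option above can be extended into a full loop along these lines. Treat it as a sketch under assumptions: the function name split_strtok is invented here, and since strtok keeps hidden static state it is not safe to call from several threads or to nest.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenize a copy of `line` with strtok, so the
// original std::string is never modified. `delims` is a set of delimiter
// characters, as strtok expects.
std::vector<std::string> split_strtok(const std::string& line, const char* delims) {
    // Copy the characters plus the terminating '\0' into a mutable buffer.
    std::vector<char> buffer(line.c_str(), line.c_str() + line.size() + 1);
    std::vector<std::string> tokens;
    // The first call passes the buffer; later calls pass a null pointer to continue.
    for (char* pch = std::strtok(&buffer[0], delims); pch != nullptr;
         pch = std::strtok(nullptr, delims)) {
        tokens.push_back(pch);
    }
    return tokens;
}
```

Passing ", " as the delimiter set splits "hello, world, bye" into "hello", "world" and "bye". Note that strtok treats runs of delimiters as one, so empty tokens are silently dropped.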
Mike Seymour