1

I've got a function that splits up a string into various sections and then parses them, but when converting a string to char* I get a malformed output.

int parseJob(char * buffer)
{ // Parse raw data, should return individual jobs
    const char* p;
    int rows = 0;
    for (p = strtok( buffer, "~" );  p;  p = strtok( NULL, "~" )) { 
        string jobR(p);
        char* job = &jobR[0];
        parseJobParameters(job); // At this point, the data is still in good condition
    }
    return (1);
}

int parseJobParameters(char * buffer)
{ // Parse raw data, should return individual job parameters
    const char* p;
    int rows = 0;
    for (p = strtok( buffer, "|" );  p;  p = strtok( NULL, "|" )) { cout<<p; } // At this point, the data is malformed.
    return (1);
}

I don't know what happens between the first function calling the second one, but it malforms the data.

As you can see from the code example given, the same method to convert string to char* is used and it works fine.

I'm using Visual Studio 2012/C++, any guidance and code examples will be greatly appreciated.

Ryan
  • 957
  • 5
  • 16
  • 34

3 Answers3

2

You cannot use strtok() to parse multiple strings at the same time, like you are doing. The first call to parseJobParameters() in the first loop iteration of parseJob() will alter the internal buffer that strtok() points to, thus the second loop iteration of parseJob() will not be processing the original data anymore. You need to rewrite your code to not use nested calls to strtok() anymore, eg:

#include <vector>
#include <string>

void split(std::string s, const char *delims, std::vector &vec)
{
    // alternatively, use s.find_first_of() and s.substr() instead...
    for (const char* p = strtok(s.c_str(), delims); p != NULL; p = strtok(NULL, delims))
    {
         vec.push_back(p);
    }
}

int parseJob(char * buffer)
{
    std::vector<std::string> jobs;
    split(buffer, "~", jobs);
    for (std::vector<std::string>::iterator i = jobs.begin(); i != jobs.end(); ++i)
    {
        parseJobParameters(i->c_str());
    }
    return (1);
}

int parseJobParameters(char * buffer)
{
    std::vector<std::string> params;
    split(buffer, "|", params);
    for (std::vector<std::string>::iterator i = params.begin(); i != params.end(); ++i)
    {
        std::cout << *i;
    }
    return (1);
}
Remy Lebeau
  • 555,201
  • 31
  • 458
  • 770
2

The "physical" reason your code does not work has nothing to do with std::string or C++. It wouldn't work in pure C as well. strtok is a function that stores its intermediate parsing state in some global variable. This immediately means that you cannot use strtok to parse more than one string at a time. Starting the second parse session before finishing the first would override the internal data stored by the first parse session, thus ruining it beyond repair. In other words, strtok parse sessions must not overlap. In your code they do overlap.


Also, in C++03 the idea of using std::string with strtok directly is doomed from the start. The internal sequence stored in std::string is not guaranteed to be null-terminated. This means that generally &jobR[0] is not a C-string. It can't be used with strtok. To convert a std::string to a C-string you have to use c_str(). But C-string returned by c_str() is non-modifiable.

In C++11 the null-termination is supposed to be visible through the [] operator, but still there seems to be no requirement to store the terminator object contiguously with the actual string, so &jobR[0] is still not a C-string even in C++11. C-string returned by c_str() or data() is non-modifiable.

AnT stands with Russia
  • 312,472
  • 42
  • 525
  • 765
  • Thanks for clearing that up for me. Can you give any suggestions on how to stop the two parsing sessions from overlapping? – Ryan Dec 26 '12 at 00:22
  • @user1661022: Either redesign your code (so that they don't overlap) or keep the design as is, but stop using `strtok`. Write your own reenterable parser, or use an existing library solution from Boost or some other library. – AnT stands with Russia Dec 26 '12 at 00:40
  • This is, in my opinion, a very strong answer to a question asked in other places on SO. I would support this being linked to as the canonical answer in instances of redundant questions. – Max von Hippel May 30 '16 at 16:39
0

Whilst this will give you the address of the first character in the string char* job = &jobR[0];, it does not give you a valid C-style string. YOu SHOULD use char* job = jobR.c_str();

I'm fairly sure that will solve your problem, but there could of course be something wrong with the way you read the buffer that is passed to parseJob in as well.

Edit: of course, you are also calling strtok from a function that uses strtok. Inside strtok looks a bit like this:

char *strtok(char *str, char *separators)
{
     static char *last;
     char *found = NULL; 

     if (!str) str = last;
     ... do searching for needle, set found to beginning of non-separators ... 
     if (found) 
     {
          *str = 0; // mark end of string. 
     }

     last = str;
     return found;
}

Since "last" gets overwritten when you call parseParameters, you can't use strtok(NULL, ... ) when you get back to parseJobs

Mats Petersson
  • 126,704
  • 14
  • 140
  • 227
  • I have already attempted to do that, but IntelliSense is reporting that "a value of type "const char *" cannot be used to initialize an entity of type "char *" – Ryan Dec 25 '12 at 20:16
  • Ah, yes, of course, you are then ALSO messing about with the string that is inside the std::string - bad idea. You need to copy it into a new place. It would probably be easier to write the code in C++ using std::string all the way through! – Mats Petersson Dec 25 '12 at 20:18