I'm reading in a .csv file (delimited by commas) so I can analyze the data. Many of the fields are null, meaning a line might look like:

456,Delaware,14450,,,John,Smith

(where we don't have a phone number or email address for John Smith so these fields are null).

But when I try to separate these lines into tokens (so I can put them in a matrix to analyze the data), strtok doesn't return NULL or an empty string for the empty fields. Instead, it skips them, and I wind up with mismatched columns.

In other words, where my desired result is:

a[0]=456
a[1]=Delaware
a[2]=14450
a[3]=NULL (or "", either is fine with me)
a[4]=NULL (or "")
a[5]=John
a[6]=Smith

Instead, the result I get is:

a[0]=456
a[1]=Delaware
a[2]=14450
a[3]=John
a[4]=Smith

Which is wrong. Any suggestions about how I can get the results I need will be greatly welcomed. Here is my code:

FILE* stream = fopen("filename.csv", "r");
i = 0;
char* tmp;
char* field;
char line[1024];

while (fgets(line, 1024, stream))
{
    j = 0;
    tmp = strdup(line);
    field = strtok(tmp, ",");

    while (field != NULL)
    {
        a[i][j] = field;
        field = strtok(NULL, ",");
        j++;
    }

    i++;
}
fclose(stream);
WhozCraig
Hopper06
  • If a function behaves differently from what you expect, how about [reading some documentation](http://man7.org/linux/man-pages/man3/strtok.3.html)? _"[…] From the above description, it follows that a sequence of two or more contiguous delimiter bytes in the parsed string is considered to be a single delimiter, and that delimiter bytes at the start or end of the string are ignored. Put another way: the tokens returned by `strtok()` are always nonempty strings."_ Do you really need someone copy-and-pasting this here? – mafso Aug 05 '14 at 14:59
  • If you read the manual for `strtok` you'll notice that it treats multiple consecutive delimiters as a single delimiter. Therefore you need another function. `strsep` is more applicable. – Adam Aug 05 '14 at 15:00
  • See http://stackoverflow.com/questions/8705844/need-to-know-when-no-data-appears-between-two-token-separators-using-strtok for a possible solution. – uesp Aug 05 '14 at 15:01
  • Thank you for helping me with a suggestion for another function I could try, Adam. Thank you uesp for guiding me to another similar question. Your responses were the only ones that were helpful. – Hopper06 Aug 05 '14 at 15:06
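
As a rough illustration of the `strsep` suggestion in the comments above, here is a minimal sketch. The helper name `split_with_strsep` and its parameters are made up for this example and do not come from the thread. `strsep` is a BSD/glibc extension rather than ISO C, so this assumes a platform that provides it. Unlike `strtok`, it returns an empty string for each pair of adjacent delimiters.

```c
#define _DEFAULT_SOURCE   /* strsep is a BSD/glibc extension, not ISO C */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original post).
 * Splits `line` in place on commas and stores up to `max_fields`
 * pointers into `fields`. Empty fields come back as "" instead of
 * being skipped. Returns the number of fields stored. */
size_t split_with_strsep(char *line, char **fields, size_t max_fields)
{
    assert(line != NULL && fields != NULL);

    size_t count = 0;
    char *cursor = line;   /* strsep advances this past each delimiter */
    char *field;

    while (count < max_fields && (field = strsep(&cursor, ",")) != NULL)
        fields[count++] = field;

    return count;
}
```

For the sample line `456,Delaware,14450,,,John,Smith` this should yield seven fields, with the fourth and fifth being empty strings. Note that a line read with `fgets` still carries its trailing newline in the last field, and that quoted CSV fields containing commas are not handled.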

1 Answer

Quote from ISO/IEC 9899:TC3, 7.21.5.8 The strtok function:

3 The first call in the sequence searches the string pointed to by s1 for the first character that is not contained in the current separator string pointed to by s2. If no such character is found, then there are no tokens in the string pointed to by s1 and the strtok function returns a null pointer. If such a character is found, it is the start of the first token.

And the relevant quote for you:

4 The strtok function then searches from there for a character that is contained in the current separator string. If no such character is found, the current token extends to the end of the string pointed to by s1, and subsequent searches for a token will return a null pointer. If such a character is found, it is overwritten by a null character, which terminates the current token. The strtok function saves a pointer to the following character, from which the next search for a token will start.

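So you can't detect consecutive delimiters with strtok, as it isn't made for this. Each search for the start of a token first skips every character that is in the separator string, so an empty field between two commas is never reported. strtok will just skip it.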
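
One portable way around this, sketched here under a few assumptions, is to split the line by hand with `strchr`, which is standard C. The helper name `split_keep_empty` and its parameters are hypothetical and not part of the question's code. It assumes a non-zero single-character delimiter and a modifiable buffer, and it does not handle quoted CSV fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original post).
 * Splits `line` in place on `delim`, storing up to `max_fields`
 * pointers into `fields`. Consecutive delimiters produce empty
 * strings rather than being skipped as strtok would do.
 * Returns the number of fields stored. */
size_t split_keep_empty(char *line, char delim, char **fields, size_t max_fields)
{
    assert(line != NULL && fields != NULL && delim != '\0');

    /* Drop the line ending that fgets leaves in the buffer. */
    line[strcspn(line, "\r\n")] = '\0';

    size_t count = 0;
    char *start = line;

    while (count < max_fields) {
        fields[count++] = start;          /* may point at an empty string */
        char *end = strchr(start, delim);
        if (end == NULL)
            break;                        /* last field on the line */
        *end = '\0';                      /* terminate the current field */
        start = end + 1;                  /* next field begins after the delimiter */
    }
    return count;
}
```

In the question's loop, a call such as `split_keep_empty(tmp, ',', a[i], ncols)` could stand in for the two `strtok` calls, assuming `a[i]` is an array of `char *` and `ncols` (also a made-up name) is its capacity. Since the fields point into the `strdup` copy, that copy has to stay allocated for as long as the matrix is in use.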

dhein
  • What is the down vote for? Am I saying wrong stuff by quoting the primary source? or is the primary source maybe inaccurate?... – dhein Aug 05 '14 at 15:03
  • You're not quoting the applicable parts of the doc. Your quotes are for how `strtok` uses delimiters with more than one character in them. The asker's delimiter is a single character. mafso quoted the appropriate paragraph in his comment. – Adam Aug 05 '14 at 15:06
  • @Adam. These _are_ the relevant quotes from the standard, I quoted a man page, because it is a little clearer about the implications of the text. But this answer isn't wrong IMO. – mafso Aug 05 '14 at 15:13
  • @Adam no, I quoted the part where it says that there must be a character within the sequence, and that otherwise strtok can't be used. So I explained to him WHY he can't use strtok for what he intends to do. So this is pretty much on topic. – dhein Aug 06 '14 at 05:41
  • You're drawing attention to the unhelpful parts. You're highlighting "that is contained [in the current separator]" (i.e. delimiter) and "If no such character is found". But that's the opposite of what happens in an empty field: a separator is immediately found. It's possible to figure out `strtok`'s behavior in that case in what you quoted, but it requires ignoring your highlights and stepping through both paragraphs manually. The text from the man page is clear and on point. More importantly, the answer doesn't point the asker to a solution to their problem. – Adam Aug 06 '14 at 08:20