
I have code that reads a .txt file:

Pancho: Juanka,Cony

The idea is to find the character ':' and save the text before it (Pancho) separately in an array of strings. The same goes for Juanka and Cony, except that those fields end when ',' and '\0' are found.

Spikatrix

1 Answer


While strtok will solve this problem, I believe it won't be a very maintainable solution. I intend to point out some problems with using strtok, and solutions to those problems.


Problem #1: Because of its hidden state, strtok is neither reentrant nor thread-safe; if you try to tokenise two strings at the same time (e.g. by interleaving sequences of function calls, or by using multiple threads), each sequence of calls will overwrite the position saved by the other.
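To make the hidden-state problem concrete, here is a minimal sketch. The function name count_records_nested_strtok and the sample input are made up for illustration. It walks ';'-separated records with one strtok loop and splits each record on ',' with a second, nested strtok loop.

```c
#include <string.h>

/* Hypothetical demo: counts the ';'-separated records the OUTER loop sees
 * while an INNER strtok loop splits each record on ','.
 * buf is modified in place, as strtok always does. */
int count_records_nested_strtok(char *buf) {
    int records = 0;
    for (char *rec = strtok(buf, ";"); rec != NULL; rec = strtok(NULL, ";")) {
        records++;
        /* This inner loop overwrites the single position strtok saved for
         * the outer loop, so the outer loop cannot resume where it left off. */
        for (char *fld = strtok(rec, ","); fld != NULL; fld = strtok(NULL, ",")) {
            (void)fld; /* a real program would process the field here */
        }
    }
    return records;
}
```

Called on a writable copy of "a,b;c,d", this returns 1 rather than the 2 you might expect: once the inner loop has exhausted "a,b", the outer strtok(NULL, ";") has nothing left to resume from.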

Solution #1: strchr and strcspn can be used instead to address this issue, because they keep no state between calls. I've demonstrated the ability to read lines in other answers (1, 2) using strcspn; these could easily be adapted to use strchr instead, or to use characters other than '\n'.
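As a rough sketch of what Solution #1 could look like for the line in the question (the helper name next_field and its signature are my own invention, not taken from the linked answers), all of the state lives in a cursor owned by the caller:

```c
#include <string.h>

/* Hypothetical helper: copies the next field of *cursor (everything up to
 * the first byte found in delims, or the end of the string) into out,
 * which holds out_size bytes. Then advances *cursor past the field, past
 * one delimiter, and past any spaces that follow it.
 * Returns 0 on success, -1 if the field does not fit in out. */
int next_field(const char **cursor, const char *delims,
               char *out, size_t out_size) {
    size_t len = strcspn(*cursor, delims);   /* length of the field */
    if (len >= out_size) { return -1; }
    memcpy(out, *cursor, len);
    out[len] = '\0';
    *cursor += len;
    if (**cursor != '\0') { (*cursor)++; }   /* step over the delimiter */
    *cursor += strspn(*cursor, " ");         /* skip spaces after it */
    return 0;
}
```

With const char *p = "Pancho: Juanka,Cony", three calls using ":", "," and "\n" as delims yield Pancho, Juanka and Cony in turn. Because nothing is stored inside the function and the input is never modified, two strings can be tokenised side by side, each with its own cursor.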


Problem #2: strtok, strchr and strcspn all operate upon a string, which needs an intermediate array to exist within. You're reading from a file; if you can read the fields directly into their corresponding arrays, you don't need that intermediate array, and eliminating it might expose more advanced optimisations and lead to cleaner, more maintainable code.

Solution #2: The following code demonstrates performing the splitting directly from the file by using fscanf.

#include <stdio.h>

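/* FIXED_FIELD(W) expands W first, then stringifies it: "%1024". */
#define WIDTH_STR(width) #width
#define FIXED_FIELD(width) "%" WIDTH_STR(width)
/* TERMINAL(":") becomes "[^:]%*1[:] ": read up to the delimiter, discard
 * one delimiter, then skip any whitespace that follows it. */
#define TERMINAL(set) "[^" set "]%*1[" set "] "

#define W 1024
int parse(FILE *f) {
    char x[W+1], y[W+1], z[W+1];   /* +1 for the terminating '\0' */
    /* fscanf returns 1 on success, 0 for an empty field, EOF at end of input. */
    if (fscanf(f, FIXED_FIELD(W) TERMINAL(":"),  x) <= 0) { return EOF; }
    if (fscanf(f, FIXED_FIELD(W) TERMINAL(","),  y) <= 0) { return EOF; }
    if (fscanf(f, FIXED_FIELD(W) TERMINAL("\n"), z) <= 0) { return EOF; }
    printf("<%s>\n", x);
    printf("<%s>\n", y);
    printf("<%s>\n", z);
    return 0;
}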

int main(void) {
    printf("parse returned: %d\n", parse(stdin));
}

Problem #3: All of the above solutions reach peak optimality (in terms of maintainability/complexity and computational efficiency) when you assume fields are fixed width. Once that assumption becomes invalid, it makes much more sense to use fgetc to read and parse one byte at a time, reallocating as necessary to accommodate the variable-width fields.

Solution #3: I've demonstrated the ability to read words of variable length in another answer, which could easily be adapted to read and parse single tokens into separate dynamic allocations. This is likely to suffer from the drawback of expensive reallocation, which is necessary to allow your users to enter enormous (multiple-megabyte) field values that wouldn't typically be supported by fixed width arrays with automatic storage duration.
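The linked answer isn't reproduced here, so the following is only a sketch of the general idea under my own assumptions: a hypothetical read_field function that reads one byte at a time with fgetc and doubles its buffer whenever it fills up. Skipping leading spaces and treating end of input before any byte is stored as "no field" are choices made for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads bytes from f until a byte in delims (or EOF)
 * is seen and returns them as a malloc'd string that the caller must free.
 * The delimiter is consumed but not stored; leading spaces are skipped.
 * A '\0' byte in the input also ends the field, since strchr matches the
 * terminator of delims. Returns NULL on allocation failure, or when EOF
 * is reached before any byte has been stored. */
char *read_field(FILE *f, const char *delims) {
    size_t len = 0, cap = 16;
    char *buf = malloc(cap);
    if (buf == NULL) { return NULL; }
    for (;;) {
        int c = fgetc(f);
        if (c == EOF && len == 0) { free(buf); return NULL; }
        if (c == EOF || strchr(delims, c) != NULL) { break; }
        if (len == 0 && c == ' ') { continue; }   /* skip leading spaces */
        if (len + 1 == cap) {                     /* keep room for '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) { free(buf); return NULL; }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    buf[len] = '\0';
    return buf;
}
```

Calling read_field(f, ":"), read_field(f, ",") and read_field(f, "\n") in sequence on the line from the question gives three separately allocated strings. Doubling the capacity keeps the number of realloc calls logarithmic in the field length, which softens (but does not remove) the reallocation cost mentioned above.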

autistic