
There is no Standard function in C to take a string, break it up at whitespace or other delimiters, and create an array of pointers to char, in one step. If you want to do that sort of thing, you have to do it yourself, either completely by hand, or by calling e.g. strspn and strpbrk in a loop, or by calling strtok in a loop, or by calling strsep in a loop.

I am not asking how to do this. I know how to do this, and there are plenty of other questions on Stackoverflow about how to do it. What I'm asking is if there are any good reasons why there's no such function.

I know the two main reasons, of course: "Because no mainstream compiler/library ever had one" and "Because the C Standard didn't specify one, either (because it likes to standardize existing practice)." But are there any other reasons? (Are there arguments that such a function is an actively bad idea?)

This is usually a lame and pointless sort of question, I know. In this case I'm fixated on it because convenient splitting is such a massively useful operation. I wrote my own string splitter within my first year as a C programmer, I think, and it's been a huge productivity enhancer for me ever since. There are dozens of questions here on SO every day that could be answered easily (or that wouldn't even have to be asked) if there were a standard split function that everyone could use and refer to.

To be clear, the function I'm imagining would have a signature like

int split(char *string, char **argv, int maxargs, const char *delim)

It would break up string into at most maxargs substrings, splitting on one or more characters from delim, placing pointers to the substrings into argv, and modifying string in the process.
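
To pin down those semantics, here is a minimal sketch built on the strspn/strpbrk loop mentioned above. It is an illustration only, not a proposed implementation: the name split_sketch is made up for this example, and the choice to silently ignore any fields beyond maxargs is an assumption of this version.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: split s in place on runs of characters from delim.
   Stores at most maxargs pointers in argv and returns the number stored. */
int split_sketch(char *s, char **argv, int maxargs, const char *delim)
{
    int n = 0;

    assert(s != NULL && argv != NULL && delim != NULL);

    s += strspn(s, delim);              /* skip leading delimiters */
    while (*s != '\0' && n < maxargs) {
        argv[n++] = s;                  /* record the start of this field */
        s = strpbrk(s, delim);          /* find the end of the field */
        if (s == NULL)
            break;                      /* last field ran to the end of the string */
        *s++ = '\0';                    /* terminate it, overwriting the delimiter */
        s += strspn(s, delim);          /* skip the rest of the delimiter run */
    }
    return n;
}
```

Called on a writable buffer holding "  a b,,c " with delim " ," and maxargs of 4, this version stores pointers to "a", "b" and "c" and returns 3.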

And to head off an argument I'm sure someone will make: although it's standard, I do not consider strtok to be an effective solution. strtok, frankly, sucks. Saying "you don't need a split function, because strtok exists" is a lot like saying "You don't need printf, because puts exists." This is not a question about what's theoretically possible with a given toolset; it's about what's useful and convenient. The more fundamental issue here, I guess, concerns the ineffable tradeoffs involved in picking tools that are leverageable and productivity-enhancing and that "pay their way". (I think it's clear that a nicely encapsulated string-splitting function would pay its way handsomely, but perhaps that's just me.)

Steve Summit
  • You could ask the same thing about why there's no `uppercase(char*)`. `toupper(char)` already exists and you can build your own `uppercase(char*)` with it. Same with `strtok` and your `split`. Obviously there are exceptions to this. But I feel that this question is pretty opinion-based. – Kevin Mar 19 '18 at 21:08
  • I find the question very interesting, but I highly doubt that there is an answer of the form *"That's because in 1986, they took a formal decision not to include such a function"*, so I suspect that any answer will be highly speculative. Therefore I vote to close. Sorry. – klutt Mar 19 '18 at 21:15
  • @Kevin MSVC has `strupr` and `strlwr`. Any library is free to add utility functions. Another example is gcc's `strsep`. – Weather Vane Mar 19 '18 at 21:17
  • Adding in such functionality into a language that does not even have a string type is pushing it:) – Martin James Mar 19 '18 at 21:22
  • How about this analogy: "You don't need `scanf`, because `gets` exists." – chqrlie Mar 19 '18 at 21:56
  • What happens if the string contains more occurrences of the separator? This API does not seem easy to restart like `snprintf()`. – chqrlie Mar 19 '18 at 22:02

1 Answer


I will try an answer. I do agree that such a function would be useful. It is often quite useful in the languages that have one.

Basically you are suggesting a very simple built-in wrapper around strtok() or strtok_r(). It would be a less powerful version (as we can't change the delimiter while processing) but still useful in some cases.
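
To illustrate what changing the delimiter between calls allows, here is a minimal sketch. The helper name parse_pair and the "key=v1,v2" input format are made up for this example, and strtok_r() is POSIX rather than ISO C.

```c
#define _POSIX_C_SOURCE 200809L   /* strtok_r() is POSIX, not ISO C */
#include <assert.h>
#include <string.h>

/* Hypothetical helper: parse "key=v1,v2,..." in place.
   The first call splits on '=', the later calls split on ','.
   Returns the number of values stored, or -1 if there is no key. */
int parse_pair(char *line, char **key, char **values, int maxvalues)
{
    char *save = NULL;
    int n = 0;

    assert(line != NULL && key != NULL && values != NULL);

    *key = strtok_r(line, "=", &save);
    if (*key == NULL)
        return -1;
    for (char *v = strtok_r(NULL, ",", &save);
         v != NULL && n < maxvalues;
         v = strtok_r(NULL, ",", &save))
        values[n++] = v;                /* store each comma-separated value */
    return n;
}
```

A fixed-delimiter split() cannot do this in a single call; the caller would have to split on '=' first and then split the remainder again on ','.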

What I see is that these cases also overlap with the use cases of the scanf() family of functions, and with those of getopt() or getsubopt().

Actually I'm not sure that the remaining real use cases are that common.

In real-life, non-trivial cases you would need a true parser or a regex library. In the common specialized cases you already have scanf() or getopt() or even strtok().

Also, functions that modify their input strings, like strtok() or yours, are generally discouraged these days (experience shows they easily lead to trouble, for instance when they are handed a string literal).

Most languages providing a split feature have a real string type, often an immutable one, and support it by creating many individual substrings while leaving the original string intact.

Following that path would lead either to some other API not based on zero-terminated strings (maybe with a start pointer and an end pointer), or to allocated string copies (as when using strdup()). Neither is really satisfying.
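
As a rough sketch of the first option, assuming a made-up struct span type and a made-up function name split_spans (neither exists in any standard library), a non-modifying split could record start and end pointers like this:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type: a half-open range [start, end) inside the input. */
struct span {
    const char *start;
    const char *end;
};

/* Hypothetical sketch: record up to maxspans fields of s without modifying it.
   Returns the number of spans stored. */
int split_spans(const char *s, struct span *out, int maxspans, const char *delim)
{
    int n = 0;

    assert(s != NULL && out != NULL && delim != NULL);

    s += strspn(s, delim);                  /* skip leading delimiters */
    while (*s != '\0' && n < maxspans) {
        size_t len = strcspn(s, delim);     /* length of this field */
        out[n].start = s;
        out[n].end = s + len;
        n++;
        s += len;
        s += strspn(s, delim);              /* skip the delimiter run */
    }
    return n;
}
```

This works on string literals, but the fields are not zero-terminated, so each one has to be printed with something like printf("%.*s", (int)(sp.end - sp.start), sp.start) and cannot be passed directly to functions that expect a C string.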

In the end, if you add up use cases that are not so common in real life, a function that is quite simple to write, and an API that is not that simple or intuitive, it is no wonder that such a function wasn't included in the standard libc.

Basically I would write something like this:

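#define _POSIX_C_SOURCE 200809L  /* strtok_r() and strdup() are POSIX, not ISO C */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Split string in place on any character of delim. Stores at most maxargs
   token pointers in argv and returns the number stored. */
int split(char *string, char **argv, int maxargs, const char *delim){
    char * saveptr = NULL;
    int n = 0;
    char * tok = strtok_r(string, delim, &saveptr);
    while(tok && (n < maxargs)){   /* never write past argv[maxargs-1] */
        argv[n++] = tok;
        tok = strtok_r(NULL, delim, &saveptr);
    }
    return n;
}

int main(void){
    char * args[10];
    {
        /* 11 words: only the first 10 fit in args */
        char * str = strdup("un deux trois quatre cinq six sept huit neuf dix onze");
        if(!str) return 1;
        int res = split(str, args, sizeof(args)/sizeof(args[0]), " ");
        printf("res = %d\n", res);
        for(int x = 0; x < res ; x++){
            printf("%d:%s\n", x, args[x]);
        }
        free(str);
    }

    {
        /* 5 words: fewer tokens than slots */
        char * str = strdup("un deux trois quatre cinq");
        if(!str) return 1;
        int res = split(str, args, sizeof(args)/sizeof(args[0]), " ");
        printf("res = %d\n", res);
        for(int x = 0; x < res ; x++){
            printf("%d:%s\n", x, args[x]);
        }
        free(str);
    }
    return 0;
}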

Looking at the code, the wanted function is really very simple to write using strtok_r(), and the call site that uses the result is nearly as complicated as the function itself. In such a case I'd rather inline the logic at the call site than have to call libc.

But of course you are welcome to write and use your own if you believe it's simpler for you.

kriss