I want split_str to be able to take, for example, "bob is great" and return ["bob", "is", "great"].

More precisely: foo = split_str("bob is great", " ") allocates ["bob", "is", "great"] in foo (thus becoming an array of 3 strings, all of which were separated by a space, as specified). I would like this to be generalized to produce arrays of any number of strings, not only 3, if possible.

char* split_str(char*, char[]);

char* split_str(char* str, char delim[]) {
    char copied_input[strlen(str)];
    strncpy (copied_input, str, strlen(str)+1);

    char* result[strlen(str)+1];  // add 1 for the "NULL" char

    int tmp = 0;  // preparing iterator
    result[tmp] = strtok (copied_input, delim);  // obtaining first word

    while (result[tmp] != NULL) {  // to populate the whole array with each words separately
        result[++tmp] = strtok (NULL, delim);
    }

    return result;
}

This represents more or less the kind of execution I'm trying to achieve:

int main (void)
{
    int MAX_AMNT = 50;  // maximum amount of args to parse
    char *bar[MAX_AMNT];
    bar = split_str("bob is great", " ");
    tmp = 0;
    while (bar[tmp] != NULL) {
        fprintf (stdout, "Repeating, from array index %d: %s\n", tmp, bar[tmp++]);
    }
}

I'm very new to C, so I might be wrong in the way I've phrased my question (pointers, arrays, pointers to arrays, etc. are still a bit of a headache for me).

I know the return type in my function's signature is wrong, and also that it's probably wrong to return a local variable (result), but I'm lost as to how to proceed from here. I tried changing it to a void function and adding a third argument as a variable that would be populated (as result is), but I keep getting errors.

payne
  • `result` is a local variable and it vanishes once control exits the function. – kiran Biradar Jan 18 '19 at 20:52
  • You are going to want to learn to use `malloc` and `free` – Christian Gibbons Jan 18 '19 at 20:52
  • @kiranBiradar that is part of my concern, as explained in the last paragraph of my question. – payne Jan 18 '19 at 20:52
  • `char **result = malloc((strlen(str)+1) * sizeof(char *));` – Barmar Jan 18 '19 at 20:55
  • @ChristianGibbons I come from a Java/Python background and I'm having trouble understanding how I'm supposed to manipulate arrays of undetermined sizes. I also have trouble understanding the difference between `char *bar[] = malloc(50);` and `char *bar[50];`. – payne Jan 18 '19 at 20:55
  • @Barmar Didn't know about `char **`. I've modified the `result` declaration line with your suggestion, and changed the returned type signature of the function. However, it seems like `char** bar = split_str (input_str, delim);` makes it so that `bar[0]` isn't an actual String (`fprintf(stdout, "%s", bar[0]);` doesn't show anything and `strcmp(bar[0], "first_word_of_input") == 0` returns `False`). – payne Jan 18 '19 at 21:11

2 Answers

A solution is:

#include <stdlib.h>
#include <string.h>
#include <stdio.h>

char ** split(const char * str, const char * delim)
{
  /* count words */
  char * s = strdup(str);

  if (strtok(s, delim) == 0) {
    /* no word, release the working copy before leaving */
    free(s);
    return NULL;
  }

  int nw = 1;

  while (strtok(NULL, delim) != 0)
    nw += 1;

  strcpy(s, str); /* restore initial string modified by strtok */

  /* split */
  char ** v = malloc((nw + 1) * sizeof(char *));
  int i;

  v[0] = strdup(strtok(s, delim));

  for (i = 1; i != nw; ++i)
    v[i] = strdup(strtok(NULL, delim));

  v[i] = NULL; /* end mark */

  free(s);

  return v;
}

int main()
{
  char ** v = split("bob is  great", " ");

  for (int i = 0; v[i] != NULL; ++i) {
    puts(v[i]);
    free(v[i]);
  }

  free(v);
  return 0;
}

As you can see, I add a null pointer at the end of the vector as an end mark, but this can easily be changed to return the number of words, etc.

Execution:

bob
is
great

A second solution, taking into account the remarks of alk:

#include <stdlib.h>
#include <string.h>
#include <stdio.h>

char ** split(const char * str, const char * delim)
{
  /* count words */
  char * s = strdup(str);

  if (s == NULL) /* out of memory */
    return NULL;

  if (strtok(s, delim) == 0) { /* no word */
    free(s);
    return NULL;
  }

  size_t nw = 1;

  while (strtok(NULL, delim) != 0)
    nw += 1;

  strcpy(s, str); /* restore initial string modified by strtok */

  /* split */
  char ** v = malloc((nw + 1) * sizeof(char *));

  if (v == NULL) {
    /* out of memory */
    free(s);
    return NULL;
  }

  if ((v[0] = strdup(strtok(s, delim))) == 0) {
    /* out of memory */
    free(v);
    free(s);
    return NULL;
  }

  size_t i;

  for (i = 1; i != nw; ++i) {
    if ((v[i] = strdup(strtok(NULL, delim))) == NULL) {
      /* out of memory, free previous allocs and the working copy */
      while (i-- != 0)
        free(v[i]);
      free(v);
      free(s);
      return NULL;
    }
  }

  v[i] = NULL; /* end mark */

  free(s);

  return v;
}

int main()
{
  const char * s = "bob is still great";
  char ** v = split(s, " ");

  if (v == NULL)
    puts("no words or not enough memory");
  else {
    for (int i = 0; v[i] != NULL; ++i) {
      puts(v[i]);
      free(v[i]);
    }

    free(v);
  }
  return 0;
}

When out of memory, the return value is NULL (in a previous version it was the string to split). Of course, there are other easy ways to signal that.


Execution under valgrind:

==5078== Memcheck, a memory error detector
==5078== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==5078== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==5078== Command: ./a.out
==5078== 
bob
is
still
great
==5078== 
==5078== HEAP SUMMARY:
==5078==     in use at exit: 0 bytes in 0 blocks
==5078==   total heap usage: 7 allocs, 7 frees, 1,082 bytes allocated
==5078== 
==5078== All heap blocks were freed -- no leaks are possible
==5078== 
==5078== For counts of detected and suppressed errors, rerun with: -v
==5078== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 6 from 3)
bruno
  • Nice approach avoiding repetitive reallocation. – alk Jan 19 '19 at 15:00
  • Nitpick: All those counters should be `size_t` not `int`. Also error checking is missing completely. – alk Jan 19 '19 at 15:01
  • @alk I added another solution, but to be frank, for me verifying that there is enough memory in that kind of application is mainly a waste of time: the day you do not have enough memory, be sure you will have other problems ^^ – bruno Jan 19 '19 at 15:29
  • "*`return (char **) str; ... if (((char *) v) == s)`*" dirty, dirty! ;) – alk Jan 19 '19 at 15:35
  • @alk a pointer is a pointer, while the goal is to check the equality that works. **But** `return &str;` while _str_ is a parameter ? Are you serious ? – bruno Jan 19 '19 at 15:37
  • You also could return `(char**) -1`. Not cleaner, but more explicit, in my opinion. There are also POSIX functions doing this. – alk Jan 19 '19 at 15:39
  • "*while str is a parameter ? Are you serious ?*" No, suffered a temporary brain lapse. Comment adjusted immediately. – alk Jan 19 '19 at 15:39
  • @alk if the result is not check to consider -1 or the char * as a char*[] will have the same consequence no ? I just wanted to separate no word and no memory without adding an out parameter – bruno Jan 19 '19 at 15:40
  • ok, to not take the risk be killed by other guys I edit to return NULL also if no memory, that's all ! – bruno Jan 19 '19 at 15:43
  • I completely understand your intention. Just wanted to say using minus `-1` makes it clear that we are not interested in what's behind the pointer, but just in a specific value indicating OoM. – alk Jan 19 '19 at 15:43
  • See here [man 2 shmat](http://man7.org/linux/man-pages/man2/shmat.2.html). It returns `(void*)-1` on error ... :-) – alk Jan 19 '19 at 15:46
  • @alk unless the C standard explicitly says an address can never be equal to -1 (whatever the signed/unsigned conversion), it is wrong to use -1 as an invalid address – bruno Jan 19 '19 at 15:50
  • This I was wondering as well some time ago: https://stackoverflow.com/q/13306914/694576 – alk Jan 19 '19 at 15:52
  • No matter finally, we're probably talking to the walls and this answer is once again wasted, the OP did not react – bruno Jan 19 '19 at 16:01
  • A fine answer, showing nîce clean code is never wasted. :) – alk Jan 19 '19 at 16:10
  • @bruno We're celebrating my birthday today, and after passing -*multiple*- hours on that single piece of code yesterday while trying to learn C to answer part of an assignment, I must say I did not feel like reacting in some non-thoughtful way. I will spend my Sunday analyzing the answers to this thread and see how it goes. Sorry for the delay! I've been somewhat infuriated at how non-trivial and hard this function has proven to be while it is a simple `.split` in Java. =) – payne Jan 19 '19 at 21:15
  • @payne sorry for my bad mood, it's not personal but a certain weariness. Happy birthday. I hope my answer will be useful – bruno Jan 19 '19 at 21:35
  • @bruno Why did you need to allocated memory with `strdup()` at the beginning instead of simply using `strcpy()` ? It forced you to `free()` before the end of the function. I'm also wondering why you have used `char ** v = malloc((nw + 1) * sizeof(char *));` instead of doing like some other user that suggested `char **result = malloc((strlen(str)+1) * sizeof(char *));`. Counting the number of words is to be able to reduce the amount of allocated memory? Or maybe to use a `for` loop instead of a `while` ? – payne Jan 20 '19 at 18:11
  • @bruno also, more precisely, about the `char ** v = malloc((nw + 1) * sizeof(char *));` line: how does it work exactly? My understanding is that you are declaring a pointer to a pointer of char, and that the amount of memory allocated is somehow to be magically deducted from `sizeof(char *)` which is basically the amount of bits taken in memory for a pointer to a char? I'm having trouble understanding where the bridge is made between allocating those pointers and being able to use arrays directly (such as `v[0]`). Wouldn't `v` be a pointer to pointers of `char[]` rather than `char` ? – payne Jan 20 '19 at 18:14
  • @payne I work on a copy so as not to modify the original: strtok modifies its first parameter, and I use it 2 times. `malloc((strlen(str)+1) * sizeof(char *));` is a poor way and allocates too much memory. The lazy way would be `malloc((strlen(str)+1) / 2);`, supposing the worst case where each word has only one letter, but again this is a poor way. I don't like poor programming ;-) On a 32-bit machine a pointer uses 32 bits. – bruno Jan 20 '19 at 18:15
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/187017/discussion-between-payne-and-bruno). – payne Jan 20 '19 at 20:46
  • @bruno I've found why my solution seemed weirdly buggy: `printf` being buffered in C, I had to add `\n` at the end of my printed output in order to be able to debug with it. It kept me looking for bugs in the wrong places in my code. My actual error was within the `while` loop: the very last iteration was assigning a null pointer, which caused a crash (which CLion somehow didn't think was worth letting me know about, and it would simply interrupt execution with a `Process finished with exit code 0`). – payne Jan 22 '19 at 03:32
  • @payne yes, _printf_ writes to _stdout_, which is buffered, and \n flushes it when the output is a terminal. Output to _stderr_ is *not* buffered – bruno Jan 22 '19 at 06:23

Splitting a string into an unknown number of words and making them available to the caller requires a function that returns a pointer-to-pointer-to-char. This allows a truly dynamic approach: allocate some initial number of pointers (say 2, 4, 8, etc.), make a single pass through your string with strtok while keeping track of the number of pointers used, and allocate storage for each token (word) as you go. When the number of pointers used equals the number allocated, you simply realloc storage for additional pointers and keep going.

A short example implementing the function splitstring() that does that could look similar to the following:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NPTR    8   /* initial number of pointers to allocate */
#define MAXD   32   /* maximum no chars for delimiter */
#define MAXC 1024   /* maximum no chars for user input */

char **splitstring (const char *str, const char *delim, size_t *nwords)
{
    size_t nptr = NPTR,             /* initial pointers */
        slen = strlen (str);        /* length of str */
    char **strings = malloc (nptr * sizeof *strings),   /* alloc pointers */
        *cpy = malloc (slen + 1),   /* alloc for copy of str */
        *p = cpy;                   /* pointer to cpy */

    *nwords = 0;                    /* zero nwords */

    if (!strings) {     /* validate allocation of strings */
        perror ("malloc-strings");
        free (cpy);
        return NULL;
    }

    if (!cpy) {         /* validate allocation of cpy */
        perror ("malloc-cpy");
        free (strings);
        return NULL;
    }
    memcpy (cpy, str, slen + 1);    /* copy str to cpy */

    /* split cpy into tokens */
    for (p = strtok (p, delim); p; p = strtok (NULL, delim)) {
        size_t len;             /* length of token */
        if (*nwords == nptr) {  /* all pointers used/realloc needed? */
            void *tmp = realloc (strings, 2 * nptr * sizeof *strings);
            if (!tmp) {         /* validate reallocation */
                perror ("realloc-strings");
                break;          /* keep words stored so far, cpy freed below */
            }
            strings = tmp;      /* assign new block to strings */
            nptr *= 2;          /* update number of allocate pointers */
        }
        len = strlen (p);       /* get token length */
        strings[*nwords] = malloc (len + 1);    /* allocate storage */
        if (!strings[*nwords]) {                /* validate allocation */
            perror ("malloc-strings[*nwords]");
            break;
        }
        memcpy (strings[(*nwords)++], p, len + 1);  /* copy to strings */
    }
    free (cpy);     /* free storage of cpy of str */

    if (*nwords)    /* if words found */
        return strings;

    free (strings); /* no strings found, free pointers */
    return NULL;
}

int main (void) {

    char **strings = NULL, 
        string[MAXC],
        delim[MAXD];
    size_t nwords = 0;

    fputs ("enter string    : ", stdout);
    if (!fgets (string, MAXC, stdin)) {
        fputs ("(user canceled input)\n", stderr);
        return 1;
    }

    fputs ("enter delimiters: ", stdout);
    if (!fgets (delim, MAXD, stdin)) {
        fputs ("(user canceled input)\n", stderr);
        return 1;
    }

    if ((strings = splitstring (string, delim, &nwords))) {
        for (size_t i = 0; i < nwords; i++) {
            printf (" word[%2zu]: %s\n", i, strings[i]);
            free (strings[i]);
        }
        free (strings);
    }
    else
        fputs ("error: no words found or allocation failed\n", stderr);
}

(note: the word count nwords is passed as a pointer to the splitstring() function to allow the number of words to be updated within the function and made available back in the calling function, while returning a pointer-to-pointer-to-char from the function itself)

Example Use/Output

$ ./bin/stringsplitdelim
enter string    : my dog has fleas and my cat has none and snakes don't have fleas
enter delimiters:
 word[ 0]: my
 word[ 1]: dog
 word[ 2]: has
 word[ 3]: fleas
 word[ 4]: and
 word[ 5]: my
 word[ 6]: cat
 word[ 7]: has
 word[ 8]: none
 word[ 9]: and
 word[10]: snakes
 word[11]: don't
 word[12]: have
 word[13]: fleas

(note: a ' ' (space) was entered as the delimiter above resulting in delim containing " \n" (exactly what you want) by virtue of having used the line-oriented input function fgets for user input)

Memory Use/Error Check

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.

It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.

For Linux, valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use: just run your program through them.

$ valgrind ./bin/stringsplitdelim
==12635== Memcheck, a memory error detector
==12635== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==12635== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==12635== Command: ./bin/stringsplitdelim
==12635==
enter string    : my dog has fleas and my cat has none and snakes don't have fleas
enter delimiters:
 word[ 0]: my
 word[ 1]: dog
 word[ 2]: has
 word[ 3]: fleas
 word[ 4]: and
 word[ 5]: my
 word[ 6]: cat
 word[ 7]: has
 word[ 8]: none
 word[ 9]: and
 word[10]: snakes
 word[11]: don't
 word[12]: have
 word[13]: fleas
==12635==
==12635== HEAP SUMMARY:
==12635==     in use at exit: 0 bytes in 0 blocks
==12635==   total heap usage: 17 allocs, 17 frees, 323 bytes allocated
==12635==
==12635== All heap blocks were freed -- no leaks are possible
==12635==
==12635== For counts of detected and suppressed errors, rerun with: -v
==12635== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Always confirm that you have freed all memory you have allocated and that there are no memory errors.

Look things over and let me know if you have further questions.

David C. Rankin
  • Thank you for your input. It seems more flexible than the other answer provided since, among other things, it allows the user to input a delimiter. I've also appreciated your **Memory Use/Error Check** section. However, I've accepted the other answer as its code seemed more readable and restricted to my domain of use. – payne Jan 20 '19 at 19:33
  • Hey David. I printed out `len` and `sizeof strings[*nwords]` in the splitstring function. The value of `len` matches the length of the split string, but `sizeof` always returns 8. I read the Open Group page for malloc, but the reference is still too hard for me to understand. Can you drop some hints? – jian Aug 09 '22 at 15:22
  • @jian -- what is `sizeof (a_pointer)`? – David C. Rankin Aug 09 '22 at 16:30
  • Keep up the good work. Those little nuggets are the learning that slowly gets laid down that turns you into a quality programmer. – David C. Rankin Aug 09 '22 at 16:48