2

My str_split function returns (or at least I think it does) a char** - so a list of strings essentially. It takes a string parameter, a char delimiter to split the string on, and a pointer to an int to place the number of strings detected.

The way I did it, which may be highly inefficient, is to make a buffer of x length (x = length of string), then copy element of string until we reach delimiter, or '\0' character. Then it copies the buffer to the char**, which is what we are returning (and has been malloced earlier, and can be freed from main()), then clears the buffer and repeats.

Although the algorithm may be iffy, the logic is definitely sound as my debug code (the _D) shows it's being copied correctly. The part I'm stuck on is when I make a char** in main, set it equal to my function. It doesn't return null, crash the program, or throw any errors, but it doesn't quite seem to work either. I'm assuming this is what is meant be the term Undefined Behavior.

Anyhow, after a lot of thinking (I'm new to all this) I tried something else, which you will see in the code, currently commented out. When I use malloc to copy the buffer to a new string, and pass that copy to aforementioned char**, it seems to work perfectly. HOWEVER, this creates an obvious memory leak as I can't free it later... so I'm lost.

When I did some research I found this post, which follows the idea of my code almost exactly and works, meaning there isn't an inherent problem with the format (return value, parameters, etc) of my str_split function. YET his only has 1 malloc, for the char**, and works just fine.

Below is my code. I've been trying to figure this out and it's scrambling my brain, so I'd really appreciate help!! Sorry in advance for the 'i', 'b', 'c' it's a bit convoluted I know.

Edit: should mention that with the following code,

ret[c] = buffer;
printf("Content of ret[%i] = \"%s\" \n", c, ret[c]);

it does indeed print correctly. It's only when I call the function from main that it gets weird. I'm guessing it's because it's out of scope ?

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define DEBUG

#ifdef DEBUG
    #define _D if (1)
#else
    #define _D if (0)
#endif

char **str_split(char[], char, int*);
int count_char(char[], char);

int main(void) {
    int num_strings = 0;
    char **result = str_split("Helo_World_poopy_pants", '_', &num_strings);

    if (result == NULL) {
        printf("result is NULL\n");
        return 0;
    }

    if (num_strings > 0) {
        for (int i = 0; i < num_strings; i++) {
            printf("\"%s\" \n", result[i]);
        }
    }

    free(result);

    return 0;
}

char **str_split(char string[], char delim, int *num_strings) {

    int num_delim = count_char(string, delim);
    *num_strings = num_delim + 1;

    if (*num_strings < 2) {
        return NULL;
    }

    //return value
    char **ret = malloc((*num_strings) * sizeof(char*));

    if (ret == NULL) {
        _D printf("ret is null.\n");
        return NULL;
    }

    int slen = strlen(string);
    char buffer[slen];

    /* b is the buffer index, c is the index for **ret */
    int b = 0, c = 0;
    for (int i = 0; i < slen + 1; i++) { 

        char cur = string[i];

        if (cur == delim || cur == '\0') {

            _D printf("Copying content of buffer to ret[%i]\n", c); 
            //char *tmp = malloc(sizeof(char) * slen  + 1);
            //strcpy(tmp, buffer);

            //ret[c] = tmp;
            ret[c] = buffer;
            _D printf("Content of ret[%i] = \"%s\" \n", c, ret[c]);
            //free(tmp);

            c++;
            b = 0;
            continue;
        }

        //otherwise

        _D printf("{%i} Copying char[%c] to index [%i] of buffer\n", c, cur, b);

        buffer[b] = cur;
        buffer[b+1] = '\0'; /* extend the null char */
        b++;

        _D printf("Buffer is now equal to: \"%s\"\n", buffer);
    }

    return ret;
}

int count_char(char base[], char c) {
    int count = 0;
    int i = 0;

    while (base[i] != '\0') {
        if (base[i++] == c) {
            count++;
        }
    }
    _D printf("Found %i occurence(s) of '%c'\n", count, c);
    return count;
}
chqrlie
  • 131,814
  • 10
  • 121
  • 189
  • 2
    "*Then it copies the buffer to the char\*\**" - no, it doesn't. Where? – melpomene Mar 25 '19 at 23:15
  • at: ret[c] = buffer; am I missing something really obvious? Sorry! Edit: If you're referring to the usage of copies, I used 'copy' and 'set' interchangeably. As in setting the value of char**[index] equal to the buffer. – mysteryuser12345 Mar 25 '19 at 23:19
  • 2
    That doesn't copy the buffer. `ret[c]` is just a pointer. You're setting it to point into `buffer`, which is a local variable, which is destroyed when the surrounding function returns. Also, all elements of `ret` have the same value (`buffer`). You're returning an array containing identical garbage pointers. – melpomene Mar 25 '19 at 23:21
  • You are storing pointers to a buffer that exists on the stack. Using those pointers after returning from the function results in undefined behavior. – paddy Mar 25 '19 at 23:21
  • @melpomene Ah! I figured it was something like that, but could not for the life of me put it into words. Thanks a lot, sir. So how would you recommend I proceed from here- is there any alternative to malloc'ing each string separately? As of course, I would never be able to free them. – mysteryuser12345 Mar 25 '19 at 23:30
  • Why wouldn't you be able to free them? – melpomene Mar 25 '19 at 23:32
  • @melpomene Well, where do I free them? I can't from main() with the way the function is formatted currently, and I don't think I can free them after assigning char** to point to it, as then it would be pointing to nothing. Sorry if I'm sounding like a gigantic idiot, really appreciate the help – mysteryuser12345 Mar 25 '19 at 23:36
  • You would free them in a loop: `for(int i = 0; i < num_strings; ++i) free(result[i]);` and then `free(result)`. See my answer for an alternative, more convenient approach with less allocations. – paddy Mar 25 '19 at 23:48

2 Answers2

0

The string pointers you store into the res with ret[c] = buffer; array point to an automatic array that goes out of scope when the function returns. The code subsequently has undefined behavior. You should allocate these strings with strdup().

Note also that it might not be appropriate to return NULL when the string does not contain a separator. Why not return an array with a single string?

Here is a simpler implementation:

#include <stdlib.h>

char **str_split(const char *string, char delim, int *num_strings) {
    int i, n, from, to;
    char **res;

    for (n = 1, i = 0; string[i]; i++)
        n += (string[i] == delim);

    *num_strings = 0;
    res = malloc(sizeof(*res) * n);
    if (res == NULL)
        return NULL;

    for (i = from = to = 0;; from = to + 1) {
        for (to = from; string[to] != delim && string[to] != '\0'; to++)
            continue;
        res[i] = malloc(to - from + 1);
        if (res[i] == NULL) {
            /* allocation failure: free memory allocated so far */
            while (i > 0)
                free(res[--i]);
            free(res);
            return NULL;
        }
        memcpy(res[i], string + from, to - from);
        res[i][to - from] = '\0';
        i++;
        if (string[to] == '\0')
            break;
    }
    *num_strings = n;
    return res;
}
chqrlie
  • 131,814
  • 10
  • 121
  • 189
0

You are storing pointers to a buffer that exists on the stack. Using those pointers after returning from the function results in undefined behavior.

To get around this requires one of the following:

  • Allow the function to modify the input string (i.e. replace delimiters with null-terminator characters) and return pointers into it. The caller must be aware that this can happen. Note that supplying a string literal as you are doing here is illegal in C, so you would instead need to do:

    char my_string[] = "Helo_World_poopy_pants";
    char **result = str_split(my_string, '_', &num_strings);
    

    In this case, the function should also make it clear that a string literal is not acceptable input, and define its first parameter as const char* string (instead of char string[]).

  • Allow the function to make a copy of the string and then modify the copy. You have expressed concerns about leaking this memory, but that concern is mostly to do with your program's design rather than a necessity.

    It's perfectly valid to duplicate each string individually and then clean them all up later. The main issue is that it's inconvenient, and also slightly pointless.

Let's address the second point. You have several options, but if you insist that the result be easily cleaned-up with a call to free, then try this strategy:

  1. When you allocate the pointer array, also make it large enough to hold a copy of the string:

    // Allocate storage for `num_strings` pointers, plus a copy of the original string,
    // then copy the string into memory immediately following the pointer storage.
    char **ret = malloc((*num_strings) * sizeof(char*) + strlen(string) + 1);
    char *buffer = (char*)&ret[*num_strings];
    strcpy(buffer, string);
    
  2. Now, do all your string operations on buffer. For example:

    // Extract all delimited substrings.  Here, buffer will always point at the
    // current substring, and p will search for the delimiter.  Once found,
    // the substring is terminated, its pointer appended to the substring array,
    // and then buffer is pointed at the next substring, if any.
    int c = 0;
    for(char *p = buffer; *buffer; ++p)
    {
        if (*p == delim || !*p) {
           char *next = p;
           if (*p) {
               *p = '\0';
               ++next;
           }
           ret[c++] = buffer;
           buffer = next;
        }
    }
    
  3. When you need to clean up, it's just a single call to free, because everything was stored together.

paddy
  • 60,864
  • 6
  • 61
  • 103
  • Thanks so much for the help! I'm still reading through it and trying to understand, but I think this right here is perfect. I'm a little confused by the example you gave in 2. but I'm sure I'll get it eventually. – mysteryuser12345 Mar 25 '19 at 23:54
  • I realized I had a small error, so please see my edit. The loop is pretty basic. It simply walks through each character in the string using `char *p`, leaving `buffer` pointed at the beginning of the substring currently being searched. When it finds a delimiter (or end of string), it terminates that substring, appends the `buffer` pointer to your array, and then points `buffer` to the start of the next substring, if any. The loop terminates when `buffer` points to an empty string, which will be the case once all strings (including none, if original was empty) have been extracted. – paddy Mar 25 '19 at 23:59