
I'm trying to write a simple split function in C, where you supply a string and a char to split on, and it returns a list of split strings:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char ** split(char * tosplit, char delim){
        int amount = 0;
        for (int i=0; i<strlen(tosplit); i++) {
                if (tosplit[i] == delim) {
                        amount++;
                }
        }
        char ** split_ar = malloc(0);
        int counter = 0;
        char *token = strtok(tosplit, &delim);
        while (token){
                split_ar[counter] = malloc(0);
                split_ar[counter] = token;
                token = strtok(NULL, &delim);
                counter++;
        }
        split_ar[counter] = 0;
        return split_ar;
}


int main(int argc, char *argv[]){
        if (argc == 3){
                char *tosplit = argv[1];
                char delim = *argv[2];
                char ** split_ar = split(tosplit, delim);
                while (*split_ar){
                        printf("%s\n", *split_ar);
                        split_ar++;
                }
        } else {
                puts("Please enter words and a delimiter.");
        }
}

I use malloc twice: once to allocate space for the pointers to strings, and once to allocate space for the actual strings themselves. The strange thing is: during testing I found that the code still worked when I malloc'ed no space at all.

When I removed the malloc lines I got segfaults or malloc assertion errors, so the lines do seem to be necessary, even though they don't seem to do anything. Can someone please explain why?

I expect it has something to do with strtok; the string being tokenized is initialized outside the function scope, and strtok returns pointers into the original string, so maybe malloc isn't even necessary. I have looked at many old SO threads but could find nothing similar enough to answer my question.

  • It's implementation-defined whether calling `malloc(0)` returns a null pointer, or a valid pointer to 0 bytes of memory. So you need to either (a) take care not to try to allocate 0 bytes of memory, or (b) if you do, don't print an error message if `malloc(0)` returns `NULL`. – Steve Summit Mar 10 '23 at 17:40
  • Maybe because `malloc(0)` returns a pointer to a memory zone of length 0, and when you dereference this pointer you get undefined behaviour which appears to work. – Jabberwocky Mar 10 '23 at 17:42
  • In answer to your question: *uninitialized* pointers are different than null pointers, and are different from properly-allocated, valid pointers. See also [this answer](https://stackoverflow.com/questions/75642522/c-calling-free-on-not-allocated-memory/75642914#75642914). – Steve Summit Mar 10 '23 at 17:42
  • You can't call `char ** split_ar = malloc(0);`, and then start filling in `split_ar[counter]`. For simplicity, try calling `split_ar = malloc(50 * sizeof(char *))`, where 50 is a guess of how many strings you might need. (That's not a good long-term solution, but it's a start.) – Steve Summit Mar 10 '23 at 17:44
  • If you say `split_ar[counter] = malloc(…);`, immediately followed by `split_ar[counter] = token;`, you're throwing away (failing to use) the memory you just allocated, and instead filling in `split_ar` with a pointer value — of dubious longevity — from `token`. – Steve Summit Mar 10 '23 at 17:47
  • Could you provide an input that triggers the malloc with size 0? – Jabberwocky Mar 10 '23 at 17:49
  • It's going to take a while to understand and properly explain everything that's going on here. (In fact half of the comments I've written here so far are pretty badly misleading.) You started with code (before you added the `malloc(0)` calls) that failed to work because the pointers weren't initialized. You "fixed" things by adding calls to `malloc(0)`, so now you've got initialized pointers, but they're not *properly* initialized, because they don't point to enough memory for what you're using them for. Your code "works" by purest accident. – Steve Summit Mar 10 '23 at 17:49
  • The `malloc(0)` was just me trying to get my code to crash. In my actual code I used `malloc(amount * sizeof(char **))` (amount here being the number of strings generated by strtok). Of course there are loads of questions about strings and memory on SO, but I was specifically interested in why the code works *despite* the `malloc(0)`. None of the code here is copy-pasted; this is just me trying to get a better understanding of C by experimenting. – Sam van Kesteren Mar 10 '23 at 18:30
  • @SamvanKesteren Ah, okay, fair enough. So my very first comment wasn't misleading: it's your answer. Evidently, on your system, `malloc(0)` returns a valid pointer, and you weren't storing so many strings in `split_ar` that the overflow caused actual problems. On a system where `malloc(0)` returns NULL, you would have gotten an immediate crash on your first assignment `split_ar[counter] = …`. (And even on your system, I suspect that if the string being split had more than about 10 tokens, you would have started seeing problems.) – Steve Summit Mar 10 '23 at 18:35
  • And, yes, if the string being split is allocated somewhere else, and is reasonably persistent, then the `token` pointer values you get back from `strtok` are decently valid, and it doesn't matter that you store them into `split_ar` without malloc'ing anything there. – Steve Summit Mar 10 '23 at 18:37
  • `malloc(0)` may return `NULL` anyway, and it is safe to pass `NULL` to `free()` or `realloc()` so there is absolutely no reason to use it ever. – user16217248 Mar 10 '23 at 18:45
  • Thx for the answers, that explains a lot! – Sam van Kesteren Mar 10 '23 at 19:10

1 Answer


Why does malloc(0) in C not produce an error ... ?

why the code works despite the malloc(0).

Calling malloc(0) is OK. Using that pointer later as in split_ar[counter] = malloc(0); is undefined behavior (UB) as even split_ar[0] attempts to access outside the allocated memory.

When code incurs undefined behavior, there is no "should produce an error". It is undefined behavior. There is no defined behavior in undefined behavior. It might "work", it might not. It is UB.

C does not necessarily add safeguards against weak programming.

If you need a language to add extra checks for such mistakes, C is not the best answer.
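To see which of the two permitted results your own implementation gives for malloc(0), a small check like the following may help. It is only a sketch: the function names `malloc_zero_is_nonnull` and `report_malloc_zero` are made up for illustration, and the returned pointer is never dereferenced, only compared and freed.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (name invented for this sketch).
   Returns 1 if malloc(0) gave back a non-null pointer on this
   implementation, 0 if it gave back NULL. Either result is allowed. */
int malloc_zero_is_nonnull(void)
{
    void *p = malloc(0);
    int nonnull = (p != NULL);

    /* p must never be dereferenced, whichever value it has.
       Passing it to free() is fine in both cases: free(NULL) is a no-op. */
    free(p);
    return nonnull;
}

/* Hypothetical helper: prints what this implementation does. */
void report_malloc_zero(void)
{
    printf("malloc(0) returned %s\n",
           malloc_zero_is_nonnull() ? "a non-null pointer" : "NULL");
}
```

Whatever it reports, that says nothing about whether writing through the pointer is safe: even a non-null result points to zero usable bytes.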


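Instead, allocate the correct amount. In OP's case I think it is, at most, amount + 2: amount delimiters can separate at most amount + 1 tokens, and one more slot is needed for the terminating null pointer. (Consider the case when tosplit does not contain any delimiters.)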

char **split_ar = malloc(sizeof split_ar[0] * (amount + 2));
if (split_ar == NULL) {
  Handle_OutOfMemory();
}

Further

Code is only attempting to copy the pointer and not the string.

// Worthless code
//split_ar[counter] = malloc(0);
//split_ar[counter] = token;

Instead, allocate for the string and copy the string. Research strdup().

// Sample code using the very common strdup().
split_ar[counter] = strdup(token);
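Putting the two fixes together, a corrected version of OP's function might look roughly like this. Treat it as a sketch under some assumptions: `split_copy` and `free_split` are names I made up, `strdup()` is POSIX (and C23) rather than older ISO C, so a feature-test macro is defined for strict compiler modes, and the caller must pass a writable string because `strtok()` modifies it. It also passes `strtok()` a NUL-terminated delimiter string, since `&delim` on a lone `char` is not a valid string.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: exposes strdup() before C23 */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: frees everything split_copy() allocated. */
void free_split(char **parts)
{
    if (parts == NULL)
        return;
    for (char **p = parts; *p != NULL; p++)
        free(*p);
    free(parts);
}

/* Hypothetical corrected split(): returns a NULL-terminated array of
   heap-allocated copies of the tokens, or NULL on allocation failure.
   Note that strtok() writes into tosplit. */
char **split_copy(char *tosplit, char delim)
{
    assert(tosplit != NULL);

    /* strtok() expects a NUL-terminated string of delimiters. */
    const char delims[2] = { delim, '\0' };

    size_t amount = 0;
    for (const char *p = tosplit; *p != '\0'; p++)
        if (*p == delim)
            amount++;

    /* At most amount + 1 tokens, plus one slot for the terminating NULL. */
    char **split_ar = malloc(sizeof split_ar[0] * (amount + 2));
    if (split_ar == NULL)
        return NULL;

    size_t counter = 0;
    for (char *token = strtok(tosplit, delims); token != NULL;
         token = strtok(NULL, delims)) {
        split_ar[counter] = strdup(token);   /* copy the string itself */
        if (split_ar[counter] == NULL) {
            /* The NULL just stored terminates the array for free_split(). */
            free_split(split_ar);
            return NULL;
        }
        counter++;
    }
    split_ar[counter] = NULL;
    return split_ar;
}
```

The caller owns the result and releases it with `free_split()`. Because the tokens are copies, they stay valid even after the original buffer is gone.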

Advanced

  1. Use strspn() and strcspn() to walk down a string and parse it. This has the nice benefit of operating on a const string and readily knowing the size of the token - useful in allocating.

  2. Use the same technique twice to pre-calculate token count as well as parsing. This avoids differences that exist in OP's 2 methods.
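The two points above could be sketched as follows. This is one possible shape, not the only one: `split_const` and `free_tokens` are hypothetical names, the delimiter set is passed as a string (as `strspn()`/`strcspn()` expect), and runs of delimiters are treated as a single separator, which matches what `strtok()` does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: frees the array returned by split_const(). */
void free_tokens(char **parts)
{
    if (parts == NULL)
        return;
    for (char **p = parts; *p != NULL; p++)
        free(*p);
    free(parts);
}

/* Hypothetical split that never writes to s. Returns a NULL-terminated
   array of heap-allocated tokens, or NULL on allocation failure. */
char **split_const(const char *s, const char *delims)
{
    assert(s != NULL && delims != NULL);

    /* Pass 1: count tokens with the same walk used for parsing. */
    size_t count = 0;
    for (const char *p = s + strspn(s, delims); *p != '\0'; ) {
        p += strcspn(p, delims);   /* skip over the token */
        p += strspn(p, delims);    /* skip the delimiters after it */
        count++;
    }

    char **parts = malloc(sizeof parts[0] * (count + 1));
    if (parts == NULL)
        return NULL;

    /* Pass 2: identical walk, now copying each token. */
    size_t i = 0;
    for (const char *p = s + strspn(s, delims); *p != '\0'; i++) {
        size_t len = strcspn(p, delims);   /* token length is known here */
        parts[i] = malloc(len + 1);
        if (parts[i] == NULL) {
            /* The NULL just stored terminates the array for free_tokens(). */
            free_tokens(parts);
            return NULL;
        }
        memcpy(parts[i], p, len);
        parts[i][len] = '\0';
        p += len;
        p += strspn(p, delims);
    }
    parts[i] = NULL;
    return parts;
}
```

Since both passes use the same loop, the count and the parse cannot disagree, and string literals can be passed directly because the input is never modified.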

chux - Reinstate Monica