
I am having trouble writing a string split function with delimiters. I based my function off of the main function featured here: http://www.cplusplus.com/reference/cstring/strtok/.

When I test it via main, I am only able to pass it char[], but not char*. When passing a char*, the program seg faults.

I.e. passing some char str[] through str_split works but not some char* str. Any help would be greatly appreciated.

char** str_split(char* str, const char* delim)
{
  char* tmp;

  char** t = (char**)malloc(sizeof(char*) * 1024);
  char** tokens = t;

  tmp = strtok(str, delim);

  while(tmp != NULL)
  {
    *tokens = (char*)malloc(sizeof(char) * strlen(tmp));
    *tokens = strdup(tmp);
    tokens++;
    tmp = strtok(NULL, delim);
  }

  return t;
}

3 Answers


When I test it via main, I am only able to pass it char[], but not char*. When passing a char*, the program seg faults.

Going by the above, chances are that you are either not allocating memory for your char * in main, or you are passing a string literal.

Sadique

These two lines give you two different problems:

*tokens = (char*)malloc(sizeof(char) * strlen(tmp));
*tokens = strdup(tmp);

The first line will allocate strlen(tmp) bytes, but the problem is that strings have an extra character to terminate the string, so you really need to allocate strlen(tmp) + 1 bytes.

The second line overwrites the original pointer you got from malloc, causing a memory leak.

Also, in C you should not cast the return of malloc.

Oh, and another note: sizeof(char) is specified to always be 1, no matter the actual bit-size of the char type.
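
A minimal sketch of the corrected allocation, assuming a hypothetical helper named copy_token (it is not part of the original code). It allocates strlen + 1 bytes exactly once, without a cast or sizeof(char), and copies the terminator along with the characters. Where strdup is available, calling strdup(tmp) on its own has the same effect.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of token, or NULL on failure.
   The caller owns the result and must free() it. */
char *copy_token(const char *token)
{
    assert(token != NULL);

    size_t len = strlen(token);
    char *copy = malloc(len + 1);   /* +1 for the terminating '\0' */
    if (copy == NULL)
        return NULL;

    memcpy(copy, token, len + 1);   /* copies the terminator as well */
    return copy;
}
```

Inside the loop this would replace both offending lines with a single `*tokens = copy_token(tmp);`, so no pointer is overwritten and nothing leaks.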


As for your seg-faulting, I'm guessing you are calling your function with a string literal, like e.g.

some_var = str_split("hello world", " ");

Or possibly

char *string = "hello world";
some_var = str_split(string, " ");

This will cause undefined behavior, because a string literal is an array of characters that must not be modified (it is often stored in read-only memory), and strtok modifies the string it is given. Undefined behavior is arguably the most common cause of crashes.
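
One way around this, sketched under the assumption that the input text is known at compile time, is to initialise an array from the literal so that strtok works on a private, writable copy. The names count_words and demo_count are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the tokens in a writable string.
   strtok overwrites delimiters in str with '\0', so str must be modifiable. */
size_t count_words(char *str, const char *delim)
{
    assert(str != NULL && delim != NULL);

    size_t n = 0;
    for (char *tok = strtok(str, delim); tok != NULL; tok = strtok(NULL, delim))
        n++;
    return n;
}

/* Usage: the array is initialised from the literal, so it is a writable copy. */
size_t demo_count(void)
{
    char text[] = "hello world";      /* an array, not a pointer to a literal */
    return count_words(text, " ");    /* two tokens: "hello" and "world" */
}
```

Changing `char text[]` to `char *text` would make `text` point at the literal itself, which is exactly the case that crashes.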

If you enable more warnings when building, you may get a warning about this (GCC's -Wwrite-strings, for example, flags a string literal assigned to a plain char *); or maybe you did get a warning but ignored it, or used a cast to get rid of it. Compiler warnings are often good indicators that you are doing something you should not. Hiding one, e.g. with a cast, only silences the warning; it does not fix the problem.


There are also a couple of other problems with your code. One is that if there is only one "word"/"token" in the "sentence" you pass to the function, you waste 4092 or 8184 bytes (depending on whether pointers are 4 or 8 bytes, i.e. a 32- or 64-bit platform) in that allocation. You might want to run a separate tokenization loop first (on a temporary copy of the string) to find out the exact number of "tokens" or "words" in the input.

Doing this counting will also solve the other problem: What if there are more than 1024 tokens/words? Your loop will blissfully write out of bounds in that case.

Both of these cases are extremes, and your standard use-case may be a better fit for your current code, but it's still something to think about.
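
The counting approach could look roughly like the sketch below. It is one possible shape, not a definitive implementation: the names str_split_counted, dup_string and free_tokens are hypothetical, and the NULL terminator at the end of the array is an added convention so that callers can tell where the tokens stop.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Frees a NULL-terminated token array returned by str_split_counted. */
void free_tokens(char **tokens)
{
    if (tokens == NULL)
        return;
    for (char **p = tokens; *p != NULL; p++)
        free(*p);
    free(tokens);
}

/* Heap copy of s (strlen + 1 bytes), or NULL on failure. */
static char *dup_string(const char *s)
{
    size_t size = strlen(s) + 1;
    char *copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, s, size);
    return copy;
}

/* Two-pass split: returns a NULL-terminated array, or NULL on failure.
   Like the original function, it modifies str through strtok. */
char **str_split_counted(char *str, const char *delim)
{
    assert(str != NULL && delim != NULL);

    /* Pass 1: count tokens on a temporary copy so str stays intact. */
    char *scratch = dup_string(str);
    if (scratch == NULL)
        return NULL;
    size_t count = 0;
    for (char *t = strtok(scratch, delim); t != NULL; t = strtok(NULL, delim))
        count++;
    free(scratch);

    /* Exactly count slots, plus one for the NULL terminator. */
    char **tokens = malloc((count + 1) * sizeof *tokens);
    if (tokens == NULL)
        return NULL;

    /* Pass 2: copy each token into its slot. */
    size_t i = 0;
    for (char *t = strtok(str, delim); t != NULL; t = strtok(NULL, delim)) {
        tokens[i] = dup_string(t);
        if (tokens[i] == NULL) {      /* NULL also terminates the partial array */
            free_tokens(tokens);
            return NULL;
        }
        i++;
    }
    tokens[i] = NULL;
    return tokens;
}
```

Because the array is sized from the count, a one-word input allocates two pointers instead of 1024, and an input with more than 1024 tokens no longer writes out of bounds. The cost is a second pass over the string and one temporary copy.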

Some programmer dude
  • thank you, this helped. I was able to fix the errors in my program as well as hunt down the memory leaks. I will try and come up with a separate tokenization loop, thank you for the suggestion – user3467217 Mar 27 '14 at 07:59

You may be initializing the char * with a string literal at its declaration:

char *str="abcdef";

or you may not have allocated memory for the string that char *str points to. In both of these cases strtok() can cause a segmentation fault.

LearningC