0

I have a simple program to split the input string based on the delim. The algorithm is as follows:

  1. Count the size of batches that I am going to divide the original input to.

  2. Malloc the 2D pointer results according to count size + 1 (last one make it NULL)

  3. Call strsep to cut the source piece by piece (token)

  4. Malloc the pointer results[i] by strlen(token)
  5. strcpy(results[i], token)
  6. Return results

I couldn't figure out why these two free() statement will cause me errors. On line 51:

free(*(tokens + i));

This line gave me the following error:

free(): invalid pointer

if I commented out this line, on line 54:

free(tokens);

Gave me the following error:

free(): invalid next size (fast)

I went through the discussions about these two errors, but I still couldn't figure out why these were causing problems to my program.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

char** str_split(char *str, const char *delim)
{
  char *token;
  char **results;
  int count = 0;

  int i;
  for( i = 0; i < strlen(str); i++ ) 
    //TODO: check delim format
    //      assume to be one char string for now
    if( str[i] == *delim )
      count++;

  count++;  

  results = (char**) malloc(sizeof(char*) * count); /* line 21 */

  int index = 0;
  while( (token = strsep(&str, delim)) != NULL )
  {
    int size = strlen(token);
    results[index] = (char*) malloc(sizeof(char*) * size); /* line 27 */
    strcpy( results[index], token );
    index++;
  }
  results[index] = NULL;                              /* line 31 */
  return results;
}

int main()
{
    char source[] = "name id ip";
    char **tokens;

    printf("source=[%s]\n\n", source);

    tokens = str_split(source, " ");

    if (tokens)
    {
        int i;
        for (i = 0; *(tokens + i); i++)
        {
            printf("batch=[%s]\n", *(tokens + i));
            free(*(tokens + i));                     /* line 51*/
        }
        printf("\n");
        free(tokens);                                /* line 54 */
    }
    return 0;
}

I also tried gdb to see more detail to what was going on. I was able to examine the pointer on line 51:

gdb> x /s *(tokens+i) 
0x602030:       "name" 

Also on line 54:

gdb> x /s *tokens
0x602010:       "0 `"

I also tried Valgrid to examine my compiled file but I didn't know how to interpret this analysis:

==26070== Command: ./str_parse
==26070== 
==26070== Invalid write of size 8
==26070==    at 0x40086D: str_split (str_parse.c:31)
==26070==    by 0x4008D4: main (str_parse.c:43)
==26070==  Address 0x51fc058 is 0 bytes after a block of size 24 alloc'd
==26070==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==26070==    by 0x4007CB: str_split (str_parse.c:21)
==26070==    by 0x4008D4: main (str_parse.c:43)
==26070== 
==26070== Invalid read of size 8
==26070==    at 0x40094D: main (str_parse.c:48)
==26070==  Address 0x51fc058 is 0 bytes after a block of size 24 alloc'd
==26070==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==26070==    by 0x4007CB: str_split (str_parse.c:21)
==26070==    by 0x4008D4: main (str_parse.c:43)
==26070== 
==26070== 
==26070== HEAP SUMMARY:
==26070==     in use at exit: 0 bytes in 0 blocks
==26070==   total heap usage: 4 allocs, 4 frees, 88 bytes allocated
==26070== 
==26070== All heap blocks were freed -- no leaks are possible
==26070== 
==26070== For counts of detected and suppressed errors, rerun with: -v
==26070== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
ObigOne
  • 13
  • 6
  • you should check the value of `index` so that it does not go past `count` – AndersK Oct 17 '16 at 04:47
  • Welcome to Stack Overflow. Please read the [About] and [Ask] pages if you've not already done so. On the whole, you've asked a pretty good question. One preference around here is 'no line numbers' in code. The disadvantage of line numbers is that other people can't simply copy'n'paste your code into their system and compile and run it; they have to excise the line numbers first. It can be helpful to add key line numbers as trailing comments (so a comment like `/* line 31 */` after line 31 would be helpful in this case). – Jonathan Leffler Oct 17 '16 at 05:51
  • Thanks Jonathan, I will make the corrections. – ObigOne Oct 17 '16 at 06:02
  • `results[index] = (char*) malloc(sizeof(char*) * size);` This doesn't make any sense. – Lundin Oct 17 '16 at 06:42

2 Answers2

1

I think the main culprit is:

results[index] = NULL;

I suspect the value of index is count at that point, which modifies results using an out of bounds index. That leads to undefined behavior. If you want to use this approach, you need to use:

results = malloc(sizeof(char*) * (count+1));

See Do I cast the result of malloc?.


Also, you are using the wrong size to allocate memory for the strings themselves.

You are using:

 results[index] = (char*) malloc(sizeof(char*) * size)
 strcpy( results[index], token );

That should be:

 results[index] = malloc(size+1)
 strcpy( results[index], token );
Community
  • 1
  • 1
R Sahu
  • 204,454
  • 14
  • 159
  • 270
  • Thanks for the input Mr Sahu. I read through the link you provided. I will corrected my usage of malloc from now on. I think I learned it from somewhere long time ago. Didn't realize that it's a bad coding till today. Also, I realized that I didn't increase the malloc area by one to leave room for '\0'. I modified my original code and it worked!!! I do need results[index] = NULL as it's a trick to stop the for loop so I don't need to know how many batches it is divided to.
    results = malloc(sizeof(char*) * (count+2));
    
    
    – ObigOne Oct 17 '16 at 05:48
  • Didn't know that I can't use formatting in comments, move it to reply below – ObigOne Oct 17 '16 at 05:49
  • 1
    @ObigOne: You can use back-ticks ``…`code`…`` around code in comments; getting back-ticks to appear is harder. – Jonathan Leffler Oct 17 '16 at 05:52
  • @ObigOne you can use the accent grave for format in comments. And casting malloc is also taught in school/university often, also because in c++ it is mandatory – Kami Kaze Oct 17 '16 at 05:52
  • @ObigOne: You've got a gatepost problem and then an extra off-by-one problem. If you've got a fence 100 metres long and need fenceposts every 10 m, how many fenceposts do you need? The answer isn't 10. But you additionally need an extra bucket to add the null that terminates the list of fenceposts. – Jonathan Leffler Oct 17 '16 at 05:55
  • Thanks for the teachings guys. hmm.... I don't seem to find options to edit my comments. – ObigOne Oct 17 '16 at 06:08
0

Thanks for the input Mr Sahu. I read through the link you provided. I will corrected my usage of malloc from now on. I think I learned it from somewhere long time ago. Didn't realize that it's a bad coding till today.

Also, I realized that I didn't increase the malloc area by one to leave room for '\0'. I modified my original code and it worked!!!

  1. Change line 21 to
results = malloc(sizeof(char*) * (count+1)); 
  1. Change line 27 to
results[index] = malloc(sizeof(char*) * (size+1) );
  1. Change line 31 to
results[count] = NULL;
ObigOne
  • 13
  • 6