1

I am new to C language and hoping to get a good understanding char array assignment coming from Java background. I use stringTok() to split the local sentence[] array into individual array word and want to assign that word to char ** tokens that was being passed in. I received the following output:

value of i:4   
len:1 ,  
len:1 ,  
len:6 ,8?k  
len:0 ,  

Some have values (the assigned values are gibberish) and some don't. I initially thought that I had to allocate memory to each char in the character array. However, that didn't seem to work either. So, I suppose I must have done something wrong then. I have also tried using tokens[4][10] instead of char ** tokens and did not make any difference either. So, I was stuck in this for quite a while.

#include <string.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void  stringTok(char** tokens)
{
    int i = 0;
    char sentence[] = "jake is really cool.";
    char *word = strtok(sentence, " .");

    while (word != NULL)
    {
        //printf("%s\n", word);
        tokens[i++] = word;
        word = strtok(NULL, " .");
    }
    printf("value of i:%d\n", i);
    tokens[i] = NULL;
}

int main()
{ 
    char **cmdArr = calloc(5, sizeof(char*));

    for (int i = 0; i < 5; i++)
        cmdArr[i] = (char*)calloc(10, sizeof(char));
    stringTok(cmdArr);
    for(int i = 0; i < 4; i++)
    {
        printf("len:%ld ,", strlen(cmdArr[i]));
        printf("%s\n", cmdArr[i]);
    }

    return 0;
}
Nate Lee
  • 17
  • 7
  • 3
    In C you do not cast malloc - http://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc – Ed Heal May 15 '17 at 22:57
  • 3
    Undefined behavior for accessing an object after the end of its lifetime. (Also, in this case, means using the value of an object (the pointer to the object) while its value is indeterminate) – EOF May 15 '17 at 22:57
  • 3
    `char sentence[] = "jake is really cool.";` creates a local variable in your `stringTok()` function. Your `strtok()` calls return pointers into that array. When your `stringTok()` function returns, your `sentence` variable ceases to exist and all the pointers into it become invalid. Hence undefined behavior. – Andrew Henle May 15 '17 at 22:59
  • @AndrewHenle I saw your point now. I added another parameter in the function stringTok instead of using sentence[ ] as a local variable. That apparently fixes the error. However, this then comes down to the hidden conceptual stuff in c. Would you say that we should not use or assign any values that local pointer reference within a function to pointer outside its scope? – Nate Lee May 15 '17 at 23:07
  • @NateLee You never do that. Tell me how that sits fine with how a computer works. Remember that C is a low level language with an absolutely amazing amount of control, only if you understand how to use it. – Shahe Ansar May 15 '17 at 23:19

1 Answers1

1

You already allocated storage for these values. The main mistake is that you used assignment thinking it would copy your string. Instead it overwrote the pointer to a temporary returned by strtok.

The quick fix is this:

while (word != NULL) {
    strcpy(tokens[i++], word);
    word = strtok(NULL, " .");
}

You should be a bit careful. Your strings only support 10 characters (which includes a null terminator). Usually, you would use strncpy which will stop if the source string is too long for the amount of storage you have available. This helps to avoid bad stuff like buffer overruns which give undefined behaviour and create vulnerabilities often exploited in cyber attacks.

A safer approach might be to let your tokenizer perform the allocations.

int tokenize( const char *sentence, char** tokens, int max_tokens )
{
    const char * delim = " .";
    int count = 0;

    // Make copy of sentence for strtok
    int len = strlen(sentence);
    char *copy = malloc(len+1);
    if( !copy ) return 0;
    strncpy( copy, len, sentence );
    copy[len] = '\0';

    // Allocate and copy tokens
    for( char *word = strtok(copy, delim);
         word && count < max_tokens;
         word = strtok(NULL, delim) )
    {
        int len = strlen(word);
        tokens[count] = malloc(len+1);
        if( !tokens[count] ) break;
        strncpy( tokens[count], len, word );
        tokens[count][len] = '\0';
        count++;
    }

    return count;
}

Now, this is a bit silly. Since you already copied the original and strtok is going to insert terminators, there's no need to allocate and copy everything. You could instead store copy in tokens[0], and then store word in each subsequent pointer. The caller just needs to know that to clean up memory it only needs free(tokens[0]) and never any of the others (which are just pointers into that block).

paddy
  • 60,864
  • 6
  • 61
  • 103