
I have been trying to write a function that takes a line of text and returns a pointer to an array of its words. Code 1 below does roughly this, and I would prefer it over Code 2 because it lets me choose which characters separate words. However, although Code 1 runs, its memory allocation goes wrong: the same memory is reused for every entry of the `words` array, so the words end up duplicated. How can I fix Code 1?

Code 1:

char *split(const char *string) {
    char *words[MAX_LENGTH / 2];
    char *word = (char *)calloc(MAX_WORD, sizeof(char));
    memset(word, ' ', sizeof(char));
    static int index = 0;
    int line_index = 0;
    int word_index = 0;

    while (string[line_index] != '\n') {
        const char c = string[line_index];
        if (c == ' ') {
            word[word_index+ 1] = '\0';
            memcpy(words + index, &word, sizeof(word));
            index += 1;
            if (word != NULL) {
                free(word);
                char *word = (char *)calloc(MAX_WORD, sizeof(char));
                memset(word, ' ', sizeof(char));
            }
            ++line_index;
            word_index = 0;
            continue;
        }
        if (c == '\t')
            continue;
        if (c == '.')
            continue;
        if (c == ',')
            continue;

        word[word_index] = c;
        ++word_index;
        ++line_index;
    }

    index = 0;
    if (word != NULL) {
        free(word);
    }
    return *words;
}

Code 2:

char **split(char *string) {
    static char *words[MAX_LENGTH / 2];
    static int index = 0;
    // resetting words 
    for (int i = 0; i < sizeof(words) / sizeof(words[0]); i++) {
         words[i] = NULL;
    }
    const char *delimiter = " ";
    char *ptr = strtok(string, delimiter);
    while (ptr != NULL) {
        words[index] = ptr;
        ptr = strtok(NULL, delimiter);
        ++index;
    }
    index = 0;
    return words;
}

However, I noticed that the pointer stored at `words + index` is the same memory location every time, which is what causes the word duplication.
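A likely explanation for the duplication in Code 1: `char *word = (char *)calloc(...)` inside the `if` declares a new, block-scoped `word` that shadows the outer one. The outer `word` still holds the address that was just stored in `words[index]` and then freed, so the next word is written into freed memory and stored under that same address again. Separately, the `'\t'`, `'.'` and `','` branches `continue` without incrementing `line_index`, so they loop forever, and `return *words` returns only `words[0]`, not the array. Below is a minimal sketch of the same character-by-character idea, assuming the caller passes the separator characters as a string; `split_chars` and `free_words` are hypothetical names, not part of the question.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: free a NULL-terminated array of heap-allocated words. */
void free_words(char **words)
{
    if (words == NULL)
        return;
    for (size_t i = 0; words[i] != NULL; i++)
        free(words[i]);
    free(words);
}

/*
 * Sketch: split `line` at every character listed in `delims`.
 * Each word gets its OWN malloc'd buffer, so no two entries share memory.
 * Returns a NULL-terminated array; release it with free_words().
 * `line` itself is not modified.
 */
char **split_chars(const char *line, const char *delims)
{
    size_t len = strlen(line);
    /* len characters hold at most (len + 1) / 2 words, plus one NULL slot */
    char **words = calloc(len / 2 + 2, sizeof *words);
    if (words == NULL)
        return NULL;

    size_t count = 0, i = 0;
    while (i < len) {
        while (i < len && strchr(delims, line[i]) != NULL)   /* skip separators */
            i++;
        size_t start = i;
        while (i < len && strchr(delims, line[i]) == NULL)   /* scan one word */
            i++;
        if (i > start) {
            char *w = malloc(i - start + 1);   /* fresh buffer for this word */
            if (w == NULL) {
                free_words(words);             /* array is NULL-terminated so far */
                return NULL;
            }
            memcpy(w, line + start, i - start);
            w[i - start] = '\0';
            words[count++] = w;
        }
    }
    return words;   /* calloc zeroed the array, so words[count] is NULL */
}
```

The key difference from Code 1 is that a word buffer is never freed after its address has been stored; ownership passes to the array and is released only by `free_words()`.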

chqrlie
hebronace
  • What is your question? Passing the delimiter to the function, or the memory problem during splitting? BTW: No need to shout in the title. – Gerhardh Nov 25 '19 at 12:01
  • Code 2 does not have `word` variable. Is your problem in Code 1 or Code 2? – Gerhardh Nov 25 '19 at 12:04
  • Code 2 works, but I can't change the delimiter since it is `const char *`. I want to use Code 1 because it can check for all kinds of non-word characters, but there the memory is reallocated at the same location, so every index of the `char *words[]` array holds the same address and the words are duplicated. – hebronace Nov 25 '19 at 23:31
  • [Splitting a String and returning an array of Strings](https://stackoverflow.com/questions/54261257/splitting-a-string-and-returning-an-array-of-strings/54263440?r=SearchResults&s=6|28.7528#54263440) may be helpful. – David C. Rankin Dec 14 '19 at 04:44
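The `const` in `const char *delimiter` does not prevent Code 2 from using another delimiter: it only promises that the function will not modify those characters, and the second argument of `strtok` is already a *set* of separator characters. A sketch that takes the delimiter set as a parameter and fills a caller-supplied array instead of a `static` one (`split_delims` and its parameters are hypothetical):

```c
#include <stddef.h>
#include <string.h>

/*
 * Sketch: split `string` in place with strtok, treating every character in
 * `delims` as a separator. Token pointers go into the caller's `words`
 * array of capacity `max`; one slot is reserved for a NULL terminator.
 * Returns the number of tokens stored.
 * strtok writes '\0' into `string`, so it must be a modifiable buffer, and
 * the tokens stay valid only as long as that buffer does.
 */
size_t split_delims(char *string, const char *delims, char **words, size_t max)
{
    size_t count = 0;
    if (max == 0)
        return 0;
    for (char *tok = strtok(string, delims);
         tok != NULL && count < max - 1;
         tok = strtok(NULL, delims)) {
        words[count++] = tok;
    }
    words[count] = NULL;
    return count;
}
```

Called as `split_delims(line, " \t.,\n", words, 64)`, this also treats tabs, periods, commas and the newline as separators, which covers the checks Code 1 performs by hand.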

1 Answer


strtok() always returns a different pointer into the initial string. This cannot produce duplicates, unless you call it twice with the same input string (maybe with new contents).
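A small sketch of that point (`tokens` is a hypothetical helper): within one pass every token is a distinct address inside the buffer, but if the same buffer is refilled and tokenized again, the new tokens reuse the old addresses, so pointers saved from the first pass now read the new words.

```c
#include <stddef.h>
#include <string.h>

/* Sketch: store the space-separated tokens of `buf` in `out`; returns the count. */
size_t tokens(char *buf, char **out, size_t max)
{
    size_t n = 0;
    for (char *t = strtok(buf, " "); t != NULL && n < max; t = strtok(NULL, " "))
        out[n++] = t;
    return n;
}
```

For example, after tokenizing `"red green"` and then `"blue cyan"` in the same `char buf[32]`, the first token of each pass is `buf` itself, and the pointer kept from the first pass now reads `"blue"`.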

However, your function returns a pointer to a static array, which is overwritten on each call to split(), voiding the results of all previous calls. To prevent this,

  • either allocate new memory in each call (which must be freed by the caller):

    char **words = calloc(MAX_LENGTH / 2, sizeof *words);
    
  • or return a struct instead (which is always copied by value):

    typedef struct wordlist { char *word[MAX_LENGTH / 2]; } wordlist;
    
    wordlist split(char *string)
    {
        wordlist list = {0};
        /* ... */
        list.word[index] = /* ... */;
        /* ... */
        return list;
    }
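Both options can be sketched in plain C under a few assumptions: `MAX_LENGTH` gets a placeholder value of 256 because the question never shows it, the delimiter set is passed in as a parameter, and `split_alloc`, `split_struct` and `MAX_WORDS` are hypothetical names. In C the struct return type needs either `struct wordlist` or a `typedef`, and an empty `{}` initializer is only standard from C23 on, so this sketch uses a typedef and `{0}`.

```c
#include <stdlib.h>
#include <string.h>

#ifndef MAX_LENGTH
#define MAX_LENGTH 256   /* assumption: placeholder for the question's constant */
#endif
#define MAX_WORDS ((size_t)MAX_LENGTH / 2)

/* Option 1: heap-allocated pointer array; the caller must free() it. */
char **split_alloc(char *string, const char *delims)
{
    char **words = calloc(MAX_WORDS, sizeof *words);   /* zeroed => NULL-terminated */
    if (words == NULL)
        return NULL;
    size_t n = 0;   /* local, not static: every call starts at 0 */
    for (char *ptr = strtok(string, delims);
         ptr != NULL && n < MAX_WORDS - 1;              /* keep one NULL slot */
         ptr = strtok(NULL, delims))
        words[n++] = ptr;
    return words;
}

/* Option 2: return a struct, which is copied by value to the caller. */
typedef struct wordlist {
    char *word[MAX_WORDS];
} wordlist;

wordlist split_struct(char *string, const char *delims)
{
    wordlist list = {0};   /* all pointers start as NULL */
    size_t n = 0;
    for (char *ptr = strtok(string, delims);
         ptr != NULL && n < MAX_WORDS - 1;
         ptr = strtok(NULL, delims))
        list.word[n++] = ptr;
    return list;
}
```

In both versions the word pointers still point into `string`, which `strtok` modifies, so the input buffer must outlive the result. The struct version needs no `free()` but copies `MAX_WORDS` pointers on every return; the heap version avoids that copy at the cost of making the caller responsible for freeing.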
    
Ralph