3

I am trying to create a function that takes a string, splits it into words and return an array with the words in it. I am not allowed to use any pre-made functions other than malloc within the splitting function. Finally I have to set my function in this form char **ft_split_whitespaces(char *str) My current output looks like that:


    d this is me
    s is me
    s me
    r

Expected output:


    Hello
    World
    This
    Is
    Me

my full code is in the following codes:


    #include <stdio.h>
    #include <stdlib.h>
    
    int     count_words(char *str)
    {
        int i; 
        int word;
        
        i = 0;
        word = 1;
        while(str[i]!='\0')
        {
            if(str[i]==' ' || str[i]=='\n' || str[i]=='\t' 
            || str[i]=='\f' || str[i]=='\r' || str[i]=='\v')
                word++;
            i++;
        }
        return (word);
    }
    
    char    **ft_split_whitespaces(char *str)
    {
        int index;
        int size;
        int index2;
        char **arr;
        
        index = 0;
        index2 = 0;
        size = count_words(str);
        arr = (char **)malloc(size * sizeof(char));
        if (arr == NULL)
            return ((char **)NULL);
        while (str[index])
        {
            if(str[index] == ' ')
            {
                index++;
                value++;
                index2++;
            }
            else
                *(arr+index2) = (char*) malloc(index * sizeof(char));
                *(arr+index2) = &str[index];    
            index++;
        }
        **arr = '\0';
        return (arr);
    }
    
    int main()
    {
        char a[] = "Hello World This Is Me";
        char **arr;
        int i;
        int ctr = count_words(a);
        arr = ft_split_whitespaces(a);
        
        for(i=0;i < ctr;i++)
            printf("%s\n",arr[i]);
        return 0;
    }

Alsakka
  • 155
  • 10
  • This is a good question. It's discussed in [these course notes](https://www.eskimo.com/~scs/cclass/notes/sx10h.html). See also [this question](https://stackoverflow.com/questions/49372173). – Steve Summit Jul 15 '21 at 14:46
  • How is the code that calls `ft_split_whitespaces` supposed to know how many entries the returned array has? Is there supposed to be a `NULL` on the end? (If so, don't forget to allocate space for that pointer!) – David Schwartz Jul 15 '21 at 15:20
  • Is `ft_split_whitespaces()` allowed to modify the original string (e.g. insert zero terminators at the end of words and return an array of pointers to the start of words, to avoid allocating memory for each word)? You could possibly also use recursion to avoid `count_words()`. – Brendan Jul 15 '21 at 18:58
  • You shouldn't cast the return value of malloc if you are compiling with a C compiler: https://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc#:~:text=In%20C%2C%20you%20don't,compiler%2C%20a%20cast%20is%20needed. – Jerry Jeremiah Jul 15 '21 at 22:02

1 Answers1

3

You have quite a few errors in your program:

  1. arr = (char **)malloc(size * sizeof(char)); is not right since arr is of type char**. You should use sizeof(char*) or better (sizeof(*arr)) since sizeof(char) is usually not equal to sizeof(char*) for modern systems.

  2. You don't have braces {} around your else statement in ft_split_whitespaces which you probably intended. So your conditional logic breaks.

  3. You are allocating a new char[] for every non--whitespace character in the while loop. You should only allocate one for every new word and then just fill in the characters in that array.

  4. *(arr+index2) = &str[index];This doesn't do what you think it does. It just points the string at *(arr+index2) to str offset by index. You either need to copy each character individually or do a memcpy() (which you probably can't use in the question). This explains why your answer just provides offsets into the main string and not the actual tokens.

  5. **arr = '\0'; You will lose whatever you store in the 0th index of arr. You need to individually append a \0 to each string in arr.

  6. *(arr+index2) = (char*) malloc(index * sizeof(char)); You will end up allocating progressively increasing size of char arrays at because you are using index for the count of characters, which keeps on increasing. You need to figure out the correct length of each token in the string and allocate appropriately.

Also why *(arr + index2)? Why not use the much easier to read arr[index2]?


Further clarifications:

Consider str = "abc de"

You'll start with

*(arr + 0) = (char*) malloc(0 * sizeof(char));
//ptr from malloc(0) shouldn't be dereferenced and is mostly pointless (no pun), probably NULL
*(arr + 0) = &str[0]; 

Here str[0] = 'a' and is a location somehwhere in memory, so on doing &str[0], you'll store that address in *(arr + 0)

Now in the next iteration, you'll have

*(arr + 0) = (char*) malloc(1 * sizeof(char)); 
*(arr + 0) = &str[1]; 

This time you replace the earlier malloc'd array at the same index2 again with a different address. In the next iterations *(arr + 0) = (char*) malloc(2 * sizeof(char));. You end up resetting the same *(arr + index2) position till you encounter a whitespace after which you do the same thing again for the next word. So don't allocate arrays for every index value but only if and when required. Also, this shows that you'll keep on increasing the size passed to malloc with the increasing value of index which is what #6 indicated.

Coming to &str[index].

You are setting (arr + index2) i.e. a char* (pointer to char) to another char*. In C, setting a pointer to another pointer doesn't copy the contents of the second pointer to the first, but only makes both of them point to the same memory location. So when you set something like *(arr + 1) = &str[4], it's just a pointer into the original string at index = 4. If you try to print this *(arr + 1) you'll just get a substring from index = 4 to the end of the string, not the word you're trying to obtain.

**arr = '\0' is just dereferencing the pointer at *arr and setting its value to \0. So imagine if you had *(arr + 0) = "hello\0", you'll set it to "\0ello\0". If you're ever iterating over this string, you'll never end up traversing beyond the first '\0' character. Hence you lose whatever *arr was earlier pointing to.

Also, *(arr + i) and arr[i] are exactly equivalent and make for much better readability. It better conveys that arr is an array and arr[i] is dereferencing the ith element.

Zoso
  • 3,273
  • 1
  • 16
  • 27
  • Thank you alot for your comment, but I have to admit starting from step 3 onwards I really didnt understand how I should tackle those problems, Have to say I am pretty new hands on with pointers and malloc, so if possible to clarify them more it would be great! Thanks anyway – Alsakka Jul 15 '21 at 15:42
  • 1
    @Alsakka I've added further explanation. Fix these and you should have a working solution. I'm not going to write the exact answer since that isn't what SO is for but I hope you're able to see and work the solution for yourself. You're on the right track. Good luck! – Zoso Jul 15 '21 at 20:20