0

I'm trying to solve a challenge, but I have no idea of what's wrong with my code!

The challenge is:

  • Create a function that splits a string of characters into words.
  • Separators are spaces, tabs and line breaks.
  • This function returns an array where each box contains a character-string’s address represented by a word. The last element of this array should be equal to 0 to emphasise the end of the array.
  • There can’t be any empty strings in your array. Draw the necessary conclusions. The given string can’t be modified.
  • Note: The only allowed function is malloc()

The bug/problem: I faced this problem and I tried to solve it but I wasn't able to identify what's wrong. I created a function named split_whitespaces() to do the job. When I print the array of strings inside of the split_whitespaces function, I get the following output:

Inside the function:
arr_str[0] = This
arr_str[1] = is
arr_str[2] = just
arr_str[3] = a
arr_str[4] = test!

And when I print the array of string inside the main function, I get the following output:

Inside the main function:
arr_str[0] = @X@?~
arr_str[1] = `X@?~
arr_str[2] = just
arr_str[3] = a
arr_str[4] = test!

I created a function word_count to count how many words in the input string so I can allocate memory using malloc and with word_count + 1 (null pointer).

int word_count(char *str) {
    int i;
    int w_count;
    int state;

    i = 0;
    w_count = 0;
    state = 0;
    while (str[i]) {
        if (!iswhitespace(str[i])) {
            if (!state)
                w_count++;
            state = 1;
            i++;
        } else {
            state = 0;
            i++;
        }
    }
    return (w_count);
}

And another function called strdup_w to mimic the behavior of strdup but just for single words:

char *strdup_w(char *str, int *index) {
    char *word;
    int len;
    int i;

    i = *index;
    len = 0;
    while (str[i] && !iswhitespace(str[i]))
        len++, i++;;
    word = (char *) malloc(len + 1);
    if (!word)
        return (NULL);
    i = 0;
    while (str[*index]) {
        if (!iswhitespace(str[*index])) {
            word[i++] = str[*index];
            (*index)++;
        } else
            break;
    }
    word[len] = '\0';
    return (word);
}

Here's my full code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char **split_whitespaces(char *str);
char *strdup_w(char *str, int *index);
int word_count(char *str);
int iswhitespace(char c);

int main(void) {
    char *str = "This is just a test!";
    char **arr_str;
    int i;

    i = 0;
    arr_str = split_whitespaces(str);
    printf("\nOutside the function:\n");
    while (arr_str[i]) {
        printf("arr_str[%d] = %s\n", i, arr_str[i]);
        i++;
    }
    return (0);
}

char **split_whitespaces(char *str) {
    char **arr_str;
    int i;
    int words;
    int w_i;

    i = 0;
    w_i = 0;
    words = word_count(str);
    arr_str = (char **)malloc(words + 1);
    if (!arr_str)
        return (NULL);
    printf("Inside the function:\n");
    while (w_i < words) {
        while (iswhitespace(str[i]) && str[i])
            if (!str[i++])
                break;
        arr_str[w_i] = strdup_w(str, &i);
        printf("arr_str[%d] = %s\n", w_i, arr_str[w_i]);
        w_i++;
    }
    arr_str[words] = 0;
    return (arr_str);
}

char *strdup_w(char *str, int *index) {
    char *word;
    int len;
    int i;

    i = *index;
    len = 0;
    while (str[i] && !iswhitespace(str[i]))
        len++, i++;;
    word = (char *)malloc(len + 1);
    if (!word)
        return (NULL);
    i = 0;
    while (str[*index]) {
        if (!iswhitespace(str[*index])) {
            word[i++] = str[*index];
            (*index)++;
        } else
            break;
    }
    word[len] = '\0';
    return (word);
}

int word_count(char *str) {
    int i;
    int w_count;
    int state;

    i = 0;
    w_count = 0;
    state = 0;
    while (str[i]) {
        if (!iswhitespace(str[i])) {
            if (!state)
                w_count++;
            state = 1;
            i++;
        } else {
            state = 0;
            i++;
        }
    }
    return (w_count);
}

int iswhitespace(char c) {
    if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
        return (1);
    return (0);
}

I'm sorry, if anything this is my first time trying to seek help.

su_privada
  • 13
  • 3
  • 2
    Recommendation: Don't tag C++ for C code and vice versa. Sure, there is a lot of cross-over between C and C++ programmers and you attract extra eyes to the question, but not all of the eyes are interested and they get to vote on the question's merits just like the folks who are interested. – user4581301 May 17 '21 at 19:47
  • 1
    In `split_whitespaces`, try changing `arr_str = (char **) malloc(words + 1);` into `arr_str = malloc(sizeof(*arr_str) * (words + 1));` As you have it, `words` is a _count_ and _not_ a byte _length_, so you're _not_ allocating enough space, so you have UB. – Craig Estey May 17 '21 at 19:49
  • @CraigEstey Thank you so much! But watched some tutorials and they said that malloc takes one argument which is the size of the memory to be allocated (in bytes), that's why I allocated memory for 5 bytes! can you please tell my an alternative of using malloc without sizeof() function. I'll appreciate it. – su_privada May 17 '21 at 20:01
  • why do you want to avoid the sizeof function? that doesnt make much sense – lulle2007200 May 17 '21 at 20:17
  • @lulle in the challenge/exercise they told us that the only allowed function is `malloc()`, I guess it's a way to force us to try figure it out on ourselves.. – su_privada May 17 '21 at 20:21
  • Side note: `malloc` and `sizeof` don't mix well. `malloc` provides a pointer to a block of uninitialized and unstructured memory and pointers know nothing about what they point at other than where it is. When you use `sizeof` on a pointer, all you can get back is the size of a pointer, and that's not usually the information you want. – user4581301 May 17 '21 at 21:50
  • @user4581301 But what's the best alternative? – su_privada May 17 '21 at 22:07
  • Not 100% sure if this answers what you are asking, but with `malloc` you have to keep track of the size yourself. – user4581301 May 17 '21 at 22:29

2 Answers2

1

There are multiple problems in the code:

  • the size is incorrect in arr_str = (char **)malloc(words + 1); You must multiply the number of elements by the size of the element:

      arr_str = malloc(sizeof(*arr_str) * (words + 1));
    
  • it is good style to free the array in the main() function after use.

  • the test while (iswhitespace(str[i]) && str[i]) is redundant: if w_count is computed correctly, testing str[i] should not be necessary. You should use strspn() to skip the white space and strcspn() to skip the word characters.

  • if (!str[i++]) break; is completely redundant inside the loop: str[i] has already been tested and is not null.

  • while (str[i] && !iswhitespace(str[i])) len++, i++;; is bad style. Use braces if there is more than a single simple statement in the loop body.

  • the last loop in strdup_w is complicated, you could simply use memcpy(word, str + *index, len); *index += len;

Here is a modified version:

#include <stdio.h>
#include <stdlib.h>

char **split_whitespaces(const char *str);
char *strdup_w(const char *str, int *index);
int word_count(const char *str);
int iswhitespace(char c);

int main(void) {
    const char *str = "This is just a test!";
    char **arr_str;
    int i;

    arr_str = split_whitespaces(str);
    if (arr_str) {
        printf("\nOutside the function:\n");
        i = 0;
        while (arr_str[i]) {
            printf("arr_str[%d] = %s\n", i, arr_str[i]);
            i++;
        }
        while (i --> 0) {
            free(arr_str[i]);
        }
        free(arr_str);
    }
    return 0;
}

char **split_whitespaces(const char *str) {
    char **arr_str;
    int i;
    int words;
    int w_i;

    i = 0;
    w_i = 0;
    words = word_count(str);
    arr_str = malloc(sizeof(*arr_str) * (words + 1));
    if (!arr_str)
        return NULL;
    printf("Inside the function:\n");
    while (w_i < words) {
        while (iswhitespace(str[i]))
            i++;
        arr_str[w_i] = strdup_w(str, &i);
        if (!arr_str[w_i])
            break;
        printf("arr_str[%d] = %s\n", w_i, arr_str[w_i]);
        w_i++;
    }
    arr_str[words] = NULL;
    return arr_str;
}

char *strdup_w(const char *str, int *index) {
    char *word;
    int len;
    int start;
    int i;

    i = *index;
    start = i;
    while (str[i] && !iswhitespace(str[i])) {
        i++;
    }
    *index = i;
    len = i - start;
    word = malloc(len + 1);
    if (!word)
        return NULL;
    i = 0;
    while (i < len) {
        word[i] = str[start + i];
        i++;
    }
    word[i] = '\0';
    return word;
}

int word_count(const char *str) {
    int i;
    int w_count;
    int state;

    i = 0;
    w_count = 0;
    state = 0;
    while (str[i]) {
        if (!iswhitespace(str[i])) {
            if (!state)
                w_count++;
            state = 1;
        } else {
            state = 0;
        }
        i++;
    }
    return w_count;
}

int iswhitespace(char c) {
    return (c == ' ' || c == '\t' || c == '\n' || c == '\r');
}
chqrlie
  • 131,814
  • 10
  • 121
  • 189
  • Thank you so much for your help, And I'm sorry cause it seems that I forgot to mention that in the challenge.. the only allowed function is `mallol()`, that's why I avoided using any other function.. watched some tutorials and they said that malloc takes one argument which is the size of the memory to be allocated (in bytes), that's why I allocated memory for 5 bytes!.. And I'll appreciate if u can explain to my why it didn't work! – su_privada May 17 '21 at 20:16
  • @AchrafELKhnissi: Allocating 5 bytes for a string of 4 `char` is OK because in C type `char` has a size of `1` byte by definition. However when allocating the array of `char *`, you must multiply the number of entries by the size of a pointer, `sizeof(char*)` which is usually 4 or 8 bytes depending on the target system. `arr_str` is a pointer to a pointer to `char`, hence `sizeof(*arr_str)` is `sizeof(char *)`. – chqrlie May 17 '21 at 20:34
  • Oh.. now I get it. I can't thank you enough.. I really appreciate your help, I spent the last couple of hours trying to figure out what I did wrong! Thank you so much. – su_privada May 17 '21 at 20:43
  • Are you allowed to use `for` loops instead of `while` loops? It is a good habit to use `for` loops combining index initialization, test and index increment in one location: `for (i = 0; arr_str[i]; i++) { printf("arr_str[%d] = %s\n", i, arr_str[i]); }` – chqrlie May 17 '21 at 20:48
  • No `for` loops. Even `printf()` isn't allowed instead we use the system call function `write()` to print a string with it.. But I used `printf()` because I was just testing my output with it. – su_privada May 17 '21 at 20:55
  • @AchrafELKhnissi: I was afraid you had this constraint too. The *Norminette* is perverse to many regards IMHO: Its declaration alignment style is ludicrous, the use of TABs is counter-productive as the TAB width varies from one system to another (8 in your editor and 4 here), and banning `for` loops is insane. Btw `len++, i++;;` should not be allowed either. – chqrlie May 17 '21 at 21:03
  • Yeah I'm preparing for the next pool, I wanted to start programming for so many years.. started with python & I learned the basics but I found it super hard to do something with it, and I didn't find any clear path.. But a month or so I heard about 42 school and that's when I started preparing for the next pool, and tbh their system of progression is so thoughtful. From d00 i learned so much.. and than to you and Mr CraigEstey i understood yet another thing.. Much love. – su_privada May 17 '21 at 21:18
1

From my top comment ...

In split_whitespaces, try changing:

arr_str = (char **) malloc(words + 1);

into:

arr_str = malloc(sizeof(*arr_str) * (words + 1));

As you have it, words is a count and not a byte length, so you're not allocating enough space, so you have UB.


UPDATE:

But watched some tutorials and they said that malloc takes one argument which is the size of the memory to be allocated (in bytes), that's why I allocated memory for 5 bytes! can you please tell my an alternative of using malloc without sizeof() function. I'll appreciate it. – Achraf EL Khnissi

There's really no clean way to specify this without sizeof.

sizeof is not a function [despite the syntax]. It is a compiler directive. It "returns" the number of bytes occupied by its argument as a compile time constant.

If we have char buf[5];, there are 5 bytes, so sizeof(buf) [or sizeof buf] is 5.

If we have: int buf[5];, there are 5 elements, each of size int which is [typically] 4 bytes, so the total space, in bytes, is sizeof(int) * 5 or 4 * 5 which is 20.

But, int can vary depending on the architecture. On Intel 8086's [circa the 1980's], an int was 2 bytes (i.e. 16 bits). So, the above 4 * 5 would be wrong. It should be 2 * 5.

If we use sizeof(int), then sizeof(int) * 5 works regardless of the architecture.

Similarly, on 32 bit machines, a pointer is [usually] 32 bits. So, sizeof(char *) is 4 [bytes]. On a 64 bit machine, a pointer is 64 bits, which is 8 bytes. So, sizeof(char *) is 8.

Because arr_str is: char **arr_str, we could have done:

arr_str = malloc(sizeof(char *) * (words + 1));

But, if the definition of arr_str ever changed (to (e.g.) struct string *arr_str;), then what we just did would break/fail if we forgot to change the assignment to:

arr_str = malloc(sizeof(struct string) * (words + 1));

So, doing:

arr_str = malloc(sizeof(*arr_str) * (words + 1));

is a preferred idiomatic way to write cleaner code. More statements will adjust automatically without having to find all affected lines of code manually.


UPDATE #2:

You might just add why you removed the (char **) cast :) -- chqrlie

Note that I removed the (char **) cast. See: Do I cast the result of malloc?

This just adds extra/unnecessary "stuff" as the void * return value of malloc can be assigned to any type of pointer.

If we forgot to do: #include <stdlib.h>, there would be no function prototype for malloc, so the compiler would default the return type to int.

Without the cast, the compiler would issue an an error on the statement [which is what we want].

With the cast, this action is masked at compile time [more or less]. On a 64 bit machine, the compiler will use a value that is truncated to 32 bits [because it thinks malloc returns a 32 bit value] instead of the full 64 bit return value of malloc.

This truncation is a "silent killer". What should have been flagged as a compile time error produces a runtime fault (probably segfault or other UB) that is much harder to debug.

Craig Estey
  • 30,627
  • 4
  • 24
  • 48
  • Good read! thorough and clear. You might just add why you removed the `(char **)` cast :) – chqrlie May 17 '21 at 20:38
  • I honestly was skeptical on posting it, but now I'm truly glad that I decided to do so! I learned so much. But as Mr @chqrlie said why `(char *)` instead of `(char **)`? – su_privada May 17 '21 at 20:52
  • 1
    @AchrafELKhnissi: I was referring to the `(char **)` cast in `arr_str = (char **) malloc(words + 1);` This cast is not required in C as `void` pointers are implicitly converted to other pointer types upon assignment (both ways). A useful idiom to avoid size miscalculations is to always use the type pointed by the destination pointer as `arr_str = malloc(sizeof(*arr_str) * (words + 1));` and omit the cast. – chqrlie May 17 '21 at 20:59