1

I have function named ft_split(char const *s, char c) that is supposed to take strings and delimiter char c and divide s into bunch of smaller strings.

It is 3rd or 4th day I am trying to solve it and my approach:

  1. Calculates no. of characters in the string including 1 delimiter at the time (if space is delimiter so if there are 2 or more spaces in a row than it counts one space and not more. Why? That space is a memory for adding '\0' at the end of each splitted string)

  2. It finds size (k) of characters between delimiters -> malloc memory -> copy from string to malloc -> copy from malloc to malloc ->start over.

But well... function shows segmentation fault. Debugger shows that after allocating "big" memory it does not go inside while loop, but straight to big[y][z] = small[z] after what it exits the function.

Any tips appreciated.

#include "libft.h"
#include <stdlib.h>
int ft_count(char const *s, char c)
{
    int i;
    int j;

    i = 0;
    j = 0;
    while (s[i] != '\0')
    {
        i++;
        if (s[i] == c)
        {
            i++;
            while (s[i] == c)
            {
                i++;
                j++;
            }
        }
    }
    return (i - j);
}
char **ft_split(char const *s, char c)
{
    int i;
    int k;
    int y;
    int z;
    char *small;
    char **big;

    i = 0;
    y = 0;
    if (!(big = (char **)malloc((ft_count(s, c) + 1) * sizeof(char))))
        return (0);
    while (s[i] != '\0')
    {
        while (s[i] == c)
            i++;
        k = 0;
        while (s[i] != c)
        {
            i++;
            k++;
        }
        if (!(small = (char *)malloc(k * sizeof(char) + 1)))
            return (0);
        z = 0;
        while (z < k)
        {
            small[z] = s[i - k + z];
            z++;
        }
        small[k] = '\0';
        z = 0;
        while (z < k)
        {
            big[y][z] = small[z];
            z++;
        }
        y++;
        free(small);
    }
    big[y][i] = '\0';
    return (big);
}
int main()
{
    char a[] = "jestemzzbogiemzalfa";
    ft_split(a, 'z');
}
Jabberwocky
  • 48,281
  • 17
  • 65
  • 115
Tony
  • 25
  • 1
  • 6
  • if `big` is a `char **`, or a pointer-to-pointer-to-char, should the size of a pointer be taken into account when you allocate the memory for it? Now you're using `sizeof(char)` when passing the allocation size to `malloc()`. On e.g. an x86-64, a pointer is eight times as large as a `char`... And if `big` points to a bunch of pointers, are you setting those pointers somewhere? – ilkkachu Feb 27 '21 at 21:05
  • @ikkachu this is good point. I am not sure about this mallocs how it exactly works and hope that somebody could explain it wisely. I applied your tip to the program but it still shows segmentation fault. – Tony Feb 27 '21 at 21:10
  • 1
    *Any tips*. Use a debugger. At a minimum it will instantly tell you the exact line that triggers the seg fault. That's the first bit of info that should be collected. Can also use the debugger to step through the code and examine it as it runs to see where and why things start going wrong. – kaylum Feb 27 '21 at 21:12
  • I used debuger and wrote a feedback: function shows segmentation fault. After alocating "big" memory it does not go inside while, but straight to big[y][z] = small[z] after what it exits the function. – Tony Feb 27 '21 at 21:15
  • 1
    *function shows segmentation fault*. Which exact line of code does the seg fault happen on? The debugger gives you that info. – kaylum Feb 27 '21 at 21:19
  • DEBUGGER shows that after alocating "big" memory it does not go inside while loop in ft_split, but straight to big[y][z] = small[z] after what it exits the function. This is everything that I can read from debugger. From here: if (!(big = (char **)malloc((ft_count(s, c) + 1) * sizeof(char)))) it goes here: int ft_count(char const *s, char c) - counts and return the number of characters after that it goes here: big[y][z] = small[z]; And leaves the program. I cannot be more precise about what it does :( @kaylum – Tony Feb 27 '21 at 21:24
  • 2
    Ok, it seems you do not know how to use the debugger fully. Would suggest you learn that as that will save you alot of time. Anyway, the problem is likely with your `big` allocation. That is not the correct way to allocate a 2D array and accessing as such is likely to result in seg faults. See: [Correctly allocating multi-dimensional arrays](https://stackoverflow.com/questions/42094465/correctly-allocating-multi-dimensional-arrays) and [How do I dynamically allocate a 2d array of chars?](https://stackoverflow.com/questions/36354800/how-do-i-dynamically-allocate-a-2d-array-of-chars) – kaylum Feb 27 '21 at 21:31
  • Hey thank you @kaylum. I feel like i already seen that. I ll give it another try. About debugger, is there anything else for gdb? I use commands watch, b, s, n, layout next. – Tony Feb 27 '21 at 21:39
  • 1
    `bt` gives the back trace. Which includes the current line of code that the program has stopped at. – kaylum Feb 27 '21 at 21:40
  • 1
    `p` for "print", e.g. `p y`, `p z` to see the indexes if (when) it crashes on accessing `big[y][z]`. `p big[y]` may also be illuminating. – ilkkachu Feb 27 '21 at 21:46
  • OT: regarding: `if (!(big = (char **)malloc((ft_count(s, c) + 1) * sizeof(char))` 1) in C, the returned type is `void*` which can be assigned to any pointer. Casting just clutters the code (and is error prone). 2) the expression: `sizeof( char )` is defined in the C standard as 1. Multiplying anything by 1 has no effect. Suggest removing that expression – user3629249 Feb 27 '21 at 23:10
  • to learn about `gdb`: from the command line type: `gdb --help` which will output LOTS of info. Then follow the link: `` to learn all the details about `gdb` – user3629249 Feb 27 '21 at 23:14
  • you seem to be trying to implement the functionality of the function: `strsep()`. Unless your instructor directs otherwise, I suggest you use that function – user3629249 Feb 27 '21 at 23:25
  • it's not good to malloc inside the function ft_split then return the pointer。 it will break the rule of (malloc, free)symmetric. memory leak will be easy to happen – Paul Yang Feb 28 '21 at 00:21

2 Answers2

4

I didn't get everything what the code is doing, but:

You have a char **big, it's a pointer-to-pointer-to-char, so presumably is supposed to point to an array of char *, which then point to strings. That would look like this:

[ big (char **) ] ->  [ big[0] (char *)  ][ big[1] (char *) ][ big[2] ... ]
                         |                   [
                         v                   v
                      [ big[0][0] (char) ]  ...
                      [ big[0][1] (char) ]
                      [ big[0][2] (char) ]
                      [ ...              ]

Here, when you call big = malloc(N * sizeof(char *)), you allocate space for the middle pointers, big[0] to big[N-1], the ones on the top right in the horizontal array. It still doesn't set them to anything, and doesn't reserve space for the final strings (big[0][x] etc.)

Instead, you'd need to do something like

big = malloc(N * sizeof(char *));
for (i = 0; i < N; i++) {
    big[i] = malloc(k);
}

for each final string individually, with the correct size etc. Or just allocate a big area in one go, and split it among the final strings.

Now, in your code, it doesn't look like you're ever assigning anything to big[y], so they might be anything, which very likely explains the segfault when referencing big[y][z]. If you used calloc(), you'd now that big[y] was NULL, with malloc() it might be, or might not.


Also, here:

    while (s[i] != '\0')
    {
        while (s[i] == c)
            i++;
        k = 0;
        while (s[i] != c)    /* here */
        {
            i++;
            k++;
        }

I wonder what happens if the end of string is reached at the while (s[i] != c), i.e. if s[i] is '\0' at that point? The loop should probably stop, but it doesn't look like it does.

ilkkachu
  • 6,221
  • 16
  • 30
  • "Or just allocate a big area in one go, and split it among the final strings.". But this is what I am trying to do. Notice that when I do "big" malloc I do not know at that time how many small strings I am going to have. Maybe 0 maybe 20... Why it matters how much memory i assign to each substring? The way I think here is to put enough bythes so all strings summed together will fit. i dont know different way :/ :(. Thank you for answer. I appreciate your effort! – Tony Feb 27 '21 at 21:47
  • @Tony, sorry, I couldn't tell what `ft_count()` was meant to count. But, if `big` is supposed to contain pointers to the individual words, then you need space for _those pointers_. If you allocate space for all the words, you still don't know where each one starts. – ilkkachu Feb 27 '21 at 21:56
  • @Tony, Do I get it right that you want to split `jestemzzbogiemzalfa` so that `big[0]` points to `jestem`, `big[1]` to `bogiem` and so.? – ilkkachu Feb 27 '21 at 21:58
  • @ikkachu after every "small" sring i add '\0', it is not enough? What you mean now is that i allocated all space for big[0][z] and when i come with next word and move to big[1][z] I have nothing there? – Tony Feb 27 '21 at 22:01
  • Yes exactly this is what they ask us to do in the task: Allocates (with malloc(3)) and returns an array of strings obtained by splitting ’s’ using the character ’c’ as a delimiter. The array must be ended by a NULL pointer. – Tony Feb 27 '21 at 22:01
  • @Tony, yep. But if you turn `jestemzzbogiemzalfa\0` into `jestem\0\0bogiem\0alfa\0`, it doesn't help in finding out where the words in the middle start. If `big[1]` is supposed to point into `bogiem` (or `big[2]`, if the empty string counts, doesn't matter), you need to assign that value to the pointer at some point. – ilkkachu Feb 27 '21 at 22:11
  • If you do `int *array = malloc(4 * sizeof(int))`, you have space for four `int`s, but they're not set yet, so it's no use using `int[0]` for anything (before setting it). It's the same for `char **big = malloc(N * sizeof(char *))`, you have space for N `char *`s, but they're not set to anything, you can't access anything through `big[0]`, `big[1]` etc. – ilkkachu Feb 27 '21 at 22:12
  • Than i have no idea how to allocate big malloc while not knowing small mallocs. Its over my knowledge and i even dont know where to look for it. And those are just the beginnig of our tasks... im so frustrated. Thank you for answers, I appreciate that a lot!! – Tony Feb 27 '21 at 22:17
  • @Tony, consider something like `const char *str = "foo:bar"; char *p, *q; p = &str[0]; /* == str */ q = &str[4]; /* == str + 4 */` Now `p` and `q` point to the start of `foo` and `bar` (but the `:` is still there). The same with an array of `char *` instead: `char **words = malloc(2 * sizeof(char *)); words[0] = &str[0]; words[1] = &str[4];` Same difference, you need some variable to point inside `str`. Hard to explain without pictures, though, sorry. But there probably are a number of tutorials on pointers and C strings online (trouble is just finding a good one) – ilkkachu Feb 27 '21 at 22:24
  • Hey I was doing day off yesterday I was too tired. A friend explained me what was the problem. It was this part: big[y][z] = small[z]; right now the function looks like this: big[y] = small; y++; } maybe you know what was wrong with my system? He said i was not alocating memory for big[y] in big[y][z]. Otherwise it was correct except first malloc being char* and that I allocated too much of memory there. – Tony Mar 01 '21 at 13:09
  • @Tony, I'm not sure I know how to explain it any better, at least not without access to a blackboard. – ilkkachu Mar 01 '21 at 16:45
1

There are multiple problems in the code:

  • the ft_count() function is incorrect: you increment i before testing for separators, hence the number is incorrect if the string starts with separators. You should instead count the number of transitions from separator to non-separator:
int ft_count(char const *s, char c)
{
    char last;
    int i;
    int j;

    last = c;
    i = 0;
    j = 0;
    while (s[i] != '\0')
    {
        if (last == c && s[i] != c)
        {
            j++;
        }
        last = s[i];
        i++;
    }
    return j;
}

Furthermore, the ft_split() functions is incorrect too:

  • the amount of memory allocated for the big array of pointers in invalid: you should multiply the number of elements by the element size, which is not char but char *.
  • you add an empty string at the end of the array if the string ends with separators. You should test for a null byte after skipping the separators.
  • you do not test for the null terminator when scanning for the separator after the item.
  • you do not store the small pointer into the big array of pointers. Instead of copying the string to big[y][...], you should just set big[y] = small and not free(small).

Here is a modified version:

char **ft_split(char const *s, char c)
{
    int i;
    int k;
    int y;
    int z;
    char *small;
    char **big;

    if (!(big = (char **)malloc((ft_count(s, c) + 1) * sizeof(*big))))
        return (0);
    i = 0;
    y = 0;
    while (42)  // aka 42 for ever :)
    {
        while (s[i] == c)
            i++;
        if (s[i] == '\0')
            break;
        k = 0;
        while (s[i + k] != '\0' && s[i + k] != c)
        {
            k++;
        }
        if (!(small = (char *)malloc((k + 1) * sizeof(char))))
            return (0);
        z = 0;
        while (z < k)
        {
            small[z] = s[i];
            z++;
            i++;
        }
        small[k] = '\0';
        big[y] = small;
        y++;
    }
    big[y] = NULL;
    return (big);
}

42 rant:

Ces conventions de codage (la norminette) sont contre-productives! Les boucles for sont plus lisibles et plus sûres que ces while, les casts sur les valeurs de retour de malloc() sont inutiles et confusantes, les parenthèses autour de l'argument de return sont infantiles.

chqrlie
  • 131,814
  • 10
  • 121
  • 189
  • Ok a friend helped me with that also, but i did not understand why big[y] = small directly and why not free after each memory passing? Isnt it that we are overwriting the same small malloc over and over? I see it this way: small(2) = bogiem. small(3) = alfa. So if i write alfa on top of bogiem i ll get alfaem. I think here is my confusion. Also how big malloc knows how much memory in total it needs if i only create 3 submallocs inside? – Tony Mar 02 '21 at 14:25
  • @Tony: `small = malloc()` allocates a new block of memory and stores its address into `small`, which is a pointer. Storing this pointer into the array just copies the value of the pointer to the array element. The next iteration will allocate another block, etc. The previous block is not overwritten, `small` now points to a new block. – chqrlie Mar 07 '21 at 14:44