2

I'm attempting to write a function in C which accepts a pointer to contiguous characters ending in '\0' - that is, a string - and a single constant character delimiter, and then outputs a pointer to contiguous pointers, each of which points to a new string. These new strings correspond to the input string broken at each delimiter character and then properly terminated. In fewer words, I want to dynamically build an array of string.

To do this I plan to use malloc() to allocate the memory I need as I go. The "parent array" will be sizeof(char *) * (count + 2) bytes long, to accommodate a pointer to the first character of each delimited substring, plus a terminator. Likewise, each "child array" will be sizeof(char) * (j + 1) bytes long to accommodate all the characters of each substring, again plus a terminator.

My code so far is this.

#include <stdio.h>
#include <stdlib.h>

char *split(char *string, const char delimiter);

int main(int argc, char *argv[]) {
    char *x = split(argv[1], '.');
    while (*x) {
        printf("%d\n", *x);
    }
    return 0;
}

char *split(char *string, const char delimiter) {
    int length, count, i, j = 0;
    while(*(string++)) {
        if (*string == delimiter) count++;
        length++;
    }
    string -= length;
    char *array = (char *)malloc(sizeof(char *) * (length + 1));
    for(i, j = 0; i < (count + 1); i++) {
        while(*(string++) != delimiter) j++;
        string -= j;
        *array = (char *)malloc(sizeof(char) * (j + 1));
        while(*(string++) != delimiter) *(*array++) = *(string++);
        **array = '\0';
        string++;
        array += sizeof(char *);
    }
    *array = '\0';
    array -= (sizeof(char *) * (length + 1));
    return array;  
}

My question is why is the compiler spitting out the following errors?

split2.c: In function ‘split’:
split2.c:25: warning: assignment makes integer from pointer without a cast
split2.c:26: error: invalid type argument of ‘unary *’ (have ‘int’)
split2.c:27: error: invalid type argument of ‘unary *’ (have ‘int’)

My guess is that when the memory for the "parent array" is allocated, the compiler expects that int values, not char * will be stored there. If this is the case, how do I properly correct my code?

I am aware there are far easier ways to do this sort of thing using string.h; my motivation for writing this code is to learn better how pointers work in C.

Many thanks in advance!

snoopy91
  • 337
  • 3
  • 12

3 Answers3

5

I think you want array as a double pointer, char **array.

char **array = (char **)malloc(sizeof(char *) * (length + 1));

As your logic says, you want an array of char*, each one pointing to a string. and so array should be double pointer. If you do this modification, change the return type also, to char**.

If you would like to use double pointers, try this:

char **split(char *string, const char delimiter) {
    int length = 0, count = 0, i = 0, j = 0;
    while(*(string++)) {
        if (*string == delimiter) count++;
        length++;
    }
    string -= (length + 1); // string was incremented one more than length
    char **array = (char **)malloc(sizeof(char *) * (length + 1));
    char ** base = array;
    for(i = 0; i < (count + 1); i++) {
        j = 0;
        while(string[j] != delimiter) j++;
        j++;
        *array = (char *)malloc(sizeof(char) * j);
        memcpy(*array, string, (j-1));
        (*array)[j-1] = '\0';
        string += j;
        array++;
    }
    *array = '\0';
    return base;  
}

Free this array later, like:

i = 0;
while(base[i]) {
    free(base[i]);
    i++;
}
free(base);
base = NULL;
raj raj
  • 1,932
  • 1
  • 14
  • 15
  • A `free(base);` should always be followed by `base = NULL;` to avoid a dangling pointer. People usually write a macro function to do these together. – legends2k Jun 12 '13 at 10:37
  • @raj this is a very useful answer; I want to study the code a little before accepting it. About your reasoning that array should be a `char**`: when array is first initialised to hold the return from malloc() it is merely a pointer to a block of contiguous memory. It only becomes a pointer to a pointer after I decide to store some `char*`s in this block later on. How do I reconcile this with whether the type should be a single or double pointer? That is, it starts off life as a single pointer, and then becomes a double pointer later? What if I stored a pointer in the first byte of the block... – snoopy91 Jun 12 '13 at 16:50
  • ... and then an `int`, for example, in the next. Then surely the "pointer to a pointer" argument would be void? What is the correct way to think about this? Thanks! – snoopy91 Jun 12 '13 at 16:51
  • Sorry for late reply. Every pointer you allocate points to a contiguous block of memory. The datatype of the pointer that you declare decides as what type that block of memory should be accessed. Here, `array` is declared as `char**` and is allocated a block of `char*`. That means, this `array` is the base address of an array of `char*`. That is what a double pointer is. A pointer to another pointer. So, here, conceptually, `array` starts off as a double pointer itself. – raj raj Jun 13 '13 at 04:15
  • Yes, you may store an `int` value in the pointer array (safe, yet not recommended, as far as size of pointer equals size of `int`), which will be treated as an address value thereafter, similar to typecasting `int` to `float`. But problems arise if you try to `free` that address which is merely an `int` value and probably not a valid address. – raj raj Jun 13 '13 at 04:16
  • @rajraj Thank you - that's exactly what I wanted to know. – snoopy91 Jun 13 '13 at 07:44
  • `malloc` for `**array` shouldn't be `(length + 1)` but `(count + 1)` instead – Valen Apr 10 '19 at 14:11
2
    *array = (char *)malloc(sizeof(char) * (j + 1));

should be

    array = (char *)malloc(sizeof(char) * (j + 1));  // malloc returns a pointer, no need to dereference here

and then this

    while(*(string++) != delimiter) *(*array++) = *(string++);

should be

    while(*(string++) != delimiter) *array++ = *(string++); // dereferenceing once would do

and finally this

    **array = '\0';

should be

    *array = '\0'; // same as above

The reason for all the above changes are the same. array is a pointer and not a pointer to a pointer.

Additionally, in your code, the loop index i has never been initialized and hence is bound to lead to non-deterministic behaviour. Either initialize it in declaration like

int length, count, i = 0, j = 0;

or in the loop initialization like

for(i = 0, j = 0; i < (count + 1); i++) {

Hope this helps!

legends2k
  • 31,634
  • 25
  • 118
  • 222
  • 1
    there is nothing wrong with your answer but you should avoid typecasting void*.http://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc – Dayal rai Jun 12 '13 at 09:06
  • 1
    @Dayalrai: Nice:) I use `C++` these days and didn't see this post, but nevertheless it's good to know, thanks for the link! – legends2k Jun 12 '13 at 09:08
  • @legends2k: How can the end of split strings be found? – raj raj Jun 12 '13 at 09:36
  • @raj `How can the end of split strings be found?`, good point, I would leave a null pointer at the end – David Ranieri Jun 12 '13 at 10:01
  • @DavidRF: `'\0'` equals `NULL`, will that idea of null pointer be good? – raj raj Jun 12 '13 at 10:02
  • @raj, keep in mind that you must iterate through the pointers not through the truncated string – David Ranieri Jun 12 '13 at 10:04
  • but in this case he is counting the number of delimiters, he know when to stop – David Ranieri Jun 12 '13 at 10:06
  • @rajraj: Like DavidRF says, people usually have a `'\0'` null character to denote the end of a string; so the same semantic would apply to the split strings too. – legends2k Jun 12 '13 at 10:39
  • @legends2k, David RF: I agree with you. i think value of '\0' and NULL is zero, right? What if there comes a case where one delimiter come just after another? – raj raj Jun 12 '13 at 10:54
  • @rajraj: In such case his algorithm would be buggy since it doesn't check the distance between two delimiters. Null Character: http://en.wikipedia.org/wiki/Null_character – legends2k Jun 12 '13 at 11:05
  • @legends2k: yes, continuous occurrence of delimiters should be detected and no empty strings should be allocated in that case. – raj raj Jun 12 '13 at 11:12
0
char *array = (char *)malloc(sizeof(char *) * (length + 1));

should be

char **array = (char **)malloc(sizeof(char **) * (length + 1));

and

*array = (char *)malloc(sizeof(char) * (j + 1));

should be

array[i] = (char *)malloc(sizeof(char) * (j + 1));

You seems to be a beginner, I suggest you to prefer array[i] than use *array or other pointer manipulation, this is more simple at the beginning.

Shar
  • 455
  • 2
  • 14