0

I am trying to write a function that will split a string along a space (' ') without changing the original string, put all of the tokens into an array, then return that array. The problem I am running into is in returning the pointer. Below is my code.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char **split_line(char *ln) {
    char **tokens, *tok, line[256];
    int j;

    strcpy(line, ln);

    tokens = calloc(64, sizeof(char*));
    for (int i = 0; i < 64; i++)
        tokens[i] = calloc(64, sizeof(char));

    tokens[0] = strtok(line, " ");
    for (j = 1; (tok = strtok(NULL, " ")) != NULL && j < 63; j++) 
        tokens[j] = tok;
    tokens[j] = NULL;

    return tokens;
}

int main(void) {
    char **cut_ln, *input;

    input = "Each word in this string should be an element in cut_ln.";
    cut_ln = split_line(input);

    printf("`%s`\n", input);
    for (int i = 0; cut_ln[i] != NULL; i++)
        printf("[%d]: `%s`\n", i, cut_ln[i]);

    return 0;
}

When run, this gives:

`This word in this string should be an element in cut_ln.`
[0]: `This`
[1]: `wo1`
[2]: `�={G+s`
[3]: `G+s`
[4]: `string`
[5]: ``
[6]: `0����`
[7]: `��`
[8]: ``
[9]: ``
[10]: ``

When I try to print the contents of tokens in the split_line function, it gives the expected result. However, when tokens is returned and assigned to a variable, and then printed, it gives the result above as demonstrated. What am I doing wrong?

Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
nswerhun
  • 144
  • 8
  • 3
    `tokens[j] = tok;` overwrites the pointer you so carefully allocated with `calloc`, leaking that memory. Did you intend to write something like `strcpy(tokens[j], tok)` instead? – Nate Eldredge Feb 02 '21 at 23:09
  • `strdup()` then `strok()`. Don't re-invent the wheel. If you do reinvent it, at least **count the number of entries first** in a quick scan, then allocate, then return. Don't just **assume** 64 will be enough. That will almost always be wrong. The number `64` has absolutely no reason to be here, you have nothing to back up why that number exists. It seems chosen arbitrarily. – tadman Feb 02 '21 at 23:13
  • 1
    I commend you for writing this function -- it's a great way to handle string input. Wonderful to have in your toolkit. (I wish there were a standard library function like this.) See also [this question](https://stackoverflow.com/questions/49372173/why-no-split-function-in-c). – Steve Summit Feb 02 '21 at 23:13
  • @tadman Did not know about `strdup`. However, [it would appear](https://stackoverflow.com/a/252802/13821134) that it is not part of ISO C. – nswerhun Feb 02 '21 at 23:31
  • Unless you're specifically constrained by ISO C, as in you're on some limited platform, then sure, but otherwise, some degree of POSIX compliance virtually is a given, even on non-POSIX compliant platforms like Windows. When we talk about modern C we usually include POSIX as it's readily available just about everywhere. GCC, clang and VCC all support it. Some arcane C89 compiler for a long dead operating system might not, like I wouldn't count on it being present in Microsoft XENIX. – tadman Feb 02 '21 at 23:33

2 Answers2

3

When you return tokens, it contains the pointers returned from strtok, which are pointers into line. But line no longer exists at this point.

You allocated memory and made the various elements of tokens point to that allocated memory. Don't overwrite those values with the values returned from strtok.

David Schwartz
  • 179,497
  • 17
  • 214
  • 278
  • `line` fell out of scope, to be specific. It's just a ghost of a previous value at that point, and like a haunted string it does all sorts of spooky things. – tadman Feb 02 '21 at 23:16
0

For starters the function should be declared at least like

char ** split_line( const char *ln );

because the passed string is not being changed within the function.

The function will be a more flexible if to declare a second parameter that will specify delimiters.

char ** split_line( const char *ln, const char *delim );

Secondly it is a bad idea to use magic numbers like 64 or 256. The function will not work if the passed string contains more than 63 tokens or when the string for example contains at least one token the length of which is greater than 63.

You dynamically allocated 64 arrays in this loop

for (int i = 0; i < 64; i++)
    tokens[i] = calloc(64, sizeof(char));

and assigned their addresses to elements of the array pointed to by the variable tokens. But at once you reassigned the pointers with addresses inside the local array line.

tokens[0] = strtok(line, " ");
for (j = 1; (tok = strtok(NULL, " ")) != NULL && j < 63; j++) 
    tokens[j] = tok;

So the function produces numerous memory leaks. And the returned array of pointers will contain invalid pointers because the local array line will not be alive after exiting the function.

Also this statement

tokens[j] = NULL;

is redundant. Using calloc you already set initially all pointers to NULL.

The function can look the following way as it is shown in the demonstrative program below.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

char ** split_line( const char *s, const char *delim )
{
    size_t n = 0;
    
    for ( const char *p = s; *p; )
    {
        p += strspn( p, delim );
        
        if ( *p )
        {
            ++n;
            p += strcspn( p, delim );
        }
    }
    
    char **tokens = calloc( n + 1, sizeof( char * ) );
    
    if ( tokens )
    {
        size_t i = 0;
        int success = 1;
        
        for ( const char *p = s; success && *p; i += success )
        {
            p += strspn( p, delim );
        
            if ( *p )
            {
                const char *q = p;
                p += strcspn( p, delim );
                
                tokens[i] = malloc( p - q + 1 );
                
                if ( ( success = tokens[i] != NULL ) )
                {
                    memcpy( tokens[i], q, p - q );
                    tokens[i][p-q] = '\0';
                }
            }
        }
        
        if ( !success )
        {
            for ( char **p = tokens; *p; ++p )
            {
                free( *p );
            }
            
            free( tokens );
        }
    }
    
    return tokens;
}

int main(void) 
{
    const char *s = "Each word in this string should be an element in cut_ln.";
    
    char **tokens = split_line( s, " " );
    
    if ( tokens )
    {
        for ( char **p = tokens; *p; ++p )
        {
            puts( *p );
        }
        
        for ( char **p = tokens; *p; ++p )
        {
            free( *p );
        }
    }
    
    free( tokens );

    return 0;
}

The program output is

Each
word
in
this
string
should
be
an
element
in
cut_ln.
Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335