1

This piece of code was working fine, until I modified it.

int main(int argc, char** argv)
{
    char command[1024];
    gets(command);
    char *delim = " \t\f";
    char **tokens;
    int i=0;
    /* Extracting tokens from command string */
    for(tokens[i] = strtok(command, delim); tokens[i] != NULL; tokens[i] = strtok(NULL, delim))
    {
       i++;
    }
    return 0;
}

And here's the code that crashed with a Segmentation Fault on line for(tokens[i] = strtok(command, delim); tokens[i] != NULL; tokens[i] = strtok(NULL, delim))

Code:

void commandProcess(char command[])
{
    char *delim = " \t\f";
    char **tokens;
    int i=0;
    /* Extracting tokens from command string */
    for(tokens[i] = strtok(command, delim); tokens[i] != NULL; tokens[i] = strtok(NULL, delim))
    {
       i++;
    }
}

int main(int argc, char** argv)
{
    char command[1024];
    gets(command);
    process(command);
    return 0;
}

I know that char command[] decays to a pointer link, and also that strtok() modifies its first parameter and in my case it might be an undefined behaviour. link

So, could anybody please provide me some other way around, so that I can work out with the same function signature and still avoid the problem?

This seems a trivial problem, but I cannot make through it. :\ I even tried this,

void commandProcess(char command1[])
{
    char command[1024];
    int length = strlen(command1);
    strncat(command, command1, length);
    command[length] = '\0';

    char *delim = " \t\f";
    char **tokens;
    int i=0;

    /* Extracting tokens from command string */
    for(tokens[i] = strtok(temp_command, delim); tokens[i] != NULL; tokens[i] = strtok(NULL, delim))
    {
       i++;
    }

but then again crashes on the line

for(tokens[i] = strtok(temp_command, delim); tokens[i] != NULL; tokens[i] = strtok(NULL, delim))

I think that delim & tokens are not the culprits because the program without commandProcess() worked fine with them.

WhiteSword
  • 101
  • 9
  • 1
    `char command[1024];` in a function is a good recipe for stack overflow. – Eugene Sh. Jun 30 '17 at 16:46
  • 3
    First of all never *ever* use `gets`. It's a dangerous function that has been removed from the latest C standard. – Some programmer dude Jun 30 '17 at 16:46
  • 1
    Secondly, the variable `tokens` is a pointer, but *where does it point*? – Some programmer dude Jun 30 '17 at 16:47
  • Thirdly, you have a variable and a function named `command`. Fourthly, you call `process` but you have no function `process` – KevinDTimm Jun 30 '17 at 16:48
  • 2
    Fourthly, don't add language tags that aren't relevant to the language you program in. C and C++ are two *very* different languages. If you program in C use only that tag. – Some programmer dude Jun 30 '17 at 16:48
  • 2
    You didn't set `tokens` to a valid pointer in memory before attempting `tokens[i]`. If you declare `char **tokens` that just declares space for a pointer. You need to set it to a valid address before you can use it. – lurker Jun 30 '17 at 16:48
  • 1
    Are you using C or C++? If you using C++ you should consider [using a C++ approach](https://stackoverflow.com/questions/236129/split-a-string-in-c) – NathanOliver Jun 30 '17 at 16:49
  • Finally, (related to my 3rd/4th) This is clearly not a [mcve] – KevinDTimm Jun 30 '17 at 16:49
  • Okay, thanks for warning about gets, I'll use some alternative. And I guess, the for loop is doing the initialization at first use, so there's no harm in using `tokens` that way. – WhiteSword Jun 30 '17 at 16:49
  • 1
    There's nothing in the shown code that initializes `tokens`. The shown code initializes `tokens[i]`, for various values of `i`. That's not the same thing as initializing `tokens`. You should reread the chapter in your C++ book that explains what pointers are, and how they work. – Sam Varshavchik Jun 30 '17 at 16:51
  • Yes, `tokens[]` needs to have some memory allocated. The fact that your original code worked is pure luck--it's just as broken. – Lee Daniel Crocker Jun 30 '17 at 16:55
  • @SamVarshavchik I agree on the point that I need to revise some C stuff, but could you give any explanation that why the first piece of program worked fine, even when it contained `char **tokens` & I didn't made to point anywhere but the for loop did? – WhiteSword Jun 30 '17 at 16:56
  • @ManjinderSinghHanjra Have you heard about *undefined behavior*? – Eugene Sh. Jun 30 '17 at 16:57
  • @EugeneSh. But I never faced it, until now. – WhiteSword Jun 30 '17 at 16:58
  • I think you have (unless you never programmed C before..). But haven't recognized it. – Eugene Sh. Jun 30 '17 at 17:00
  • @ManjinderSinghHanjra when you fail to initialize a pointer properly and attempt to use it, the value it happens to contain, taken as an address, may or may not haphazardly point to an area of memory that you are allowed to access. So it *might* happen to work without error. Such problems can be hard to find since the program may appear to work normally and will, perhaps on occasion, fail randomly. That's part of what *undefined behavior* is about. – lurker Jun 30 '17 at 17:00
  • Your first piece of program worked because a bug does not always result in an immediate crash. "Undefined behavior" means anything can happen: the program produces correct results; the program runs, but produces wrong results; the program crashes; the sun explodes in a fiery supernova; etc... – Sam Varshavchik Jun 30 '17 at 17:01
  • I always make every pointer declaration point to something. But this time, when I wrote the first program, I thought may be C could work that way as well. This was the reason, I kept using the same `char **token` unitialized at first. This is known as learning while implementing things. I might never come to know, unless I didn't ask from you people. And yes, thanks for reminding me again that `a bug don't always result in immediate crash`. Thank you all. :) – WhiteSword Jun 30 '17 at 17:08

2 Answers2

1

You write to tokens[i], but tokens is never initialized. Dereferencing an uninitialized pointer invokes undefined behavior.

This means the behavior of the code is unpredictable. It may crash, it may show strange results, or it may appear to work correctly. Also, making a seemingly unrelated code change can change how undefined behavior manifests. This is what happened in your case.

To fix this, you need to allocate space to tokens. In this case, the simplest thing to do is make a fixed size array of pointers:

char *tokens[1024];

You can also malloc the memory so you don't have a large array on the stack:

char **tokens = malloc(sizeof(char *) * 1024);

Since command is of length 1024, the number of tokens shouldn't be any larger.

dbush
  • 205,898
  • 23
  • 218
  • 273
1

As other comments and answers have pointed out, you need to allocate space for the token pointers you collect.

Here's a fixed version, which also uses fgets() instead of the obsolete and dangerous gets(). Normally, unfortunately, fgets is a bit of a nuisance, because it leaves the trailing \n in the buffer. Here, however, that's trivial to account for, as we can simply add '\n' to the delim variable telling strtok what to split on.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define MAXTOKS 100

void commandProcess(char command[])
{
    char *delim = " \t\f\r\n";
    char *tokens[MAXTOKS+1];
    int i=0;
    /* Extracting tokens from command string */
    for(tokens[i] = strtok(command, delim); tokens[i] != NULL; tokens[i] = strtok(NULL, delim))
    {
       i++;
       if(i > MAXTOKS)
       {
          fprintf(stderr, "too many tokens\n");
          exit(1);   
       }
    }
    for(i = 0; tokens[i] != NULL; i++) printf("%d: \"%s\"\n", i, tokens[i]);
}

int main(int argc, char** argv)
{
    char command[1024];
    fgets(command, sizeof(command), stdin);
    commandProcess(command);
    return 0;
}
Steve Summit
  • 45,437
  • 7
  • 70
  • 103