1

Using Kubuntu 22.04 LTS, Kate v22.04.3, and gcc v11.3.0, I have developed a small program to investigate the use of strtok() for tokenising strings, which is shown below.

#include <stdio.h>
#include <string.h>

int main(void)
{
   char inString[] = "";         // string read in from keyboard.
   char * token    = "";         // A word (token) from the input string.
   char delimiters[] = " ,";     // Items that separate words (tokens).
   
   // explain nature of program.
   printf("This program reads in a string from the keyboard"
          "\nand breaks it into separate words (tokens) which"
          "\nare then output one token per line.\n");
   printf("\nEnter a string: ");
   scanf("%s", inString);
   
   /* get the first token */
   token = strtok(inString, delimiters);
   
   /* Walk through other tokens. */
   while (token != NULL)
   {
      printf("%s", token);
      printf("\n");
      
      // Get next token.
      token = strtok(NULL, delimiters);
   }
   return 0;
}

From the various web pages that I have viewed, it would seem that I have formatted the strtok() function call correctly. On the first run, the program produces the following output.

$ ./ex6_2
This program reads in a string from the keyboard
and breaks it into separate words (tokens) which
are then output one token per line.

Enter a string: fred ,  steve ,   nick
f
ed

On the second run, it produced the following output.

$ ./ex6_2
This program reads in a string from the keyboard
and brakes it into separate words (tokens) which
are then output one token per line.

Enter a string: steve ,  barney ,   nick
s
eve
*** stack smashing detected ***: terminated
Aborted (core dumped)

Subsequent runs showed that the program sort of ran, as in the first case above, if the first word/token contained only four characters. However, if the first word/token contained five or more characters then stack smashing occurred.

Given that "char *" is used to access the tokens, why :-

a) is the first token (in each case) split at the second character ?

b) are the subsequent tokens (in each case) not output ?

c) does a first word/token of greater than four characters cause stack smashing?

Stuart

Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
Stuart
  • 121
  • 7
  • 5
    `char inString[] = "";` is causing your stack overflow. You haven't specified the length of the string so it can't hold any string longer than 0. – Marco Apr 19 '23 at 13:29
  • 2
    `scanf("%s", inString);` will not input strings with spaces. – n. m. could be an AI Apr 19 '23 at 13:36
  • 1
    OT: `char * token = "";` is odd, it should rather be `char * token = NULL;`, but here you don't need to initialize it at all. The most idiomatic way would be remove `char * token = "";` alltogether and have just `char *token = strtok(inString, delimiters);` – Jabberwocky Apr 19 '23 at 13:37
  • 2
    @Marco: A [stack overflow](https://en.wikipedia.org/wiki/Stack_overflow) is something very different than a [buffer overflow](https://en.wikipedia.org/wiki/Buffer_overflow) on the stack. You are incorrectly using the term "stack overflow" for the latter. – Andreas Wenzel Apr 19 '23 at 13:42
  • 1
    @Stuart When you wrote `char inString[]`, did you think that meant, "allocate `inString` as a string that's as big as it needs to be for whatever gets read into it?" C doesn't have strings that work that way. – Steve Summit Apr 19 '23 at 14:05
  • @AndreasWenzel I am not. OP said he got the following message: `*** stack smashing detected ***:` which means the stack got corrupted. – Marco Apr 19 '23 at 14:11
  • @AndreasWenzel If you google `stack smashing` the first result is `A cyberattack that causes a stack buffer overflow. `. – Marco Apr 19 '23 at 14:12
  • 1
    @Marco: In your first comment, you used the term "stack overflow", not "stack buffer overflow". A [stack overflow](https://en.wikipedia.org/wiki/Stack_overflow) is something very different than a [stack buffer overflow](https://en.wikipedia.org/wiki/Stack_buffer_overflow). You are confusing these two terms. – Andreas Wenzel Apr 19 '23 at 14:26
  • @AndreasWenzel You are incorrect. See [Stack overflow (disambiguation)](https://en.wikipedia.org/wiki/Stack_overflow_(disambiguation)): Stack overflow may also refer to: _Stack buffer overflow, when a program writes to a memory address on the program's call stack outside of the intended data structure; usually a fixed length buffer_. – Marco Apr 19 '23 at 15:22
  • @Marco: The purpose of the disambiguation pages of Wikpedia is to help you find the article that you are looking for. Therefore, when such as page states that term A "may refer" to term B, it is not necessarily correct to infer that term A is an appropriate way of referring to term B. For example, term A could also be a common incorrect way of referring to Term B. I suggest that you read the articles themselves to determine which terms are appropriate. – Andreas Wenzel Apr 19 '23 at 18:45
  • @AndreasWenzel I disagree. _Disambiguation is the process of identifying which meaning of a word is used in context._ In **this** context, it should be pretty clear that "stack buffer overflow" was meant. If enough people refer to a stack buffer overflow by "stack overflow" [in context] then you'll have to accept that. You might be technically correct, but again, that doesn't matter. The only point you could make is that my statement was ambiguous, but not wrong. It's not "wrong" to refer to a stack buffer overflow by stack overflow, otherwise the wikipedia disambiguation wouldn't exist. – Marco Apr 19 '23 at 19:26
  • @AndreasWenzel However, I'm thankful you pointed out the difference. I'm aware that a _call stack_ overflow is something different than a buffer overflow on the stack. I wasn't aware people would care **that much** about the difference. – Marco Apr 19 '23 at 19:28
  • The only thing that matters is that I _referred_ to a stack buffer overflow by "stack overflow" which is correct. – Marco Apr 19 '23 at 19:30
  • @Marco:You are correct. I do know that it is permissible not to set an array size if an actual string is provided to initialise the array. I guess that my thinking was not clear. – Stuart Apr 22 '23 at 09:12
  • @n.m.: Thank you for that. I was not aware of that. – Stuart Apr 22 '23 at 09:13
  • @Jabberwocky: You are correct in that the pointer should have been initialized with NULL. Since when is a pointer ever(!) initialized with a string ?!! Sheesh. As to your suggestion of initializing the pointer with a call to strtok(), that does seem like a better idea. With your permission, I will use that. – Stuart Apr 22 '23 at 09:16
  • @Andreas Wenzel: If you re-read my original posting, you will see that I never mentioned either stack overflow or buffer overflow. It was the system that reported "*** stack smashing detected ***" which I then referred to in the following paragraph. – Stuart Apr 22 '23 at 09:24
  • @Steve Summit: Not quite. I thought that I was setting aside space for a zero length string which would then be expanded according to what was captured by scanf(). Yeah, I am learning that C strings definitely do not act that way. Ho hum. I guess that I was a bit tired/not thinking clearly when I coded that bit. – Stuart Apr 22 '23 at 09:30
  • @Stuart: Yes, you are correct that you never used the term "stack overflow". If you re-read my comments, you will see that they were directed at the user named "Marco" who did use the (incorrect) term "stack overflow" in their first comment to your question. – Andreas Wenzel Apr 22 '23 at 12:50
  • Please accept my apologies for forgetting who your comment was directed to. With over a dozen replies to read through, I guess that I got a bit confused. – Stuart Apr 24 '23 at 07:22

2 Answers2

4

The declaration

char inString[] = "";

is equivalent to:

char inString[1] = "";

This means that you are allocating an array of only a single element, so it only has space for storing a single character.

The function call

scanf("%s", inString);

requires that the function argument inString points to a memory buffer that is sufficiently large to store the matched input. Your program is violating this requirement, as the memory buffer has only space for a single character (the terminating null character). It can therefore only store strings with a length of zero.

By violating the requirement, your program is invoking undefined behavior, which means that anything can happen, including the strange behavior that you observed. The function scanf is probably overflowing the buffer inString, overwriting other important data on your program's stack, causing it to misbehave. This is called "stack smashing".

To fix this, you should give the array inString more space, for example by changing the line

char inString[] = "";

to:

char inString[200] = "";

However, in that case, if the user enters more than 200 characters of input as a single word, then you will have the same problem again and your program may crash. Therefore, you may want to additionally limit the number of characters matched by scanf to 199 characters (200 including the terminating null character). That way, you can ensure that the user will not be able to crash your program.

You can add such a limit like this:

scanf("%199s", inString);

Note, however, that the %s specifier will only match a single word. If you want to read an entire line of input, you may want to use the function fgets instead of scanf.

Andreas Wenzel
  • 22,760
  • 4
  • 24
  • 39
  • Thank you very much for the obviously much needed clarification. I think that I will(!) use fgets() instead of scanf(). You say that "...the %s specifier will only match a single word." That being true, what would be the sense of using a "big number" in the "%s" if words after the first space character are ignored by scanf() ? – Stuart Apr 22 '23 at 09:46
  • @Stuart: You are correct that under most circumstances, the user will probably never enter a word longer than 200 characters, so using `%s` instead of `%199s` will probably be sufficient. However, if you are programming for example a web server, then a malicious user will be able to crash your server program by entering a word longer than 200 characters. You can prevent the user from being able to do that by using `%199s` instead of `%s`. Therefore, it is up to you to decide whether you want to prevent the user from being able to crash your program, or whether it doesn't matter. – Andreas Wenzel Apr 22 '23 at 13:00
  • @Stuart: If one of the answers solved your problem, then you may want to consider accepting one of the answers. See this official help page for more information: [What should I do when someone answers my question?](https://stackoverflow.com/help/someone-answers) – Andreas Wenzel Apr 22 '23 at 13:30
  • @Stuart: If you do decide to use `fgets`, then you will likely run into the problem of `fgets` storing the `\n` character at the end of the line into the string. Therefore, you may want to read this on how to remove it: [Removing trailing newline character from fgets() input](https://stackoverflow.com/q/2693776/12149471) – Andreas Wenzel Apr 22 '23 at 13:36
3

This declaration of a character array

char inString[] = "";   

is equivalent to

char inString[1] = { '\0' };; 

That is it declares an array with only one element that is able to store only an empty string. So any attempt to read a string in this character array using this call of scanf

scanf("%s", inString);

invokes undefined behavior.

You need to specify the number of elements much more greater. For example

enum { N = 100 };
char inString[N] = "";   

This initialization of a pointer

char * token    = "";

does not make a great sense. It is better to write for example

char * token = NULL;

This call of scanf

scanf("%s", inString);

can read only one word that is a sequence of characters separated by white space characters.

Instead write for example

scanf( " %99[^\n]", inString);

It makes sense to include the tab character '\t' in the list of delimiters

const char *delimiters = " \t,";

Instead of these calls of printf

  printf("%s", token);
  printf("\n");

it will be simpler to write

puts( token );
Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
  • Thank you very much for the clarification and your suggestions. Would `scanf( "%99[^\n]", inString)` capture a string that included spaces ? I have just learned that scan() captures characters up to but not including the first white space character. – Stuart Apr 22 '23 at 09:54
  • 1
    @Stuart This call of scanf reads all characters including spaces until the new line character is encountered. . Also it will be safer to write scanf( " %99[^\n]", inString); Pay attention to the leading space in the format string. It allows to skip white space characters leaved in the buffer after entering other data as for example if before this call there is a call like scanf( "%d", &number ); to read an integer. – Vlad from Moscow Apr 22 '23 at 10:12
  • @Stuart Because after this call the new line character '\n' will be leaved in the buffer and the call of scanf that reads a string will read an empty string because at once it meets the new line character. – Vlad from Moscow Apr 22 '23 at 10:12
  • Thank you for your further clarifications. They are very helpful. – Stuart Apr 22 '23 at 10:37