1

I'm ultimately trying to code a shell, so I need to be able to parse commands. I'm trying to convert every word and special symbol into tokens, while ignoring whitespaces. It works when characters separating tokens are | & < > however as soon as I use a single whitespace character, the program crashes. Why is that?

I'm a student, and I realize the way I came up with separating the tokens is rather crude. My apologies.

#include <iostream>
#include <stdio.h>
#include <string>
#include <cctype>

using namespace std;

#define MAX_TOKENS 10

int main()
{
    //input command for shell
    string str;

    string tokens[MAX_TOKENS] = { "0", "0", "0", "0", "0", "0", "0", "0", "0", "0" };
    int token_index = 0;
    int start_token = 0;

    cout << "Welcome to the Shell: Please enter valid command: " << endl << endl;
    cin >> str;


    for (unsigned int index = 0; index < str.length(); index++)
    {
        //if were at end of the string, store the last token
        if (index == (str.length() - 1)) tokens[token_index++] = str.substr(start_token, index - start_token + 1);

        //if char is a whitespace store the token
        else if (isspace(str.at(index)) && (index - start_token > 0))
        {
            tokens[token_index++] = str.substr(start_token, index - start_token);
            start_token = index + 1;
        }

        //if next char is a special char - store the existing token, and save the special char
        else if (str[index] == '|' || str[index] == '<' || str[index] == '>' || str[index] == '&')
        {
            //stores the token before our special character
            if ((index - start_token != 0)) //this if stops special character from storing twice
            {
                //stores word before Special character
                tokens[token_index++] = str.substr(start_token, index - start_token);
            }

            //stores the current special character
            tokens[token_index++] = str[index];

            if (isspace(str.at(index + 1))) start_token = index + 2;
            else start_token = index + 1;
        }
    }

    cout << endl << "Your tokens are: " << endl;

    for (int i = 0; i < token_index; i++)
    {
        cout << i << " = " << tokens[i] << endl;
    }



    return 0;
}
Nik
  • 11
  • 1
  • You want to take look at that: http://ericlippert.com/2014/03/05/how-to-debug-small-programs/ – Samuel Mar 07 '14 at 07:59
  • I think the problem is in `cin` input with spaces. You may have to use `cin.getline()`.Have a look at http://stackoverflow.com/questions/5838711/c-cin-input-with-spaces – Sakthi Kumar Mar 07 '14 at 08:01
  • My apologies, I will go find a rubber duck – Nik Mar 07 '14 at 08:10

1 Answers1

1

A few things:

  • Check that token_index is less than MAX_TOKENS before using it again after each increment, otherwise you have a buffer overflow. If you change tokens to be a std::vector then you can use the at() syntax as a safety net for that.
  • The expression index - start_token has type unsigned int so it can never be less than 0. Instead you should be doing index > start_token as your test.
  • str.at(index) throws an exception if index is out of range. However you never catch the exception; depending on your compiler, this may just look like the program crashing. Wrap main()'s code in a try...catch(std::exception &) block.

Finally, this is a long shot but I will mention it for completeness. Originally in C89, isspace and the other is functions had to take a non-negative argument. They were designed so that the compiler could implement them via an array lookup, so passing in a signed char with a negative value would cause undefined behaviour. I'm not entirely sure whether or not this was "fixed" in the various later versions of C and C++, but even if standards mandate it , it's possible you have a compiler that still doesn't like receiving negative chars. To eliminate this as a possibility from your code, use isspace( (unsigned char)str.at(index) ), or even better, use the C++ locale interface.

M.M
  • 138,810
  • 21
  • 208
  • 365
  • as per Sakthi Kumar's comment, I overlooked that you weren't even reading in a line to begin with :) The `>>` operator does formatted I/O; when reading into a string it just reads up to the first whitespace. But your code still should not crrash in this case. – M.M Mar 07 '14 at 08:18