I'm trying to break up a string into "symbols" with C++ for further work. I haven't written anything in C++ for a long while, so forgive me if there is something inherently wrong with this code.

The purpose of the symbolize() function below is to break up a string, such as "5+5", into a vector of strings, e.g. {"5","+","5"}. It's not working. If you think the code is too messy, please suggest a way to simplify it.

Here's my code so far:

#include <iostream>
#include <string>
#include <vector>
#include <ctype.h>
#include <sstream>

using namespace std;

vector<string> symbolize(string);

int main(int argc, const char * argv[])
{

    string input;
    cin >> input;

    vector<string> symbols;

    symbols = symbolize(input);

    for(int i=0;i<symbols.size();i++){
        cout<<symbols.at(i) << endl;
    }

    return 0;
}


vector<string> symbolize(string input){
    int position = 0;
    char c;
    stringstream s;
    vector<string> symbols;
    enum symbolType {TEXT,OPERATOR}symbolType,charType;

    while(position < input.size()){
        c = input.at(position);
        if(isalnum(c))symbolType = TEXT;
        else symbolType = OPERATOR;
        charType = symbolType;

        while(symbolType == charType){
            s << c;
            position++;
            if(position>=input.length())break;
            c = input.at(position);
            if(isalnum(c)) charType = TEXT;
            else charType = OPERATOR;
        }

        symbols.push_back(s.str());
        s.clear();
    }

    return symbols;
}

Thanks for taking a look.

Edit: BTW, I should mention that the function returns the first "token", e.g. "5+5" -> "5"

Edit2: I was mistaken. I just tried "5+5", and it returned {"5","5+","5+5"}. However, it only returns the first token before a space. Sorry for the confusion!

Edit3: Thank you all! For those who may come across this page in the future, here's the code when everything's said and done:

#include <iostream>
#include <string>
#include <vector>
#include <ctype.h>
#include <sstream>

using namespace std;

vector<string> symbolize(string);

int main(int argc, const char * argv[])
{

    string input;
    getline(cin,input);

    vector<string> symbols;

    symbols = symbolize(input);

    for(int i=0;i<symbols.size();i++){
        cout<<symbols.at(i) << endl;
    }

    return 0;
}


vector<string> symbolize(string input){
    int position = 0;
    char c;
    //stringstream s;
    vector<string> symbols;
    enum symbolType {TEXT,OPERATOR}symbolType,charType;

    while(position < input.size()){
        stringstream s;
        c = input.at(position);
        if(isalnum(c))symbolType = TEXT;
        else symbolType = OPERATOR;
        charType = symbolType;

        while(symbolType == charType){
            s << c;
            position++;
            if(position>=input.length())break;
            c = input.at(position);
            if (isspace(c)||c=='\n'){position++; break;}
            if(isalnum(c)) charType = TEXT;
            else charType = OPERATOR;
        }

        symbols.push_back(s.str());
    }

    return symbols;
}
  • What is it _supposed_ to return? From your last comment, "it only returns the first before a space", it sounds like your complaint is that "5+5 6+6" only parses up to the space and then stops. If so, that's because you're only doing "cin >> input" once, and that reads up to whitespace. – abarnert Apr 27 '12 at 22:24
  • Should there be a test for whitespace? The logic assumes anything not alphanumeric is an operator. – wallyk Apr 27 '12 at 22:25
  • @abarnert Yes that was my original problem. What other way can I use to include the spaces? –  Apr 27 '12 at 22:26
  • Alternatively, if you wanted "5", "+", and "5" instead of "5", "5+", and "5+5", the problem is that stringstream.clear() doesn't do what you appear to think it does. It only clears the stream's error flags. If you want to wipe out the whole thing each time through the loop, the simplest way to do it is to move the variable into the outer while loop. – abarnert Apr 27 '12 at 22:27
  • @wallyk I used the name "operator" because I couldn't think of a better term. But, yes it's not just for operators. –  Apr 27 '12 at 22:27
  • @Hassan: Well, if you want to read to the end of the line instead of the first space, use cin.getline. Is that what you want? Or do you want to read to EOF? or…? – abarnert Apr 27 '12 at 22:28
  • @abarnert Yes that is what I want. Also, you were right, I did assume that `s.clear()` would clear the stringstream. –  Apr 27 '12 at 22:29
  • @Hassan: Yes, _which_ of those is what you want? To read until end of line, end of file, or something else? Also, is a space supposed to be a delimiter between tokens, an operator, or text? – abarnert Apr 27 '12 at 22:31
  • @abarnert To read to the end of a line. Why don't you answer, since you were very early to comment. –  Apr 27 '12 at 22:32
  • @abarnert A space is simply a delimiter, I want to ignore them. –  Apr 27 '12 at 22:35

3 Answers

stringstream::clear doesn't clear the string buffer (only the error state).

You can use stringstream::str(x) to set the string buffer, so calling s.str(string()) or s.str("") instead of s.clear() will empty it.
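A minimal sketch of the difference, using two hypothetical helper functions (append_with_clear and append_with_str_reset are illustrative names, not part of the question's code):

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: writes "5", calls clear(), then writes "+".
std::string append_with_clear() {
    std::stringstream s;
    s << "5";
    s.clear();       // resets the error flags only; the buffer still holds "5"
    s << "+";
    return s.str();  // the old contents are still there
}

// Hypothetical helper: same writes, but resets the buffer with str("").
std::string append_with_str_reset() {
    std::stringstream s;
    s << "5";
    s.str("");       // replaces the buffer with an empty string
    s.clear();       // optional here; also resets any error flags
    s << "+";
    return s.str();  // only what was written after the reset
}
```

Here append_with_clear() returns "5+" while append_with_str_reset() returns "+", which matches the {"5","5+","5+5"} accumulation seen in the question.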

Also, operator>>(istream&, string&) only reads until the next whitespace character.

For reading, you can try one of:

  • istream::get to read one character at a time;
  • std::getline(istream&, string&) to read one line at a time;
  • istream::read to read an arbitrary number of characters into a buffer.

http://en.cppreference.com/w/cpp/io/basic_istream
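
As a rough illustration of the first two points, here is a sketch with two hypothetical helpers (read_word and read_line are names invented for this example). They take any std::istream, so they can be tried with a std::istringstream or with std::cin:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: formatted extraction stops at the first whitespace.
std::string read_word(std::istream& in) {
    std::string word;
    in >> word;
    return word;
}

// Hypothetical helper: std::getline reads up to (and discards) the newline.
std::string read_line(std::istream& in) {
    std::string line;
    std::getline(in, line);
    return line;
}
```

Given the input "5+5 6+6" followed by a newline, read_word yields "5+5" and read_line yields "5+5 6+6".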

Andrew Tomazos
  • istream::read is probably not what he wants. It reads up to your buffer size. Presumably he doesn't know the size of the input in advance, which means he's going to have to loop over read, and accumulate a buffer (because a token can cross the boundary between reads), which is much more complicated. – abarnert Apr 27 '12 at 22:30
  • reading into a ringbuffer is the most efficient way and also the hardest to implement. I will move it to the end of list. – Andrew Tomazos Apr 27 '12 at 22:32
  • This is still making things too complicated. istream::getline still requires you to allocate a buffer, and can only read as many characters as you've allocated. Much simpler to just call std::getline(istream&, string&), unless you're worried about pathologically-long strings. – abarnert Apr 27 '12 at 22:38
  • @AndrewTomazos-Fathomling Great answer, thanks! abarnert gave his answer earlier in the comments, that's why I accepted it. But thanks for the awesome answer! –  Apr 27 '12 at 22:40
  • Sorry std::getline is what I meant to recommend. Updated. – Andrew Tomazos Apr 27 '12 at 22:40
  • `s.str().clear()` doesn't clear the buffer, it `clear`s the string returned by `s.str()`. – Fraser Apr 27 '12 at 22:42
  • Ahem, fixed. `s.str(string())`. – Andrew Tomazos Apr 27 '12 at 22:45
  • @AndrewTomazos-Fathomling: Congratulations on clearing 3000 reputation with this answer! (Now you will have the power *to close* all our posts!) – thb Apr 28 '12 at 00:06
  • @thb: hehe thanks. "Moderator Tools" soon, whatever they are. :) – Andrew Tomazos Apr 28 '12 at 03:10

If you want to read an entire line instead of just one word, use getline instead of operator>>. See http://www.cplusplus.com/reference/string/getline/ for details, or just change the "cin >> input;" line in main to "getline(cin, input);".

Also, if you want to output "5", "+", "5" instead of "5", "5+", "5+5", you need to reset the stringstream each time through the loop, and clear doesn't do that. The simplest way around this is to just declare the stringstream in the outer loop and get rid of the clear call.
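
One possible simplification along these lines, as a sketch only: it swaps the stringstream for a plain std::string declared inside the outer loop, so each token starts empty. It assumes whitespace is a delimiter to be skipped, and it keeps the question's rule that consecutive non-alphanumeric characters form one token. This is not the asker's final code.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Sketch: split input into runs of alphanumeric characters and runs of
// other non-space characters, skipping whitespace between tokens.
std::vector<std::string> symbolize(const std::string& input) {
    std::vector<std::string> symbols;
    std::size_t pos = 0;
    while (pos < input.size()) {
        // Cast to unsigned char: <cctype> functions are undefined for negative values.
        unsigned char c = static_cast<unsigned char>(input[pos]);
        if (std::isspace(c)) { ++pos; continue; }  // delimiter, not a token

        const bool is_text = std::isalnum(c) != 0;
        std::string token;                         // fresh for every token
        while (pos < input.size()) {
            c = static_cast<unsigned char>(input[pos]);
            if (std::isspace(c) || (std::isalnum(c) != 0) != is_text) break;
            token += static_cast<char>(c);
            ++pos;
        }
        symbols.push_back(token);
    }
    return symbols;
}
```

With this sketch, "5+5" and "5 + 5" both produce {"5","+","5"}. An input such as "5+-3" produces {"5","+-","3"}, because adjacent operator characters are grouped just as in the original logic.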

abarnert
  • The code as-is will treat spaces as operators (because they're characters that aren't alnum). If you want to ignore them, you need to add logic for that, like if (isspace(c)) { position++; continue; }. Or you could just go back to reading a word at a time with operator>>, but wrap it in a loop. – abarnert Apr 27 '12 at 22:37
  • @abarnert Thanks. However, I think you meant: `if (isspace(c)){position++; break;}`, since it should be breaking to the outer loop on space. I tried that, and it works. –  Apr 27 '12 at 22:47

If you move stringstream s; inside the first while loop, you should achieve your aim.

s.clear() only resets the error state flags of the stringstream; unlike std::string::clear(), it does not empty the contents.

Fraser