Split a C++ string without Boost and not on whitespace

Question

Possible Duplicate:
Splitting a string in C++

I have a string:
14332x+32x=10
I'd like to split it so that it looks like:
[14332][+32][10]
So far, I've tried doing

char c;
std::stringstream ss(equation1);
while (ss >> c) {
    std::cout << c << std::endl;
}

but after seeing what that prints, I don't think I can get there from that output. I know that I need to split the string on x and =, but I'm not sure whether that's possible and, if it is, how. I've googled it and didn't find anything that looked helpful, but I'm new to C++ and the answer might be right in front of me.
I'd like to not use Boost. Any advice would be helpful!

The link I put in your last question includes answers for this. — chris, Jan 31 '13 at 01:10
see: http://stackoverflow.com/questions/236129/splitting-a-string-in-c — Csq, Jan 31 '13 at 01:10
I think you should use the functionality of the `std::string` class. See the [reference](http://en.cppreference.com/w/cpp/string/basic_string) and skim over the members. You should be able to come up with a fast way to pull sub-strings out based on characters you `.find()`. — ChiefTwoPencils, Jan 31 '13 at 01:19
... if you really like those `stringstream`s, simply consume the numeric types and then `istream::peek` and `istream::ignore` until you hit the next digit. The other approaches (boost/`string::find`) are likely to give you more robust solutions, though. — us2012, Jan 31 '13 at 01:27
I looked at it, and tried using this code: `std::istringstream iss(equation1); copy(std::istream_iterator(iss), std::istream_iterator(), std::ostream_iterator(std::cout, "x"));` It just printed out the original string (12344x+3x=10), but with an x at the end — Tips48, Jan 31 '13 at 01:33
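
The `.find()`-based idea from the comments can be sketched roughly as follows. This is only an illustration, not code from any commenter: the helper name `split_on_any` is made up here, and it assumes that empty pieces (such as the one between the adjacent `x` and `=`) should be dropped.

```cpp
#include <string>
#include <vector>

// Split `input` at every character that appears in `delims`,
// dropping empty pieces (e.g. between the adjacent "x" and "=").
// Hypothetical helper, named here only for illustration.
std::vector<std::string> split_on_any(const std::string& input,
                                      const std::string& delims)
{
    std::vector<std::string> pieces;
    std::string::size_type start = 0;
    while (start < input.size()) {
        // Position of the next delimiter, or npos if none remain.
        std::string::size_type end = input.find_first_of(delims, start);
        if (end == std::string::npos)
            end = input.size();
        if (end > start)
            pieces.push_back(input.substr(start, end - start));
        start = end + 1;  // skip the delimiter itself
    }
    return pieces;
}
```

Calling `split_on_any("14332x+32x=10", "x=")` yields the three pieces `14332`, `+32` and `10`.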

score 4 · Answer 1 · answered Jan 31 '13 at 02:21

Consider using a facet that specifies x and = as whitespace characters:

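#include <iostream>
#include <locale>
#include <sstream>
#include <string>

// A ctype facet whose classification table marks 'x' and '=' (plus space
// and newline) as whitespace, so operator>> splits on them.
struct punct_ctype : std::ctype<char> {
  punct_ctype() : std::ctype<char>(get_table()) {}
  static mask const* get_table()
  {
    // Static storage is zero-initialized, so every other character
    // starts with no classification at all.
    static mask rc[table_size];
    rc[' '] = std::ctype_base::space;
    rc['\n'] = std::ctype_base::space;
    rc['x'] = std::ctype_base::space;
    rc['='] = std::ctype_base::space;
    return &rc[0];
  }
};

int main() {
  std::string equation;
  while(std::getline(std::cin, equation)) {
    std::istringstream ss(equation);
    // The locale takes ownership of the facet and deletes it.
    ss.imbue(std::locale(ss.getloc(), new punct_ctype));
    std::string term;
    while(ss >> term) {
      std::cout << "[" << term << "]";
    }
    std::cout << "\n";
  }
}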

An interesting approach - facets and locales seem to be little used aspects of C++. — Michael Burr, Jan 31 '13 at 07:38

score 1 · Answer 2 · answered Jan 31 '13 at 01:15

The manual way would be to loop over each character in the string, appending it to the current piece; when the character equals the one you're splitting by, start a new string (use a list/array of strings if more than one split is expected).

Also, I think the standard library has split-by-character functionality. If not, `std::getline()` has an overload that takes a delimiter character to split by, and it will not stop at spaces.

`std::getline()` is very good :)
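
The delimiter overload in question is `std::getline(stream, str, delim)`. A minimal sketch of how it might be applied to this problem follows. Because that overload accepts only a single delimiter character, the sketch first rewrites `=` to `x`; that normalization step and the function name `split_with_getline` are assumptions made here, not part of the answer.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Split on 'x' and '=' using std::getline's delimiter overload.
// std::getline accepts only one delimiter, so '=' is first rewritten
// to 'x' (an assumption of this sketch, not something the answer prescribes).
std::vector<std::string> split_with_getline(std::string equation)
{
    std::replace(equation.begin(), equation.end(), '=', 'x');

    std::vector<std::string> terms;
    std::istringstream ss(equation);
    std::string term;
    while (std::getline(ss, term, 'x')) {
        if (!term.empty())        // "x=" yields an empty piece; skip it
            terms.push_back(term);
    }
    return terms;
}
```

For `"14332x+32x=10"` this produces `14332`, `+32` and `10`. Spaces are not treated as delimiters, so they stay inside whichever piece they occur in.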

Sellorio


I think I'm going to try to use this, thanks – Tips48 Jan 31 '13 at 02:19

score 1 · Answer 3 · answered Jan 31 '13 at 01:41

You can use sscanf like this:

sscanf(s.c_str(), "%[^x]x%[^x]x=%s", a, b, c);

Where %[^x] represents "any character except x". If you don't care for the symbols (i.e. + etc) but just for the numbers, you could do something like:

sscanf(s.c_str(), "%dx%dx=%d", &x, &y, &z);
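
For completeness, here is one way the first call might look with the buffers declared and the result checked. It is a sketch only: the buffer size of 32, the width limits in the format string, and the wrapper name `parse_equation` are choices made for this illustration, and it still handles only the exact shape `<a>x<b>x=<c>`.

```cpp
#include <cstdio>
#include <string>

// Parse an equation of the exact shape "<a>x<b>x=<c>" into three strings.
// Returns true only if all three fields were converted.
// Hypothetical wrapper; buffer sizes are arbitrary choices for this sketch.
bool parse_equation(const std::string& s,
                    std::string& a, std::string& b, std::string& c)
{
    char bufA[32], bufB[32], bufC[32];
    // %31[^x] reads at most 31 characters that are not 'x', leaving room
    // for the terminating '\0' in each 32-byte buffer.
    int converted = std::sscanf(s.c_str(), "%31[^x]x%31[^x]x=%31s",
                                bufA, bufB, bufC);
    if (converted != 3)
        return false;
    a = bufA;
    b = bufB;
    c = bufC;
    return true;
}
```

With the input `"14332x+32x=10"` the three outputs are `14332`, `+32` and `10`. Checking the return value of `sscanf` is what catches inputs with fewer terms than the format expects.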

pldoverflow


This breaks as soon as the input equation has more or less terms than given in the example. – us2012 Jan 31 '13 at 01:46
+1 - `"%[^x]x%[^x]x=%s"` is missing a plus just after the second *x* though. – Tony Delroy Jan 31 '13 at 01:46
@us2012: yes, this is specific to his format. However, if you know the format follows some pre-defined standard you can always change your format string in `sscanf` to handle it. – pldoverflow Jan 31 '13 at 01:48
@TonyD: His expected output included the plus sign - `[+32]`, that's why I didn't add it. – pldoverflow Jan 31 '13 at 01:48
@us2012: IMHO the question didn't specify any general requirements, so doesn't really deserve a general answer. There are too many possibilities... could the numbers be floats? negative? is the `=number` optional, or perhaps even `=numberx+numberx` is allowed? Can't meaningfully answer without huge speculation, so it's reasonable to answer exactly and only what was asked. – Tony Delroy Jan 31 '13 at 01:49
@pldoverflow: so it does - sorry for my oversight. – Tony Delroy Jan 31 '13 at 01:49
I'll give a little more information. I left the + sign in because I figured it would be much harder to remove it, but in the perfect scenario it would be gone. There can be negative numbers, and decimals. The equals number is not optional, and it can't be number+number. Hopefully that clears it up so there can be a more specific answer – Tips48 Jan 31 '13 at 02:21
Looked into using sscanf and I got errors trying to compile. Are you sure it's not deprecated/unsafe? – Tips48 Jan 31 '13 at 02:32
@Tips48: `sscanf` is definitely unsafe, but so are a lot of useful things. The dangers include using `%` format specifiers that don't match the type of the parameters they copy values into, and, for string conversions like `%s` and the `%[^x]` used, overrunning fixed-size buffers. That doesn't mean it can't be used 100% reliably when the arguments are correct and either string conversion width limits are used or the inputs can be trusted to be smaller. Pointers, `new`/`delete`, varargs - all sorts of things are unsafe when used carelessly. – Tony Delroy Jan 31 '13 at 02:46
@Tips48: It should compile without problems. What were the errors? – pldoverflow Jan 31 '13 at 02:56

Carl · Answer 4 · 2013-01-31T02:36:01.370

If you don't mind using C++11, you could use something similar to this:

#include <string>
#include <vector>
#include <iostream>
#include <algorithm>
#include <functional>
#include <unordered_set>

typedef std::vector<std::string> strings;
typedef std::unordered_set<char> tokens;

struct tokenize
{
    tokenize(strings& output,const tokens& t) : 
    v_(output),
    t_(t)
    {}        
    ~tokenize()
    {
        if(!s.empty())
            v_.push_back(s);
    }
    void operator()(char c)
    {
        if(t_.count(c))
        {
            // Delimiter: store the finished token, if any, and start over.
            if(!s.empty())
                v_.push_back(s);
            s.clear();
        }
        else
        {
            s += c;
        }
    }
    private:
    std::string s;
    strings& v_;
    const tokens& t_;
};

void split(const std::string& input, strings& output, const tokens& t )
{
    tokenize tokenizer(output,t);
    for( auto i : input )
    {
        tokenizer(i);
    }
}

int main()
{
    strings tokenized;
    tokens t;
    t.insert('x');
    t.insert('=');
    std::string input = "14332x+32x=10";
    split(input,tokenized,t);
    for( const auto& i : tokenized )
    {
        std::cout<<"["<<i<<"]";
    }
    return 0;
}

Ideone link to the above code: http://ideone.com/17g75F

score 0 · Answer 5 · edited May 23 '17 at 10:24

See this SO answer for a getline_until() function that provides a simple, basic tokenization capability that should let you do something like the following:

#include <iostream>
#include <sstream>
#include <string>

#include "getline_until.h"

int main()
{
    std::string equation1("14332x+32x=10");
    std::stringstream ss(equation1);

    std::string token;
    while (getline_until(ss, token, "x=")) {
        if (!token.empty()) std::cout << "[" << token << "]";
    } 

    std::cout << std::endl;
}

The getline_until() function lets you specify a list of delimiters similar to strtok() (though getline_until() will return empty tokens instead of skipping a run of delimiters like strtok()). Or you can provide a predicate that lets you use a function to decide when to delimit tokens.

One thing it won't let you do (again, similar to strtok() or the standard getline()) is split tokens based purely on context: there has to be a delimiter character that gets discarded. For example, with the following input:

42+24

getline_until() (like strtok() or getline()) cannot split the above into the tokens 42, +, and 24.
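
Since the linked implementation is not reproduced here, the following is a hypothetical stand-in that mimics the behaviour described above: it takes a list of delimiter characters, discards the delimiter, and returns empty tokens for adjacent delimiters. The name `read_until_any` and its details are assumptions for illustration, not the actual `getline_until()` code.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the linked getline_until(): read characters
// into `token` until one of `delims` (which is discarded) or end of input.
// Returns true if a token, possibly empty, was produced.
bool read_until_any(std::istream& in, std::string& token,
                    const std::string& delims)
{
    token.clear();
    char ch;
    bool got_any = false;
    while (in.get(ch)) {
        got_any = true;
        if (delims.find(ch) != std::string::npos)
            return true;          // delimiter consumed and dropped
        token += ch;
    }
    return got_any;               // false only when nothing was left to read
}
```

Used in the same loop as above with the delimiters `"x="`, and skipping empty tokens, this gives `[14332][+32][10]`. As with the limitation just described, splitting `42+24` on `+` would return `42` and `24` but drop the `+` itself.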
