56

Possible Duplicate:
How to split a string in C++?

I need to split a string by single spaces and store it into an array of strings. I can do this using an istringstream, but what I have not been able to achieve is this:

I want every space to terminate the current word. So, if there are two consecutive spaces, one element of my array should be blank.

For example:

(underscore denotes space)

This_is_a_string.
gets split into:
A[0] = This
A[1] = is
A[2] = a
A[3] = string.

This__is_a_string.
gets split into:
A[0] = This
A[1] = ""
A[2] = is
A[3] = a
A[4] = string.

How can I implement this?

hackjutsu
Ayush

7 Answers

40

If the delimiter is strictly a single space character, std::getline with ' ' as the delimiter should work: it yields an empty string for each extra consecutive space.
For example:

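#include <cstdio>
#include <sstream>
#include <string>

int main() {
  using namespace std;
  istringstream iss("This  is a string");
  string s;
  // Each call reads up to the next ' '; two adjacent spaces yield an empty s.
  while ( getline( iss, s, ' ' ) ) {
    printf( "`%s'\n", s.c_str() );
  }
}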
Ise Wisteria
  • Can anyone explain the performance overhead of `string line, word; while (getline(cin, line)) { istringstream ss(line); while (ss >> word) // parse word }`? Specifically, how is the istringstream constructor from a string implemented: does it copy the string? Will the compiler be smart enough to move the ss declaration out of the while loop? Thanks – csyangchen Jun 09 '12 at 12:41
  • pretty simple implementation. Thanks! – Madhurya Gandi Nov 17 '17 at 04:59
40

You can even write your own split function (I know, a little old-fashioned):

#include <algorithm>  // std::min
#include <string>
#include <vector>

size_t split(const std::string &txt, std::vector<std::string> &strs, char ch)
{
    size_t pos = txt.find( ch );
    size_t initialPos = 0;
    strs.clear();

    // Decompose statement
    while( pos != std::string::npos ) {
        strs.push_back( txt.substr( initialPos, pos - initialPos ) );
        initialPos = pos + 1;

        pos = txt.find( ch, initialPos );
    }

    // Add the last one
    strs.push_back( txt.substr( initialPos, std::min( pos, txt.size() ) - initialPos + 1 ) );

    return strs.size();
}

Then you just need to invoke it with a vector<string> as argument:

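#include <iostream>

int main()
{
    std::vector<std::string> v;

    split( "This  is a  test", v, ' ' );

    // Print one token per line, quoted so that empty tokens stay visible
    // (inline stand-in for the dump() helper, which is not shown here).
    for( size_t i = 0; i < v.size(); ++i )
        std::cout << '"' << v[i] << "\"\n";

    return 0;
}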

Find the code for splitting a string in IDEone.

Hope this helps.

Baltasarq
  • Warning! :) If you want to have elements without spaces, replace both occurrences of "- initialPos + 1" with just "- initialPos" – teejay Oct 17 '13 at 15:06
  • I would recommend using `size_t` instead of `unsigned int` for `pos` and `initialPos`. Otherwise you might get into an infinite loop, like I just experienced. – CodeMonkey Mar 12 '18 at 01:04
  • To also get the last string, I had to change the `while` loop to be `initialPos` instead of `pos` and ternary `initialPos = pos + 1 != 0 ? pos + 1 : pos;` – CodeMonkey Mar 12 '18 at 01:15
8

Can you use boost?

samm$ cat split.cc
#include <boost/algorithm/string/classification.hpp>
#include <boost/algorithm/string/split.hpp>

#include <boost/foreach.hpp>

#include <iostream>
#include <string>
#include <vector>

int
main()
{
    std::string split_me( "hello world  how are   you" );

    typedef std::vector<std::string> Tokens;
    Tokens tokens;
    boost::split( tokens, split_me, boost::is_any_of(" ") );

    std::cout << tokens.size() << " tokens" << std::endl;
    BOOST_FOREACH( const std::string& i, tokens ) {
        std::cout << "'" << i << "'" << std::endl;
    }
}

sample execution:

samm$ ./a.out
8 tokens
'hello'
'world'
''
'how'
'are'
''
''
'you'
samm$ 
Sam Miller
3

If you are not averse to Boost, Boost.Tokenizer is flexible enough to solve this:

#include <string>
#include <iostream>
#include <boost/tokenizer.hpp>

void split_and_show(const std::string& s)
{
    boost::char_separator<char> sep(" ", "", boost::keep_empty_tokens);
    boost::tokenizer<boost::char_separator<char> > tok(s, sep);
    for(auto i = tok.begin(); i!=tok.end(); ++i)
            std::cout << '"' << *i << "\"\n";
}
int main()
{
    split_and_show("This is a string");
    split_and_show("This  is a string");

}

test: https://ideone.com/mN2sR

Cubbi
3

If you are averse to boost, you can use regular old operator>>, along with std::noskipws:

EDIT: updates after testing.

#include <iostream>
#include <iomanip>
#include <vector>
#include <string>
#include <algorithm>
#include <iterator>
#include <sstream>

void split(const std::string& str, std::vector<std::string>& v) {
  std::stringstream ss(str);
  ss >> std::noskipws;
  std::string field;
  char ws_delim;
  while(1) {
    if( ss >> field )
      v.push_back(field);
    else if (ss.eof())
      break;
    else
      v.push_back(std::string());
    ss.clear();
    ss >> ws_delim;
  }
}

int main() {
  std::vector<std::string> v;
  split("hello world  how are   you", v);
  std::copy(v.begin(), v.end(), std::ostream_iterator<std::string>(std::cout, "-"));
  std::cout << "\n";
}

http://ideone.com/62McC

Robᵩ
2

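You could also just use the old-fashioned 'strtok'

http://www.cplusplus.com/reference/clibrary/cstring/strtok/

It's a bit wonky but doesn't involve using boost (not that boost is a bad thing).

You basically call strtok with the string you want to split and the delimiters (in this case a space) and it returns a char* to the first token; subsequent calls with NULL as the first argument return the following tokens. Note that strtok treats a run of consecutive delimiters as a single separator, so it never returns the empty tokens the question asks for.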

From the link:

#include <stdio.h>
#include <string.h>

int main ()
{
  char str[] ="- This, a sample string.";
  char * pch;
  printf ("Splitting string \"%s\" into tokens:\n",str);
  pch = strtok (str," ,.-");
  while (pch != NULL)
  {
    printf ("%s\n",pch);
    pch = strtok (NULL, " ,.-");
  }
  return 0;
}
TurqMage
1

You could use the simple strtok() function (from here). Note that tokens are split at the delimiters:

#include <stdio.h>
#include <string.h>

int main ()
{
  char str[] ="- This is a string";
  char * pch;
  printf ("Splitting string \"%s\" into tokens:\n",str);
  pch = strtok (str," ,.-");
  while (pch != NULL)
  {
    printf ("%s\n",pch);
    pch = strtok (NULL, " ,.-");
  }
  return 0;
}