A lot depends on what is already present in your toolkit. I regularly work
with files which are written under Windows and read under Unix, and vice
versa, so I have most of the tools for converting CRLF into LF at hand.
If you don't have any, you might want a function along the lines of:
void addLine( std::vector<std::string>& dest, std::string line )
{
    //  Strip a trailing '\r' left over from a CRLF line ending.
    if ( !line.empty() && *(line.end() - 1) == '\r' ) {
        line.erase( line.end() - 1 );
    }
    //  Drop lines which are empty once the '\r' is gone.
    if ( !line.empty() ) {
        dest.push_back( line );
    }
}
to do your insertions. As for breaking the original text into lines,
you can use std::istringstream and std::getline, as others have
suggested; it's simple and straightforward, even if it is overkill.
(The std::istringstream is a fairly heavy mechanism, since it supports
all sorts of input conversions you don't need.)
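A minimal sketch of that approach (assuming, as in the loops below, that
the input is in a std::string named textLine and the results go into a
std::vector<std::string> named tokens; it needs <sstream>):
std::istringstream source( textLine );
std::string line;
while ( std::getline( source, line ) ) {
    //  getline strips the '\n'; addLine handles any trailing '\r'
    //  and drops empty lines.
    addLine( tokens, line );
}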
Alternatively, you might consider a loop along the lines of:
std::string::const_iterator start = textLine.begin();
std::string::const_iterator end = textLine.end();
std::string::const_iterator next = std::find( start, end, '\n' );
while ( next != end ) {
    addLine( tokens, std::string( start, next ) );
    start = next + 1;
    next = std::find( start, end, '\n' );
}
addLine( tokens, std::string( start, end ) );
Or you could break things down into separate operations:
textLine.erase(
    std::remove( textLine.begin(), textLine.end(), '\r' ),
    textLine.end() );
to get rid of all of the CR's,
std::vector<std::string> tokens( split( textLine, '\n' ) );
to break it up into lines, where split is a generalized function
along the lines of the above loop (a useful tool to add to your
toolkit), and finally:
tokens.erase(
    std::remove_if( tokens.begin(), tokens.end(),
                    boost::bind( &std::string::empty, _1 ) ),
    tokens.end() );
to remove any empty strings left over. (Generally speaking: if this is
a one-off situation, use the std::istringstream based solution. If you
think you may have to do something like this from time to time in the
future, add the split function to your toolkit, and use it.)
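For reference, a split along the lines of the above loop might look
something like the following (the name and signature are just one
reasonable choice; it needs <algorithm>, <string> and <vector>):
std::vector<std::string>
split( std::string const& original, char separator )
{
    std::vector<std::string> results;
    std::string::const_iterator start = original.begin();
    std::string::const_iterator end = original.end();
    std::string::const_iterator next = std::find( start, end, separator );
    //  Each pass copies the text up to the next separator.
    while ( next != end ) {
        results.push_back( std::string( start, next ) );
        start = next + 1;
        next = std::find( start, end, separator );
    }
    //  Don't forget whatever follows the last separator.
    results.push_back( std::string( start, end ) );
    return results;
}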