#include <iostream>
#include <sstream>
#include <string>

struct T
{
   void eat(std::string const& segment)
   {
      buffer << segment;

      std::string sentence;
      while (std::getline(buffer, sentence))
         std::cout << "[" << sentence.size() << "]";
   }

   std::stringstream buffer;
};

int main() {
   T t;
   t.eat("A\r\nB\nC\nD");
//        ^^   ^  ^  ^
}

// Actual output:  [2][1][1][1]
// Desired output: [1][1][1][1]

I would like the std::stringstream to strip that carriage return for me (and would prefer not to have to copy and modify segment).

How might I go about this? I would have thought that this would happen anyway, on Linux, for a stream in text mode... but perhaps that mechanism is in the logic of file streams.

Lightness Races in Orbit
  • You could use a modified getline function like the ones in [this question](http://stackoverflow.com/questions/6089231/getting-std-ifstream-to-handle-lf-cr-and-crlf). – Firas Assaad Jan 17 '12 at 12:42
  • Is it the case there could be other whitespace present in the strings that are `eat`en ? – hmjd Jan 17 '12 at 13:09
  • @hmjd: Usually no, but I'm not confident enough about that to employ `std::skipws`. – Lightness Races in Orbit Jan 17 '12 at 13:25
  • `std::skipws` isn't a solution, because it only removes leading white space (and the '\015' from Windows will normally appear as trailing) and is ignored by unformatted input functions like `getline`. – James Kanze Jan 17 '12 at 13:57
  • I suppose I could just stream in a character at a time, ignoring `\r`. I cba to benchmark it, but I wonder whether I lose any speed doing that. – Lightness Races in Orbit Jan 17 '12 at 12:42

1 Answer


This is a general problem on Unix machines when reading files created on a Windows machine. I would suggest doing the clean-up at the input level.

One of the best solutions I've found when reading line-based files is to create a class something like:

#include <algorithm>
#include <istream>
#include <string>

class Line
{
    std::string myText;
public:
    friend std::istream& operator>>( std::istream& source, Line& dest )
    {
        std::getline( source, dest.myText );
        if ( source ) {
            dest.myText.erase( 
                std::remove( dest.myText.begin(), dest.myText.end(), '\015' ),
                dest.myText.end() );
        }
        return source;
    }

    operator std::string() const
    {
        return myText;
    }
};

You can add other functions as necessary: the implicit conversion to std::string isn't considered when matching function templates, for example, and I found it useful to add friends to wrap boost::regex_match.

I use this (without the '\015' removal) even when I don't have to worry about Windows/Linux differences; it supports reading lines using std::istream_iterator<Line>, for example.

Another solution would be to use a filtering streambuf, inserted into the input stream. This is also very simple:

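#include <streambuf>

class RemoveCRStreambuf : public std::streambuf
{
    std::streambuf* mySource;
    char myBuffer;  //  One char buffer required for input.
protected:
    int_type underflow()
    {
        int_type results = mySource->sbumpc();
        //  Skip every carriage return coming from the source.
        while ( results == traits_type::to_int_type( '\015' ) ) {
            results = mySource->sbumpc();
        }
        if ( results != traits_type::eof() ) {
            myBuffer = traits_type::to_char_type( results );
            //  gptr() must point at the character just fetched, so the
            //  get area is [&myBuffer, &myBuffer + 1).
            setg( &myBuffer, &myBuffer, &myBuffer + 1 );
        }
        return results;
    }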

public:
    RemoveCRStreambuf( std::streambuf* source )
        : mySource( source )
    {
    }
};

To insert it:

std::streambuf* originalSB = source->rdbuf();
RemoveCRStreambuf newSB( originalSB );
source->rdbuf( &newSB );
//  Do input here...
source->rdbuf( originalSB );    //  Restore...

(Obviously, using some sort of RAII for the restoration would be preferable. My own filtering streambufs have a constructor which takes a std::istream&; they save a pointer to the stream as well, and restore its original streambuf in their destructor.)

James Kanze