59

I would like to get an istream_iterator-style iterator that returns each line of the file as a string rather than each word. Is this possible?

thehouse
  • 1
    I guess you could always write your own using the getline() function as Matteo Italia said. – Jaime Garcia Feb 18 '10 at 20:10
  • 1
    Duplicate: http://stackoverflow.com/questions/1567082/how-do-i-iterate-over-cin-line-by-line-in-c/1567703 – Jerry Coffin Feb 18 '10 at 20:15
  • 3
    @Jerry: That thread contains the answer. But the question is completely different. – UncleBens Feb 18 '10 at 20:36
  • @UncleBens: the question is *phrased* differently, but isn't really noticeably different. – Jerry Coffin Feb 18 '10 at 20:51
  • @Jerry: Thanks! I'm going to go with the solution you posted to the other question. But I agree with UncleBens that that wasn't the question I asked at all. I specifically want 'an iterator' as the function I'm passing it to takes a begin and an end. – thehouse Feb 18 '10 at 21:19
  • As an aside, I noticed some other answers to this question earlier and when I checked back they had disappeared. Why might this happen (this is my first SO question)? – thehouse Feb 18 '10 at 21:20
  • @thehouse - I deleted my answer when I realized that the exact trick had already been posted by Jerry in that other thread (in fact, it is quite likely that I actually learned it there in the first place). I'll undelete the answer and replace it with a link to Jerry's answer – Manuel Feb 18 '10 at 21:41
  • @thehouse: answers can disappear when/if somebody deletes them. Pretty much anybody can delete their own answer, and moderators can delete other people's posts as well. For what it's worth, there's an entire web site (meta.stackoverflow.com) devoted to questions like this about stackoverflow. – Jerry Coffin Feb 18 '10 at 21:42
  • Thinking a bit more, I'd agree that there is *some* difference between this question and the previous one, but not much, but UncleBens's answer to that question also answers this one quite nicely, so I still think the difference is mostly one of wording, but what the heck... – Jerry Coffin Feb 18 '10 at 23:05

8 Answers

36

EDIT: This same trick was already posted by someone else in a previous thread.

It is easy to have std::istream_iterator do what you want:

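#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

namespace detail 
{
    // Public inheritance, so that a Line converts to std::string
    // when std::copy assigns it through the output iterator.
    class Line : public std::string 
    { 
        // Extract a whole line instead of a whitespace-delimited word.
        friend std::istream & operator>>(std::istream & is, Line & line)
        {   
            return std::getline(is, line);
        }
    };
}

template<class OutIt>
void read_lines(std::istream& is, OutIt dest)
{
    typedef std::istream_iterator<detail::Line> InIt;
    std::copy(InIt(is), InIt(), dest);
}

int main()
{
    std::vector<std::string> v;
    read_lines(std::cin, std::back_inserter(v));

    return 0;
}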
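For anyone who would rather not inherit from std::string (see the comments below), here is a minimal sketch of the same trick using composition. The names `Line`, `text_`, `read_all_lines` and `read_all_lines_example` are my own and purely illustrative; the only standard pieces are std::istream_iterator and std::getline.

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical wrapper (not taken from the answer above): owns one line of text.
class Line {
public:
    // Lets a Line be used wherever a const std::string& is expected.
    operator const std::string&() const { return text_; }

    // istream_iterator<Line> calls this, so each extracted item is a whole line.
    friend std::istream& operator>>(std::istream& is, Line& line) {
        return std::getline(is, line.text_);
    }

private:
    std::string text_;
};

// Hypothetical helper: hands a begin/end pair to the vector constructor.
// Any other function that takes two input iterators works the same way.
std::vector<std::string> read_all_lines(std::istream& is) {
    std::istream_iterator<Line> first(is), last;
    return std::vector<std::string>(first, last);
}

// Usage sketch on an in-memory stream; a std::ifstream works the same way.
void read_all_lines_example() {
    std::istringstream in("alpha beta\n\ngamma\n");
    const std::vector<std::string> v = read_all_lines(in);
    assert(v.size() == 3);         // the empty line is kept
    assert(v[0] == "alpha beta");  // spaces do not split the line
}
```

The conversion operator is what lets the iterator's value type stand in for std::string, so no base class is needed.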
Manuel
  • 1
    Does inheriting from std::string here not violate Rule 35 of the C++ Coding Standard: "Avoid inheriting from classes that were not designed to be base classes"? – thehouse Feb 18 '10 at 22:36
  • 4
    @thehouse - What coding standard do you mean? I don't think there's anything wrong with using an arbitrary class as base provided that it's not used in a polymorphic context. For instance, the inheritance scheme in my answer would be dangerous if I made things like `string * ptr = new Line; delete ptr;` but that's not the case here – Manuel Feb 18 '10 at 22:56
  • 1
    @Manuel: Herb Sutter and Andrei Alexandrescu in their book C++ Coding Standards state that "using a standalone class as a base is a serious design error and should be avoided". They go on to mention string explicitly as a bad class to inherit from. Their argument revolves around the fact that you have to do one set of things to get a base class to work safely and a contradictory set of things to make concrete classes safe. – thehouse Feb 18 '10 at 23:15
  • 3
    It is wrong, completely wrong, and was not so in the original example (the author wisely chose `Composition` instead). `@Manuel` prove me no-one will use them in a polymorphic context... I am waiting. – Matthieu M. Feb 19 '10 at 08:05
  • Not really, but it's better I will admit it. I am sorry but I don't like inheritance in general... and certainly private inheritance when composition would do. Guess we all have our habits... – Matthieu M. Feb 20 '10 at 12:57
  • 2
    Can you explain why we needed to inherit from string class? – Mr.Anubis Sep 06 '11 at 10:47
  • 2
    In the end I used this method but storing the `std::string` as a member rather than inheriting - a matter of taste. – thehouse May 29 '12 at 00:54
  • 2
    It's worth pointing out that inheritance is a nice way to adjust interfaces. It's easy to read and understand. If no new members are introduced then heap based logic won't ruin you. Anything more complicated is asking for trouble – Polymer Nov 11 '13 at 22:16
  • `std::string` does NOT specify a virtual destructor, therefore it is NOT safe to inherit from it. The Standard Library is written in a way that separates datatypes out from algorithms; you should be able to do this in another, more proper way. – Qix - MONICA WAS MISTREATED Aug 23 '21 at 19:46
25

The standard library does not provide an iterator that does this (although you can implement one yourself), but you can simply use the free function std::getline (not the istream::getline member function) to read a whole line from an input stream into a std::string.

Example:

#include <iostream>
#include <fstream>
#include <string>
#include <algorithm>

using namespace std;

int main()
{
    ifstream is("test.txt");
    string str;
    while(getline(is, str))
    {
        cout<<str<<endl;
    }
    return 0;
}
Matteo Italia
  • Does it handle the difference in eol characters for the different platforms (windows/unix/mac)? – Kelly S. French Jul 13 '11 at 14:18
  • 2
    That difference is already handled in the stream object: when you open a file in text mode (the default if you don't specify the `ios::binary` flag) the stream automatically converts the platform-specific eol to plain `\n`. – Matteo Italia Jul 13 '11 at 17:47
  • we are using a COM istream that didn't treat the EOLs the same. Parsing a dos file worked but parsing a UNIX (no LF) file caused it to be handled as if it were one big line. – Kelly S. French Jul 14 '11 at 17:20
  • 1
    @Kelly: uh, wait; `std::istream` converts correctly only the EOL native of the current platform, for others it will probably do nothing. Also, now you're talking about a COM istream, so you should refer to its documentation. – Matteo Italia Jul 14 '11 at 21:26
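For the case discussed in these comments (a file with DOS line endings read on a platform whose native EOL is `\n`), one possible workaround is to strip a trailing `\r` after each std::getline call. This is only a sketch: `getline_any_eol` and `getline_any_eol_example` are hypothetical names, and it does not handle files that use a lone `\r` as the line terminator.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: behaves like std::getline, but also removes a
// trailing '\r' left over from a "\r\n" line ending.
std::istream& getline_any_eol(std::istream& is, std::string& line) {
    if (std::getline(is, line) && !line.empty() && line.back() == '\r') {
        line.pop_back();
    }
    return is;
}

// Usage sketch: mixed "\r\n" and "\n" endings in one in-memory stream.
void getline_any_eol_example() {
    std::istringstream in("one\r\ntwo\nthree");
    std::string line;
    getline_any_eol(in, line);
    assert(line == "one");   // the '\r' has been removed
    getline_any_eol(in, line);
    assert(line == "two");   // plain "\n" lines are unchanged
}
```

It can replace the `getline(is, str)` call in the loop of the answer above without other changes.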
7

Here is a solution. The example prints the input file with @@ appended to each line.

#include <cstdio>    // getchar
#include <iostream>
#include <iterator>
#include <fstream>
#include <string>

using namespace std;

class line : public string {};

std::istream &operator>>(std::istream &is, line &l)
{
    std::getline(is, l);
    return is;
}

int main()
{
    std::ifstream inputFile("input.txt");

    istream_iterator<line> begin(inputFile);
    istream_iterator<line> end;

    for(istream_iterator<line> it = begin; it != end; ++it)
    {
        cout << *it << "@@\n";
    }

    getchar();
}

Edit: Manuel was faster.

Rexxar
3

You could write your own iterator. It's not that hard. An iterator is just a class on which (simply speaking) the increment and * operators are defined.

Look at http://www.drdobbs.com/cpp/184401417 to get started writing your own iterators.

Patrick
  • 3
    @thehouse: you might also want to check out `boost::iterator_facade`, which implements the full STL iterator concept in terms of a few core functions. – Emile Cormier Feb 18 '10 at 21:34
2

It is also possible to use a range-based for loop:

// Read from file.
std::ifstream f("test.txt");
for (auto& line : lines(f))
  std::cout << "=> " << line << std::endl;

// Read from string.
std::stringstream s("line1\nline2\nline3\n\n\nline4\n\n\n");
for (auto& line : lines(s))
  std::cout << "=> " << line << std::endl;

where lines is defined in the following way:

#include <string>
#include <iterator>
#include <istream>

struct line_iterator {
  using iterator_category = std::input_iterator_tag;
  using value_type = std::string;
  using difference_type = std::ptrdiff_t;
  using reference = const value_type&;
  using pointer = const value_type*;

  line_iterator(): input_(nullptr) {}
  line_iterator(std::istream& input): input_(&input) { ++*this; }

  reference operator*() const { return s_; }
  pointer operator->() const { return &**this; }

  line_iterator& operator++() {
    if (!std::getline(*input_, s_)) input_ = nullptr;
    return *this;
  }

  line_iterator operator++(int) {
    auto copy(*this);
    ++*this;
    return copy;
  }

  friend bool operator==(const line_iterator& x, const line_iterator& y) {
    return x.input_ == y.input_;
  }

  friend bool operator!=(const line_iterator& x, const line_iterator& y) {
    return !(x == y);
  }

 private:
  std::istream* input_;
  std::string s_;
};

struct lines {
  lines(std::istream& input): input_(input) {}

  line_iterator begin() const { return line_iterator(input_); }
  line_iterator end() const { return line_iterator(); }

 private:
  std::istream& input_;
};
dmitriykovalev
1

You can use istreambuf_iterator instead of istream_iterator. Unlike istream_iterator, it does not skip whitespace, so line feeds are passed through and you can split on them yourself.

code.cpp:

#include <iterator>
#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int main()
{
    ifstream file("input.txt");

    istreambuf_iterator<char> i_file(file);

    istreambuf_iterator<char> eof;

    std::string buffer;
    while(i_file != eof)
    {
        buffer += *i_file;
        if(*i_file == '\n')
        {
            std::cout << buffer;
            buffer.clear();
        }
        ++i_file;
    }

    // Print the last line if the file does not end with a line feed.
    if(!buffer.empty())
    {
        std::cout << buffer << '\n';
    }

    return 0;
}

input.txt:

ahhhh test *<-- There is a line feed here*
bhhhh second test *<-- There is a line feed here*

output:

ahhhh test
bhhhh second test
coelhudo
1

In a related thread iterate-over-cin-line-by-line quoted above, Jerry Coffin described "another possibility (which) uses a part of the standard library most people barely even know exists." The following applies that method (which was what I was looking for) to solve the iterate-over-file-line-by-line problem as requested in the current thread.

First a snippet copied directly from Jerry's answer in the related thread:

struct line_reader: std::ctype<char> {
    line_reader(): std::ctype<char>(get_table()) {}
    static std::ctype_base::mask const* get_table() {
        static std::vector<std::ctype_base::mask> rc(table_size, std::ctype_base::mask());
        rc['\n'] = std::ctype_base::space;
        return &rc[0];
    }
};

And now, imbue the ifstream with the custom locale as described by Jerry, and copy from ifstream to ofstream.

ifstream is {"fox.txt"};
is.imbue(locale(locale(), new line_reader()));
istream_iterator<string> ii {is};
istream_iterator<string> eos {};

ofstream os {"out.txt"};
ostream_iterator<string> oi {os,"\n"};

vector<string> lines {ii,eos};
copy(lines.begin(), lines.end(), oi);

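The output file ("out.txt") will contain the same lines as the input file ("fox.txt"), with two caveats: empty lines are dropped, because consecutive line feeds are skipped as whitespace, and the last line is always written with a trailing line feed.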
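To try this without any files, here is a self-contained sketch of the same approach on an in-memory stream. `line_reader` is Jerry's facet as quoted above; `read_lines_with_facet` and `read_lines_with_facet_example` are hypothetical names of my own. Note that the helper changes the locale of the stream passed to it.

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Facet in which '\n' is the only whitespace character.
struct line_reader : std::ctype<char> {
    line_reader() : std::ctype<char>(get_table()) {}
    static std::ctype_base::mask const* get_table() {
        static std::vector<std::ctype_base::mask> rc(table_size, std::ctype_base::mask());
        rc['\n'] = std::ctype_base::space;
        return &rc[0];
    }
};

// Hypothetical helper: imbues the stream (modifying it) and reads every
// line through a plain istream_iterator<std::string>.
std::vector<std::string> read_lines_with_facet(std::istream& in) {
    // The locale takes ownership of the facet and deletes it when done.
    in.imbue(std::locale(std::locale(), new line_reader()));
    std::istream_iterator<std::string> first(in), last;
    return std::vector<std::string>(first, last);
}

// Usage sketch: spaces stay inside a line, the empty line disappears.
void read_lines_with_facet_example() {
    std::istringstream in("the quick\n\nbrown fox\n");
    const std::vector<std::string> v = read_lines_with_facet(in);
    assert(v.size() == 2);
    assert(v[0] == "the quick");
    assert(v[1] == "brown fox");
}
```

The empty second line is skipped because operator>> discards leading whitespace, and with this facet a run of line feeds is exactly that.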

winvicta
  • This is excellent. It's worth stating that this is the only answer that returns an iterator (so it can be used in STL algorithms) and does not involve any copies of the data to achieve that. – SamBob Dec 02 '22 at 14:20
1

Here is a pretty clean approach that uses boost::tokenizer. This returns an object providing begin() and end() member functions; for a complete interface, see the documentation of the tokenizer class.

#include <boost/tokenizer.hpp>
#include <iostream>
#include <iterator> 


using istream_tokenizer = boost::tokenizer<boost::char_separator<char>,
                                           std::istreambuf_iterator<char>>;

istream_tokenizer line_range(std::istream& is)
{
    using separator = boost::char_separator<char>;

    return istream_tokenizer{std::istreambuf_iterator<char>{is},
                             std::istreambuf_iterator<char>{},
                             separator{"\n", "", boost::keep_empty_tokens}};
}

This hardcodes char as the stream's character type, but this could be templatized.

The function can be used as follows:

#include <cassert>
#include <sstream>
#include <string>
#include <vector>

std::istringstream is{"A\nBB\n\nCCC"};

auto lines = line_range(is);
std::vector<std::string> line_vec{lines.begin(), lines.end()};
assert(line_vec == (std::vector<std::string>{{"A", "BB", "", "CCC"}}));

Naturally, it can also be used with an std::ifstream created by opening a file:

std::ifstream ifs{"filename.txt"};
auto lines = line_range(ifs);
NicholasM