24

I would like to load the contents of a text file into a vector<char> (or into any char input iterator, if that is possible). Currently my code looks like this:

std::vector<char> vec;
std::ifstream file("test.txt");
assert(file.is_open());
while (!(file.eof() || file.fail())) {
    char buffer[100];
    file.read(buffer, 100);
    vec.insert(vec.end(), buffer, buffer + file.gcount());
}

I do not like the manual use of a buffer (Why 100 chars? Why not 200, or 25 or whatever?), or the large number of lines that this took. The code just seems very ugly and non-C++. Is there a more direct way of doing this?

Mankarse
  • 1
    Look at this response: http://stackoverflow.com/questions/132358/how-to-read-file-content-into-istringstream/138645#138645 . It does exactly what you want as an intermediate step, and finally it even builds a stream from the string (or `vector`). – Diego Sevilla Aug 30 '11 at 10:41
  • @Diego - Write this up as an answer and I will accept it. This is just what I was looking for. – Mankarse Aug 30 '11 at 10:52
  • Done, although then it *may* be considered a duplicate question? – Diego Sevilla Aug 30 '11 at 10:56
  • Re `assert(file.is_open());` : Don't do that! `assert` is a macro that can expand into nothing (e.g., release mode) and even if the `assert` does generate code, it doesn't help. A missing file is a user error, not a programmer error. General guideline: Use `assert` to detect programmer errors. Use something else, *anything else*, for user errors. – David Hammen Aug 30 '11 at 11:28
  • @David Hammen - I know... this is not exactly production code at the moment. I will fix the code in the question though. – Mankarse Aug 30 '11 at 11:30
  • @David Hammen - Or, to put it differently, the assert is to stand in for the programmer error that errors in reading the file are not properly accounted for. – Mankarse Aug 30 '11 at 11:40
  • @Mankarse: maybe `#define TODO_ITEM_SHOULD_NOT_ASSUME assert`, then `TODO_ITEM_SHOULD_NOT_ASSUME(file.is_open())` ;-) – Steve Jessop Aug 30 '11 at 11:46
  • @Steve Jessop - Hehehe, maybe. In the actual code that this was ripped from, it was an `assert(false)` inside an `if` block, with a comment saying `"Not yet implemented"`. I find the Input/Output library to be one of the most confusing parts of C++, so I sometimes just give up on trying to write correct I/O code in places where it doesn't matter much. – Mankarse Aug 30 '11 at 11:55
  • possible duplicate of [Efficient way of reading a file into an std::vector?](http://stackoverflow.com/questions/4761529/efficient-way-of-reading-a-file-into-an-stdvectorchar) – Gabriel L. Mar 04 '14 at 22:48
  • Please consider posting your final solution as an _answer_; it should not be in the question. BTW I've just used it for some production code and wish I could upvote it! – Lightness Races in Orbit Jul 24 '16 at 19:00
  • @LightnessRacesinOrbit: Done. Thanks for the prod. :) – Mankarse Jul 25 '16 at 03:18
  • All answers can be found in [this article](http://cpp.indi.frih.net/blog/2014/09/how-to-read-an-entire-file-into-memory-in-cpp/). Unfortunately, they are in section: Bad idea #1 or Bad idea #2. The accepted answer causes UB. – r0n9 Dec 01 '17 at 06:04
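
To illustrate David Hammen's point about `assert` versus user errors, here is a minimal sketch of a check that survives release builds. The helper name `open_for_reading` and the choice of `std::runtime_error` are assumptions made for illustration; nobody in this thread proposed them.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: opens `path` for reading and throws if that fails,
// so the check still runs in builds where NDEBUG turns assert() into nothing.
std::ifstream open_for_reading(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    if (!file.is_open()) {
        throw std::runtime_error("cannot open file: " + path);
    }
    return file;  // streams are movable since C++11
}
```

A caller could wrap `open_for_reading("test.txt")` in a `try` block and report the message to the user instead of aborting.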

5 Answers

19

If you want to avoid reading char by char:

if (!file.eof() && !file.fail())
{
    file.seekg(0, std::ios_base::end);
    std::streampos fileSize = file.tellg();
    vec.resize(fileSize);

    file.seekg(0, std::ios_base::beg);
    file.read(&vec[0], fileSize);
}
hamstergene
  • 3
    Neat solution, but is this safe? – doron Aug 30 '11 at 11:32
  • @doron: "Safe" in what sense? – Lightness Races in Orbit Jul 24 '16 at 19:00
  • If 'safe' means not crashing the app and copying the values correctly into the vector, I think it is. But after `file.read(&vec[0], fileSize)` has executed, the vector size `vec.size()` will still be zero and `vec.empty()` will be true. Not sure whether that is safe in your app or not. – r0n9 Dec 01 '17 at 04:31
  • 4
    Actually, after googling, I found [this article](http://cpp.indi.frih.net/blog/2014/09/how-to-read-an-entire-file-into-memory-in-cpp/). The answer is in the section Bad idea #2. Basically, it causes undefined behavior. – r0n9 Dec 01 '17 at 05:59
  • It's not *unsafe* in the sense of memory errors. Here's [the article r0ng linked (now dead)](https://web.archive.org/web/20180314195042/http://cpp.indi.frih.net/blog/2014/09/how-to-read-an-entire-file-into-memory-in-cpp/). TL;DR: 1. Make sure you open the file in binary mode, or `tellg()` might not be correct. 2. *According to the C standard*, `seekg(0, end)` might seek beyond the end of files opened in binary mode. However I would be absolutely amazed if any modern system actually does that because then loads and loads of code like this would break. It's fine as long as you use binary mode. – Timmmm Jul 12 '19 at 11:09
  • 1
    And for the standard sticklers, it's important to remember that nobody actually writes *fully standard compliant code*. Like I doubt much code would work if `char` was 9 bits. So it is ok to rely on de facto standards sometimes. – Timmmm Jul 12 '19 at 11:10
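
Putting the points from these comments together (binary mode, a `tellg()` that may fail, an empty file), a hedged sketch of the seek-and-read approach could look like the following. The function name `read_whole_file` and the exception type are assumptions for illustration, and this is one possible arrangement rather than the answerer's code.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads the whole file at `path` into a vector<char>.
std::vector<char> read_whole_file(const std::string& path) {
    // Binary mode, so the size reported by tellg() matches what read() delivers.
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        throw std::runtime_error("cannot open " + path);
    }

    file.seekg(0, std::ios::end);
    const std::streamoff size = file.tellg();  // -1 signals failure
    if (size < 0) {
        throw std::runtime_error("cannot determine size of " + path);
    }
    file.seekg(0, std::ios::beg);

    std::vector<char> vec(static_cast<std::size_t>(size));
    if (!vec.empty()) {
        // Only touch the buffer when there is something to read.
        file.read(vec.data(), static_cast<std::streamsize>(vec.size()));
        // Keep only the characters that were actually read.
        vec.resize(static_cast<std::size_t>(file.gcount()));
    }
    return vec;
}
```

Skipping the read for an empty file avoids taking the address of an element that does not exist, which `&vec[0]` would do.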
10

I think it's something like this, but have no environment to test it:

std::copy(std::istream_iterator<char>(file), std::istream_iterator<char>(), std::back_inserter(vec));

Could be you have to play with io manipulators for things like linebreaks/whitespace.

Edit: as noted in comments, could be a performance hit.

KillianDS
  • @Diego: probably, I don't know the implementation details of std and couldn't test it. Also, that's not necessarily an issue, but good note indeed. – KillianDS Aug 30 '11 at 10:45
  • 2
    Note further that if performance requirements aren't too tight for this, and if the questioner really just needs "any char input iterator", then there's no need for a container. `std::istream_iterator<char>(file), std::istream_iterator<char>()` already is the requested InputIterator pair. – Steve Jessop Aug 30 '11 at 11:29
  • 2
    Your streams will be buffered, so the overhead of kernel calls should be low. The istream iterator could well use a memcpy under the hood as well. It would be interesting to see the performance difference between this and Eugene's solution, but I don't think the difference will be really big. – doron Aug 30 '11 at 11:35
  • 3
    Hang on, I tell a lie, you're right that an io manipulator is needed to deal with whitespace. The required iterator pair is `std::istream_iterator<char>(file >> std::noskipws), std::istream_iterator<char>()`. – Steve Jessop Aug 30 '11 at 11:43
9

Another approach, using rdbuf() to read the whole file to a std::stringstream first:

#include <fstream>
#include <sstream>
#include <vector>
#include <string>

// for check:
#include <algorithm>
#include <iterator>
#include <iostream>

int main() {
   std::ifstream file("test.cc");
   std::ostringstream ss;
   ss << file.rdbuf();
   const std::string& s = ss.str();
   std::vector<char> vec(s.begin(), s.end());

   // check:
   std::copy(vec.begin(), vec.end(), std::ostream_iterator<char>(std::cout));
}
Flexo
6

There were lots of good responses. Thanks all! The code that I have decided on using is this:

std::vector<char> vec;
std::ifstream file;
file.exceptions(
    std::ifstream::badbit
  | std::ifstream::failbit
  | std::ifstream::eofbit);
//Need to use binary mode; otherwise CRLF line endings count as 2 for
//`length` calculation but only 1 for `file.read` (on some platforms),
//and we get undefined behaviour when trying to read `length` characters.
file.open("test.txt", std::ifstream::in | std::ifstream::binary);
file.seekg(0, std::ios::end);
std::streampos length(file.tellg());
if (length) {
    file.seekg(0, std::ios::beg);
    vec.resize(static_cast<std::size_t>(length));
    file.read(&vec.front(), static_cast<std::size_t>(length));
}

Obviously, this is not suitable for extremely large files or performance-critical code, but it is good enough for general purpose use.

Mankarse
4

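Use an iterator:

#include <iterator>

file >> std::noskipws;  // otherwise istream_iterator drops whitespace
std::istream_iterator<char> data(file);
std::istream_iterator<char> end;
vec.insert(vec.end(), data, end);  // append everything at the end of vec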
spraff
Adrien Plisson
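
As an alternative to the formatted `std::istream_iterator`, the unformatted `std::istreambuf_iterator` reads straight from the stream buffer. This is a different technique from the one in the answer above, and the helper name `read_via_streambuf` is an assumption for illustration only.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper; yields an empty vector if the file cannot be opened.
std::vector<char> read_via_streambuf(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    // istreambuf_iterator reads raw characters from the stream buffer,
    // so no whitespace is skipped and no manipulator is needed.
    return std::vector<char>(std::istreambuf_iterator<char>(file),
                             std::istreambuf_iterator<char>());
}
```

Since the vector cannot know the size in advance here, it may reallocate as it grows; whether that matters depends on the file sizes involved.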