
Based on this question:

How to read a binary file into a vector of unsigned chars

In the answer they have:

std::vector<BYTE> readFile(const char* filename)
{
    // open the file:
    std::basic_ifstream<BYTE> file(filename, std::ios::binary);

    // read the data:
    return std::vector<BYTE>((std::istreambuf_iterator<BYTE>(file)),
                              std::istreambuf_iterator<BYTE>());
}

This reads the entire file into the vector.

What I want to do is read (for example) 100 bytes at a time into the vector, then do stuff, and then read the next 100 bytes into the vector (clearing the vector in between). I don't see how to specify how much of the file to read (i.e. how to set up the iterators). Is that even possible?

I am trying to avoid having to write my own loop that copies one byte at a time.

Remy Lebeau
code_fodder
  • even if you want to process chunks of 100 bytes you should read the whole file at once (unless it is some GBs in size) – 463035818_is_not_an_ai May 23 '18 at 15:10
  • @user463035818 hmm... the file's length could be anything, so to save on RAM I wanted to break it into 10k chunks. I doubt it will be GBs, but it could be a large number of MBs... also this needs to scale down onto platforms with more limited RAM, so I am just trying to be cautious : ) – code_fodder May 23 '18 at 15:13
    hm. ok, low RAM is something to consider, but usually I'd try to limit the number of reads/writes to files rather than the amount of each individual read/write – 463035818_is_not_an_ai May 23 '18 at 15:14
    @user463035818 I do get your point, very valid : ) ... for another part of my system I am just reading small text files, where I will do as you suggest (makes life easier!) – code_fodder May 23 '18 at 15:20
  • @user463035818: I don't agree. The C++ standard library is smart enough to use internal buffers to limit the number of physical I/O operations. So you should never do explicit buffering, except if you are writing a library module that requires particular buffering for optimization reasons. For any other use case, just rely on the standard library. – Serge Ballesta May 23 '18 at 15:34
    @SergeBallesta I don't really understand. If the size of the file is bigger than the available memory, you'll get an out-of-memory error, and I don't know how the standard library would help with that. Or are you arguing against reading the whole file at once? (That was my actual point.) – 463035818_is_not_an_ai May 23 '18 at 15:41
    @user463035818: I'm sorry that it was not clear, but it was a response to your first comment advising reading the whole file into memory to later process chunks. So yes, I am arguing against reading the whole file at once. – Serge Ballesta May 23 '18 at 15:47
    @SergeBallesta The OP's code creates a new stream on each call of the method, so the `std::ifstream`'s buffering alone won't help. Repeatedly reading from the same stream is a different situation, and then maybe I could agree with you – 463035818_is_not_an_ai May 23 '18 at 15:49
  • @user463035818: IMHO the way to go here is the accepted answer that repeatedly reads 100 bytes from the stream... – Serge Ballesta May 23 '18 at 16:10
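
For reference, the whole-file read suggested in the first comment can also be written with an explicit size query instead of iterators. A minimal sketch, assuming `BYTE` is `unsigned char`; the name `readWholeFile` is just illustrative:

#include <cstddef>
#include <fstream>
#include <vector>

using BYTE = unsigned char;

std::vector<BYTE> readWholeFile(const char* filename)
{
    // open at the end (std::ios::ate) so tellg() reports the file size
    std::ifstream file(filename, std::ios::binary | std::ios::ate);
    if (!file)
        return {};

    std::vector<BYTE> data(static_cast<std::size_t>(file.tellg()));
    file.seekg(0);
    file.read(reinterpret_cast<char*>(data.data()), data.size());
    return data;
}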

3 Answers


You can use `ifstream::read` for that.

std::vector<BYTE> v(100);
while ( file.read(reinterpret_cast<char*>(v.data()), 100) )
{
   // Find out how many characters were actually read.
   auto count = file.gcount();

   // Use v up to count BYTEs.
}
R Sahu
  • This does not read the last remaining bytes if those are fewer than 100 (or fewer than the given n bytes) – Swapnil May 14 '22 at 14:37
  • @Swapnil, it sure does. From https://en.cppreference.com/w/cpp/io/basic_istream/read: end of file condition occurs on the input sequence (in which case, `setstate(failbit|eofbit)` is called). The number of successfully extracted characters can be queried using `gcount()`. – R Sahu May 16 '22 at 02:58
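
Note that both comments have a point: `gcount()` does report the partial read, but because `read` sets `failbit` when it hits end of file before extracting 100 bytes, the `while` condition above is false for that final partial chunk, so the loop body never sees it. A minimal sketch of one way to also process the tail, checking `gcount()` in the loop condition (assuming `file` is a `std::ifstream` opened in binary mode and `BYTE` is `unsigned char`):

std::vector<BYTE> v(100);
while ( file.read(reinterpret_cast<char*>(v.data()), 100) || file.gcount() > 0 )
{
   // gcount() reports how many BYTEs the last read() extracted;
   // it is less than 100 only for the final partial chunk.
   auto count = file.gcount();

   // Use v up to count BYTEs.
}
// once gcount() returns 0, the file has been fully consumed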

You could write a helper function for that:

void readFile( const std::string &fileName, size_t chunk, std::function<void(const std::vector<BYTE>&)> proc )
{
    std::ifstream f( fileName, std::ios::binary );
    std::vector<BYTE> v( chunk );
    // read() takes a char*, so cast; also check gcount() so a final
    // partial chunk is still passed to the callback
    while( f.read( reinterpret_cast<char*>( v.data() ), v.size() ) || f.gcount() > 0 ) {
        v.resize( f.gcount() ); // shrink to the number of bytes actually read
        proc( v );
        v.resize( chunk );
    }
}

then usage is simple:

void process( const std::vector<BYTE> &v ) { ... }

readFile( "foobar", 100, process ); // call process for every 100 bytes of data

or you can use a lambda etc. for the callback.
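
For instance, with a lambda that just counts the bytes it was handed (the `total` variable is only illustrative):

size_t total = 0;
readFile( "foobar", 100, [&total]( const std::vector<BYTE> &v ) {
    total += v.size(); // each call sees one chunk of up to 100 bytes
} );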

Slava

Or you can write your own function for that:

template<typename Data>
std::istreambuf_iterator<Data> readChunk(std::istreambuf_iterator<Data>& curr, std::vector<Data>& vec, size_t chunk = 100) {
    // copy up to chunk values from the stream into vec, stopping early at end of stream
    for (size_t i = 0; curr != std::istreambuf_iterator<Data>() && i < chunk; ++i, ++curr) {
        vec.emplace_back(*curr);
    }
    return curr;
}

and use it as:

std::basic_ifstream<BYTE> file("test.cpp", std::ios::binary); // same stream type as the question's code
std::vector<BYTE> v;
std::istreambuf_iterator<BYTE> curr(file);
readChunk<BYTE>(curr, v);

And you can call this function again to read the next chunk.
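
For example, a sketch of a driver loop that walks the whole file chunk by chunk, clearing the vector in between as the question describes (like the question's code, this relies on the implementation supporting `std::char_traits<unsigned char>`):

std::basic_ifstream<BYTE> file("test.cpp", std::ios::binary);
std::vector<BYTE> v;
std::istreambuf_iterator<BYTE> curr(file);
while (curr != std::istreambuf_iterator<BYTE>()) {
    readChunk<BYTE>(curr, v); // appends up to 100 BYTEs to v
    // ... do stuff with the chunk in v ...
    v.clear();                // start the next chunk empty
}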

Mateusz Wojtczak