
Based on this question:

How to read a binary file into a vector of unsigned chars

In the answer they have:

std::vector<BYTE> readFile(const char* filename)
{
    // open the file:
    std::basic_ifstream<BYTE> file(filename, std::ios::binary);

    // read the data:
    return std::vector<BYTE>((std::istreambuf_iterator<BYTE>(file)),
                              std::istreambuf_iterator<BYTE>());
}

This reads the entire file into the vector.

What I want to do is read (for example) 100 bytes at a time into the vector, then do stuff, and then read the next 100 bytes into the vector (clearing the vector in between). I don't see how to specify how much of the file to read (i.e. how to set up the iterators). Is that even possible?

I am trying to avoid having to write my own loop that copies one byte at a time.

Remy Lebeau
code_fodder
  • even if you want to process chunks of 100 bytes you should read the whole file at once (unless it is some GBs in size) – 463035818_is_not_an_ai May 23 '18 at 15:10
  • @user463035818 hmm... the file's length could be anything, so to save on RAM I wanted to break it into 10k chunks. I doubt it will be GBs, but it could be a large number of MBs... also this needs to scale down onto platforms with more limited RAM, so I am just trying to be cautious : ) – code_fodder May 23 '18 at 15:13
    hm. ok, low RAM is something to consider, but usually I'd try to limit the number of reads/writes to files rather than the amount of each individual read/write – 463035818_is_not_an_ai May 23 '18 at 15:14
    @user463035818 I do get your point, very valid : ) ... for another part of my system I am just reading small text files, where I will do as you suggest (makes life easier!) – code_fodder May 23 '18 at 15:20
  • @user463035818: I don't agree. The C++ standard library is smart enough to use internal buffers to limit the number of physical I/O operations. So you should never do explicit buffering, except if you are writing a library module that requires particular buffering for optimization reasons. For any other use case, just rely on the standard library. – Serge Ballesta May 23 '18 at 15:34
    @SergeBallesta I don't really understand. If the size of the file is bigger than the available memory, you'll get an out-of-memory error, and I don't know how the standard library would help with that. Or are you arguing against reading the whole file at once? (That was my actual point.) – 463035818_is_not_an_ai May 23 '18 at 15:41
    @user463035818: I'm sorry that it was not clear, but it was a response to your first comment advising reading the whole file into memory to later process chunks. So yes, I am arguing against reading the whole file at once. – Serge Ballesta May 23 '18 at 15:47
    @SergeBallesta The OP's code creates a new stream on each call of the method, so the `std::ifstream`'s buffering alone won't help. Repeatedly reading from the same stream is a different situation, and then maybe I could agree with you – 463035818_is_not_an_ai May 23 '18 at 15:49
  • @user463035818: IMHO the way to go here is the accepted answer that repeatedly reads 100 bytes from the stream... – Serge Ballesta May 23 '18 at 16:10
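
For reference, the whole-file read suggested in the first comment can also be written with an explicit size query instead of iterators. A minimal sketch, assuming `BYTE` is `unsigned char`; the name `readWholeFile` is just illustrative:

#include <cstddef>
#include <fstream>
#include <vector>

using BYTE = unsigned char;

std::vector<BYTE> readWholeFile(const char* filename)
{
    // open at the end (std::ios::ate) so tellg() reports the file size
    std::ifstream file(filename, std::ios::binary | std::ios::ate);
    if (!file)
        return {};

    std::vector<BYTE> data(static_cast<std::size_t>(file.tellg()));
    file.seekg(0);
    file.read(reinterpret_cast<char*>(data.data()), data.size());
    return data;
}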

3 Answers


You can use `ifstream::read` for that.

std::vector<BYTE> v(100);
while ( file.read(reinterpret_cast<char*>(v.data()), 100) )
{
   // Find out how many characters were actually read.
   auto count = file.gcount();

   // Use v up to count BYTEs.
}
R Sahu
  • This does not read the last remaining bytes if those are fewer than 100 (or fewer than the given n bytes) – Swapnil May 14 '22 at 14:37
  • @Swapnil, it sure does. From https://en.cppreference.com/w/cpp/io/basic_istream/read: end of file condition occurs on the input sequence (in which case, `setstate(failbit|eofbit)` is called). The number of successfully extracted characters can be queried using `gcount()`. – R Sahu May 16 '22 at 02:58
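
Note that both comments have a point: `gcount()` does report the partial read, but because `read` sets `failbit` when it hits end of file before extracting 100 bytes, the `while` condition above is false for that final partial chunk, so the loop body never sees it. A minimal sketch of one way to also process the tail, checking `gcount()` in the loop condition (assuming `file` is a `std::ifstream` opened in binary mode and `BYTE` is `unsigned char`):

std::vector<BYTE> v(100);
while ( file.read(reinterpret_cast<char*>(v.data()), 100) || file.gcount() > 0 )
{
   // gcount() reports how many BYTEs the last read() extracted;
   // it is less than 100 only for the final partial chunk.
   auto count = file.gcount();

   // Use v up to count BYTEs.
}
// once gcount() returns 0, the file has been fully consumed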

You could write a helper function for that:

void readFile( const std::string &fileName, size_t chunk, std::function<void(const std::vector<BYTE>&)> proc )
{
    std::ifstream f( fileName, std::ios::binary );
    std::vector<BYTE> v( chunk );
    // read() takes a char*, so cast; also check gcount() so a final
    // partial chunk is still passed to the callback
    while( f.read( reinterpret_cast<char*>( v.data() ), v.size() ) || f.gcount() > 0 ) {
        v.resize( f.gcount() ); // shrink to the number of bytes actually read
        proc( v );
        v.resize( chunk );
    }
}

then usage is simple:

void process( const std::vector<BYTE> &v ) { ... }

readFile( "foobar", 100, process ); // call process for every 100 bytes of data

or you can use a lambda etc. for the callback.
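
For instance, with a lambda that just counts the bytes it was handed (the `total` variable is only illustrative):

size_t total = 0;
readFile( "foobar", 100, [&total]( const std::vector<BYTE> &v ) {
    total += v.size(); // each call sees one chunk of up to 100 bytes
} );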

Slava

Or you can write your own function for that:

template<typename Data>
std::istreambuf_iterator<Data> readChunk(std::istreambuf_iterator<Data>& curr, std::vector<Data>& vec, size_t chunk = 100) {
    // copy up to chunk values from the stream into vec, stopping early at end of stream
    for (size_t i = 0; curr != std::istreambuf_iterator<Data>() && i < chunk; ++i, ++curr) {
        vec.emplace_back(*curr);
    }
    return curr;
}

and use it as:

std::basic_ifstream<BYTE> file("test.cpp", std::ios::binary); // same stream type as the question's code
std::vector<BYTE> v;
std::istreambuf_iterator<BYTE> curr(file);
readChunk<BYTE>(curr, v);

And you can call this function again to read the next chunk.
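
For example, a sketch of a driver loop that walks the whole file chunk by chunk, clearing the vector in between as the question describes (like the question's code, this relies on the implementation supporting `std::char_traits<unsigned char>`):

std::basic_ifstream<BYTE> file("test.cpp", std::ios::binary);
std::vector<BYTE> v;
std::istreambuf_iterator<BYTE> curr(file);
while (curr != std::istreambuf_iterator<BYTE>()) {
    readChunk<BYTE>(curr, v); // appends up to 100 BYTEs to v
    // ... do stuff with the chunk in v ...
    v.clear();                // start the next chunk empty
}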

Mateusz Wojtczak