1

I have pixels from an image which are stored in a binary file.

I would like to use a function to quickly read this file.

For the moment I have this:

std::vector<int> _data;    

std::ifstream file(_rgbFile.string(), std::ios_base::binary);
while (!file.eof())
{
    char singleByte[1];
    file.read(singleByte, 1);
    int b = singleByte[0];
    _data.push_back(b);
}
std::cout << "end" << std::endl;
file.close();

But on 4096 * 4096 * 3 images it already takes a little time.

Is it possible to optimize this function?

Fedour Traktor
  • 307
  • 1
  • 2
  • 16
  • 3
    I wouldn't store bytes as int (which is at least 4 bytes long). Just allocate a `char` array (or std::vector, but call `reserve` before) with the size of data in the file (~50MB is not a big deal) and populate it in a single read. Then, you can do some processing of that array. – pptaszni Oct 29 '21 at 13:16
  • 2
    `int` isn't specific enough since it can have different sizes on different platforms. Your loop is also wrong. Read this: [Why is `iostream::eof()` inside a loop condition (i.e. `while (!stream.eof())`) considered wrong?](https://stackoverflow.com/questions/5605125/why-is-iostreameof-inside-a-loop-condition-i-e-while-stream-eof-cons) – Ted Lyngmo Oct 29 '21 at 13:20
  • 1
    You are reading one byte at a time and then making `int`s out of them. I doubt that' you've interpreted the file format correctly. What's stored in the file? RGBA? – Ted Lyngmo Oct 29 '21 at 13:22
  • 1
    why are readin byte by byte couldn't you load all the file in a dynamic array or line by line. – Anis Belaid Oct 29 '21 at 13:23
  • 1
    @pptaszni An `int` is at least 16 bits wide, not 4 bytes. – Ted Lyngmo Oct 29 '21 at 13:25
  • Files contain RGB tuples. I know that using `int` is bad in this scenario. I was using `uint8_t` at first but I had some bugs with it. – Fedour Traktor Oct 29 '21 at 14:47

1 Answers1

2

You could make this faster by reading the whole file in one go, and preallocating the necessary storage in the vector beforehand:

std::ifstream file(_rgbFile.string(), std::ios_base::binary);
std::streampos posStart = file.tellg();
file.seekg(0, std::ios::end);
std::streampos posEnd = file.tellg();
file.seekg(posStart);

std::vector<char> _data;
_data.resize(posEnd - posStart, 0);
file.read(&_data[0], posEnd - posStart);
std::cout << "end" << std::endl;
file.close();

Avoiding unnecessary i/o

By reading the file as a whole in one read() call you can avoid a lot of read calls, and buffering of the ifstream. If the file is very large and you don't want to load it all in memory at once, then you can load smaller chunks of maybe a few MB each.

Also you avoid lots of functions calls - by reading it byte-by-byte you need to issue ifstream::read 50'331'648 times!

vector preallocation

std::vector grows dynamically when you try to insert new elements but no space is left. Each time the vector resizes, it needs to allocate a new, larger, memory area and copy all current elements in the vector over to the new location.
Most vector implementions choose a growth factor between 1.5 - 2, so each time the vector needs to resize it'll be a 1.5-2x larger allocation.

This can be completely avoided by calling std::vector::reserve or std::vector::resize. With these functions the vector memory only needs to be allocated once, with at least as many elements as you requested.

Godbolt example

Here's a godbolt example that shows the performance improvement.

testing a ~5MB file (4096*4096*3 bytes)

  • gcc 11.2, with optimizations disabled:
Old New
1300ms 16ms
  • gcc 11.2, -O3
Old New
878ms 13ms

Small bug in the code

As @TedLyngmo has pointed out your code also contains a small bug.
The EOF marker will only be set once you tried to read past the end of the file. see this question

So the last read that sets the EOF bit didn't actually read a byte, so you have one more byte in your array that contains uninitialized garbage.

You could fix this by checking for EOF directly after the read:

while(true) {
    char singleByte[1];
    file.read(singleByte, 1);
    if(file.eof()) break;
    int b = singleByte[0];
    _data.push_back(b);
}
Turtlefight
  • 9,420
  • 2
  • 23
  • 40