I have a large file, about 4 GB. I need to read a portion of that file that is bigger than INT32_MAX. std::ifstream::read() accepts a std::streamsize as its second argument (the number of bytes to read). Since I'm on 64-bit, that is a typedef to ptrdiff_t, which should be an int64_t.
So I would expect to be able to read 9223372036854775807 bytes at once. My example below proves me wrong: the failbit is set when I read more than INT32_MAX bytes.
What am I missing?
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <limits>

int main() {
    std::cout << "Maximum of std::streamsize: "
              << std::numeric_limits<std::streamsize>::max() << std::endl;
    std::cout << "INT32_MAX: " << std::numeric_limits<int32_t>::max()
              << std::endl;

    auto const filename = R"(C:\TEMP\test.dat)";  // a large file > INT32_MAX
    auto dataStream = std::ifstream();
    dataStream.open(filename, std::ios_base::binary);

    // determine the file size by seeking to the end
    dataStream.seekg(0, dataStream.end);
    size_t filesize = dataStream.tellg();
    std::cout << "Size of file: " << filesize << std::endl;

    // buffer for the whole file
    auto buffer = new uint8_t[filesize];

    // first attempt: exactly INT32_MAX bytes
    dataStream.seekg(0, dataStream.beg);
    std::cout << "Reading INT32_MAX bytes..." << std::endl;
    dataStream.read(reinterpret_cast<char*>(buffer),
                    std::numeric_limits<int32_t>::max());
    std::cout << "Read failed: " << dataStream.fail() << std::endl;

    // second attempt: one byte more than INT32_MAX
    dataStream.seekg(0, dataStream.beg);
    std::cout << "Reading INT32_MAX + 1 bytes..." << std::endl;
    dataStream.read(
        reinterpret_cast<char*>(buffer),
        static_cast<int64_t>(std::numeric_limits<int32_t>::max()) + 1);
    std::cout << "Read failed: " << dataStream.fail() << std::endl;

    delete[] buffer;
}
I compiled with:
$ g++ --version
g++.exe (Rev2, Built by MSYS2 project) 9.2.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Running it on a Windows 7 laptop gives:
Maximum of std::streamsize: 9223372036854775807
INT32_MAX: 2147483647
Size of file: 4001202702
Reading INT32_MAX bytes...
Read failed: 0
Reading INT32_MAX + 1 bytes...
Read failed: 1
I worked around this issue by reading multiple INT32_MAX sized chunks. I'm interested to know why this failed though.
EDIT: I did some more testing. Compiled on Linux with GCC 8.3: works. Compiled with MSVC15: works.
So I guess there's a problem with the libstdc++ that comes with MinGW-W64.
EDIT2: The problem still exists with:
$ gcc --version
gcc.exe (Rev8, Built by MSYS2 project) 10.3.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.