
I have a large file, about 4 GB. I need to read a portion of that file that is bigger than INT32_MAX bytes. std::ifstream::read() accepts a std::streamsize as its second argument (the number of bytes to read). Since I'm on a 64-bit system, that is a typedef for ptrdiff_t, which should be an int64_t. So I would expect to be able to read up to 9223372036854775807 bytes at once. My example below proves me wrong: the failbit is set when I read more than INT32_MAX bytes.

What am I missing?

#include <cstddef>  // size_t
#include <cstdint>  // int32_t, int64_t, uint8_t
#include <fstream>
#include <iostream>
#include <limits>

int main() {
  std::cout << "Maximum of std::streamsize: "
            << std::numeric_limits<std::streamsize>::max() << std::endl;
  std::cout << "INT32_MAX: " << std::numeric_limits<int32_t>::max()
            << std::endl;

  auto const filename = R"(C:\TEMP\test.dat)";  // a large file > INT32_MAX

  auto dataStream = std::ifstream();
  dataStream.open(filename, std::ios_base::binary);
  dataStream.seekg(0, dataStream.end);
  size_t filesize = dataStream.tellg();
  std::cout << "Size of file: " << filesize << std::endl;

  // buffer for the whole file
  auto buffer = new uint8_t[filesize];

  dataStream.seekg(0, dataStream.beg);
  std::cout << "Reading INT32_MAX bytes..." << std::endl;
  dataStream.read(reinterpret_cast<char*>(buffer),
                  std::numeric_limits<int32_t>::max());
  std::cout << "Read failed: " << dataStream.fail() << std::endl;

  dataStream.seekg(0, dataStream.beg);
  std::cout << "Reading INT32_MAX + 1 bytes..." << std::endl;
  dataStream.read(
      reinterpret_cast<char*>(buffer),
      static_cast<int64_t>(std::numeric_limits<int32_t>::max()) + 1);
  std::cout << "Read failed: " << dataStream.fail() << std::endl;

  delete[] buffer;
}

I compiled with:

$ g++ --version
g++.exe (Rev2, Built by MSYS2 project) 9.2.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Running it on a Windows 7 laptop gives:

Maximum of std::streamsize: 9223372036854775807
INT32_MAX: 2147483647
Size of file: 4001202702
Reading INT32_MAX bytes...
Read failed: 0
Reading INT32_MAX + 1 bytes...
Read failed: 1

I worked around this issue by reading multiple INT32_MAX-sized chunks. I'd still like to know why the single large read failed, though.
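For reference, the chunked workaround can be sketched roughly as follows. This is a minimal sketch, not the exact code I used: `read_in_chunks` is a hypothetical helper name, and the default chunk size of INT32_MAX is an assumption based on the largest single read that succeeded in the test above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <ios>
#include <istream>
#include <limits>

// Hypothetical helper (not part of the original program): reads up to `total`
// bytes from `in` into `dest`, never asking read() for more than `chunk_size`
// bytes at a time. Returns the number of bytes actually read.
// Assumes `chunk_size` fits into std::streamsize.
inline std::size_t read_in_chunks(
    std::istream& in, char* dest, std::size_t total,
    std::size_t chunk_size = std::numeric_limits<std::int32_t>::max()) {
  assert(chunk_size > 0);
  std::size_t done = 0;
  while (done < total) {
    // Request the smaller of one chunk and what is still missing.
    const std::size_t want = std::min(chunk_size, total - done);
    in.read(dest + done, static_cast<std::streamsize>(want));
    // gcount() reports how many bytes the last read() delivered,
    // even if it stopped early and set failbit/eofbit.
    done += static_cast<std::size_t>(in.gcount());
    if (!in) break;  // end of file or read error: stop and report what we have
  }
  return done;
}
```

In the program above, the two read() calls would be replaced by something like `read_in_chunks(dataStream, reinterpret_cast<char*>(buffer), filesize)`, followed by a check that the returned count equals `filesize`.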

EDIT: I did some more testing. Compiled on Linux with GCC 8.3: works. Compiled with MSVC15: works.

So I guess there's a problem with the libstdc++ that comes with MinGW-W64.

EDIT2: Problem still exists with

$ gcc --version
gcc.exe (Rev8, Built by MSYS2 project) 10.3.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
j-hap
  • Are you running out of memory? Does the OS have that much memory to give to your task? – Thomas Matthews Nov 25 '19 at 16:50
  • I can read INT32_MAX chunks multiple times and there is RAM available left. So I'd say no running out of memory. – j-hap Nov 26 '19 at 06:38
  • Looks like https://stackoverflow.com/questions/16324811/ifstream-what-is-the-maximum-file-size-that-a-ifstream-can-read – El Gohr Dec 05 '19 at 16:39
  • That's for Visual Studio and only tries to estimate the file size, with which I have no problem. – j-hap Dec 06 '19 at 08:25

1 Answer


Can you try this?

auto buffer = new int32_t[filesize];
ΦXocę 웃 Пepeúpa ツ
Ehab
  • Welcome to Stack Overflow! Thank you for this code snippet, which might provide some limited short-term help. A proper explanation [would greatly improve](//meta.stackexchange.com/q/114762) its long-term value by showing *why* this is a good solution to the problem, and would make it more useful to future readers with other, similar questions. Please [edit] your answer to add some explanation, including the assumptions you've made. – Toby Speight Dec 05 '19 at 16:21
  • I guess you wanted to see whether the array was created large enough or has a 2 GB limit, and whether the read operation failed because of that. I tried it with int32_t and it failed with the same error. – j-hap Dec 06 '19 at 08:53