2

I am porting C++ code from Linux to Windows, using Visual Studio 2013.

I need to read a binary file and am using this portion of C++ code:

// Open the stream
std::ifstream is("myfile.bin");
// Determine the file length
is.seekg(0, std::ios_base::end);
std::size_t size=is.tellg();
is.seekg(0, std::ios_base::begin);
// Create a vector to store the data
int* Data = new int[size/sizeof(int)];
// Load the data
is.read((char*) &Data[0], size);
// Close the file
is.close();

On Linux, the size of my binary file is correctly reported as 744 MB. On Windows, however, it is incorrectly reported as more than 4 GB. How can I fix this?

user1431515
  • 185
  • 1
  • 10
  • 1
    Does opening the file with the binary flag make any difference to what `is.tellg();` returns? – Kevin Anderson Jan 05 '17 at 15:48
  • 1
    You might try using the windows API: http://stackoverflow.com/questions/8991192/check-filesize-without-opening-file-in-c#8991228 – Andrew Jan 05 '17 at 15:55
  • 1
    You have a buffer overrun if `size % sizeof(int)` is non-zero. Allocate Data as `new int [(size+sizeof(int)-1)/sizeof(int)];` or simply as `new int[size/sizeof(int) + 1];` Do you see why? – selbie Jan 05 '17 at 17:20

4 Answers

0

Change std::ifstream is("myfile.bin"); to std::ifstream is("myfile.bin", std::ios::binary);

With your current default open mode, the compiler chooses "char" mode. On Linux, chars in files are UTF-8, where the first 128 positions are 1-byte chars, but in memory UTF-32 (4 bytes per char) is used. On Windows, chars are "wide chars", 2 bytes per char.

Ripi2
  • 7,031
  • 1
  • 17
  • 33
  • 2
    You're right that the file should be opened in binary mode. In the current code, the file is opened in **text** mode; there is no "char" mode. While text mode has broad permission to do various implementation-specific things, in fact the only thing that's affected is what byte values in the file are interpreted as line endings. This has nothing to do with the character encoding. In short: remove the second paragraph. – Pete Becker Jan 05 '17 at 16:37
  • 1
    I agree that the file should be opened in binary mode, but that doesn't get to the root problem. The difference between 744 MB and 4 GB cannot be accounted for by CR+LF v. LF. – Adrian McCarthy Jan 05 '17 at 18:32
0

I finally had the time to actually run this myself, though I had to fix a couple of things, such as using ios_base::beg instead of begin (different function). Also, as mentioned, the array allocation should be: int* Data = new int[size / sizeof(int) + 1]; // At most one extra int

I found your problem: you're not in the right directory. Check whether you successfully opened the file. If you didn't, you get a huge garbage value for size (probably -1 converted to unsigned, so massive).

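Try this to find your current directory on Windows (it needs <windows.h> and <iostream>, which I happened to have included already):

char dirBuf[MAX_PATH];
// Call the ANSI version explicitly so a char buffer works even if UNICODE is defined
GetCurrentDirectoryA(MAX_PATH, dirBuf);
std::cout << "Current directory is: " << dirBuf << std::endl;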

See if that's where your file is and move it accordingly. Or specify the ENTIRE path in the constructor to ifstream.

Also, it has nothing to do with ios::binary or not. Works fine both ways, or fails if the file isn't there.
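
The checks described above could look roughly like the following sketch. The helper name load_ints is hypothetical, and throwing std::runtime_error is just one possible way to report the failure; the point is only that the open and the tellg result are verified before the size is converted to an unsigned type.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: loads a whole file into a vector of int and
// throws instead of silently producing a bogus size.
std::vector<int> load_ints(const std::string& path) {
    std::ifstream is(path.c_str(), std::ios::binary);
    if (!is) {
        throw std::runtime_error("cannot open " + path);
    }
    is.seekg(0, std::ios_base::end);
    const std::streamoff end = is.tellg();  // -1 signals failure
    if (end < 0) {
        throw std::runtime_error("cannot determine size of " + path);
    }
    is.seekg(0, std::ios_base::beg);

    const std::size_t bytes = static_cast<std::size_t>(end);
    // Round up so a trailing partial int still fits; unused bytes stay zero.
    std::vector<int> data((bytes + sizeof(int) - 1) / sizeof(int));
    if (bytes != 0 && !is.read(reinterpret_cast<char*>(&data[0]),
                               static_cast<std::streamsize>(bytes))) {
        throw std::runtime_error("short read from " + path);
    }
    return data;
}
```

Calling load_ints("myfile.bin") from the wrong working directory would then fail with "cannot open myfile.bin" rather than attempting a multi-gigabyte allocation.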

Kevin Anderson
  • 6,850
  • 4
  • 32
  • 54
0
std::size_t size=is.tellg();

The standard doesn't require tellg to return the byte offset from the beginning of the file. In general, this may not be a reliable way to get the size of the file, though it probably does what you expect on Linux and Windows.

The return type of the tellg method is std::basic_istream::pos_type, so you're starting with an implicit conversion to std::size_t, which may or may not be appropriate. In a 32-bit build, for example, it's conceivable that the size of a file could be larger than a std::size_t can represent.

But the root problem is that you're not checking for errors. If you have exceptions disabled, then tellg reports an error by returning pos_type(-1). When you cast that to an unsigned type (which std::size_t is), then you get a very large value. I suspect you failed to open the file, and since you didn't detect that error, the seekg and the tellg failed. You then coerced pos_type(-1) to a std::size_t, which made it look like the file was huge.
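
The wraparound can be reproduced with a small sketch. The helper name size_via_tellg is hypothetical; it deliberately repeats the question's size computation without error checks, and it assumes stream exceptions are left disabled (the default).

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: computes the "size" the way the question does,
// with no error checking, so a failed open goes unnoticed.
std::size_t size_via_tellg(const std::string& path) {
    std::ifstream is(path.c_str(), std::ios::binary);
    is.seekg(0, std::ios_base::end);
    // On a failed stream tellg() returns pos_type(-1).
    const std::streamoff off = is.tellg();
    // Converting -1 to the unsigned std::size_t wraps to its maximum value.
    return static_cast<std::size_t>(off);
}
```

For a path that cannot be opened, this returns the largest value a std::size_t can hold, which is why the file appears to be enormous.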

You also have the problems others have noted: failing to open the file in binary mode and computing the wrong size for the buffer when the file isn't a multiple of the size of an int.

The most reliable way to get the file size is to use the OS's API. On Windows, you can do this instead:

// Open the file.  [TODO:  Get the file name in wide characters and use
// CreateFileW instead.  If the file name contains characters not
// representable by the user's ANSI codepage, then CreateFileA will fail.]
HANDLE hfile = CreateFileA("myfile.bin", GENERIC_READ, FILE_SHARE_READ,
                           nullptr, OPEN_EXISTING,
                           FILE_ATTRIBUTE_NORMAL | FILE_FLAG_SEQUENTIAL_SCAN,
                           nullptr);

if (hfile == INVALID_HANDLE_VALUE) { /* error handling here */ }

// Figure out how big it is.
LARGE_INTEGER li_size;
if (!GetFileSizeEx(hfile, &li_size)) { /* error handling here */ }
// TODO:  On a 32-bit build, this won't be able to handle huge files,
// so check that here.

std::size_t size = static_cast<std::size_t>(li_size.QuadPart);

// Create a buffer to store the data, being careful to round up to a
// multiple of sizeof(int).  [TODO:  Use a std::vector instead.]
int* Data = new int[(size + sizeof(int) - 1) / sizeof(int)];

// Load the data.
const DWORD BytesToRead = static_cast<DWORD>(size);
DWORD BytesRead = 0;
if (!ReadFile(hfile, Data, BytesToRead, &BytesRead, nullptr) || BytesRead < BytesToRead) {
  // error handling here
}

// Close the file
CloseHandle(hfile);
Adrian McCarthy
  • 45,555
  • 16
  • 123
  • 175
-1
int* Data = new int[size/sizeof(int)];

Why are you doing this? You're dividing the size by 4. You don't want to do this. It should just be int* Data = new int[size]

Also, it should be std::ifstream f("filename.bin", std::ios::binary);

Verideth
  • 7
  • 8
  • 2
    It's an `ifstream` not an `fstream` so `std::ios::in` won't make a difference, though I agree with the binary flag. You're wrong on the division though. He's doing the right thing, as `int` is always (or close enough to always) bigger than a char, which is why he's being very defensive on the size of the array. Though depending on integer division, he needs a `+1` for the number of elements if it's not a multiple of 4 exactly. So right idea, off-by-one-error I think. – Kevin Anderson Jan 05 '17 at 15:51
  • @KevinAnderson My bad, you're right about ifstream. Not sure what you mean about the division though. I'm not great at working with bits and bytes, as my memory allocation ain't the best. But sizeof(int) returns 4, which means he's dividing the size by 4 (though it depends on what system he's on; mostly it's 4). – Verideth Jan 05 '17 at 15:54
  • 5
    The size is returned in bytes, but it's an array of `int`s, which are bigger. So let's say your file size is 16. You need 4 `int`s for that, not 16. That's why he's dividing the file size by the size of `int`. It's also why, if it was 15, the division would (likely) return 3, not 4, and he'd overwrite by 3 characters. So the expression for the array size should be `new int[size/sizeof(int) + 1]` – Kevin Anderson Jan 05 '17 at 16:10
  • @KevinAnderson Oh okay! thanks so much for the great response! I get it now I _think_ :) – Verideth Jan 05 '17 at 16:12
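
The rounding discussed in these comments can be sketched as a tiny helper. The name elements_needed is hypothetical; it assumes elem_size is non-zero and that bytes + elem_size - 1 does not overflow.

```cpp
#include <cstddef>

// Hypothetical helper: how many elements of elem_size bytes are needed
// to hold `bytes` bytes, rounding up instead of truncating.
std::size_t elements_needed(std::size_t bytes, std::size_t elem_size) {
    return (bytes + elem_size - 1) / elem_size;
}
```

With 4-byte elements, 16 bytes need 4 elements and 15 bytes also need 4, whereas plain integer division of 15 by 4 yields 3. The simpler `size/sizeof(int) + 1` is also safe; it just allocates one spare element when the size is an exact multiple.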