I have the following simple code to read a file:

std::basic_ifstream<wchar_t> RFile(L"C:\\file.exe", std::ios::binary|std::ios::ate);
if (!RFile.is_open()){ cout << "Cannot open the file." << endl; return 0;}
std::streamoff fileSize = RFile.tellg();
wstring fileContent;
fileContent.reserve(fileSize);
RFile.seekg(0, std::ios::beg);
if (!RFile.read(&fileContent[0], fileSize)) cout << "An error when reading the file." << endl;
RFile.close();

The code compiles without errors and reports none at runtime, but it behaves strangely when run or debugged: the program doesn't end and just keeps waiting (as if it were waiting for input).

Is there something wrong with my code?


EDIT

The program finally ended and completed its work, however, I noted:

  • The program takes almost 32 seconds just to read 17 MB. Is that normal, or is something wrong in my code? (That seems very slow to me.)
  • When I use the char data type instead of wchar_t, reading is as fast as it should be. So is the problem the wchar_t data type, or something else?
Lion King
  • Unrelated: After `if (!RFile.is_open()) cout << "Cannot open the file." << endl;` you should probably not allow the program to continue as though the file was opened. – user4581301 Jul 07 '20 at 23:05
  • Where is it waiting? What does your debugger say? What is the call stack? – Asteroids With Wings Jul 07 '20 at 23:06
  • When you debug the program, where does it stop and wait? Usually there's good intel to be gained from inspecting the site of lock-up. – user4581301 Jul 07 '20 at 23:06
  • Is `tellg` going to give you the size of the file in `wchar_t`s, in good old bytes, or something else? This is going to be important to know. – user4581301 Jul 07 '20 at 23:08
  • Probably unrelated, but I think you want `fileContent.resize(fileSize)` rather than `fileContent.reserve(fileSize)`. – Paul Sanders Jul 07 '20 at 23:08
  • @AsteroidsWithWings: At `read` function, and doesn't say anything just waiting. – Lion King Jul 07 '20 at 23:14
  • @user4581301: Yes, `tellg` gives me the correct size. – Lion King Jul 07 '20 at 23:19
  • @PaulSanders: The same thing happens with `resize` function. – Lion King Jul 07 '20 at 23:21
  • Correct size of the file or correct size of the string? – user4581301 Jul 07 '20 at 23:30
  • @user4581301: The correct size of the file. The program finally ended and completed its work however, I noted that the program takes almost 32 seconds to just read 17 MB, is that normal? – Lion King Jul 07 '20 at 23:37
  • Print `fileSize`. Is it correct? – Asteroids With Wings Jul 07 '20 at 23:41
  • No idea what the rest of the program is doing. If it just reads, 32 seconds is a long time for 17 MB. If it's reading 17 MB and computing Travelling Salesman... – user4581301 Jul 07 '20 at 23:42
  • Sehe just made my point about the difference between the size of the file and the size of the string. – user4581301 Jul 07 '20 at 23:43
  • It's a `wstring`. A string of `wchar_t`. Same base type as the stream. – Asteroids With Wings Jul 07 '20 at 23:43
  • _"the program takes almost 32 seconds to just read 17 MB, is that normal"_ No. Could the file have been read-locked until a few moments ago? It's unlikely that we'll be able to diagnose this for you from here. – Asteroids With Wings Jul 07 '20 at 23:44
  • Or technically it _could_ be the UB that Paul noted. Though in practice I think that's unlikely. – Asteroids With Wings Jul 07 '20 at 23:45
  • @AsteroidsWithWings: Yes, it prints `18850624` and the real size is `17.9MB`. – Lion King Jul 07 '20 at 23:49
  • W.r.t the time taken, this is often due to debug builds. Try a release build and possibly disable debug iterators if your standard library has them – sehe Jul 08 '20 at 00:04
  • @LionKing Then that's interesting, because it tells us the cursor position is given in bytes, which is _twice as many_ as the `wchar_t`s you're trying to receive. So no wonder it blocks. sehe was right. Strange, though, as that's not how it's _supposed_ to work. – Asteroids With Wings Jul 08 '20 at 00:17
  • @AsteroidsWithWings: I just want to read Unicode UTF-16 files correctly not just ANSI. – Lion King Jul 08 '20 at 00:58
  • I think I'd just use a normal `fstream`. You can read `char`s (bytes) into a `wstring` with some casting. Then life will be a bit simpler. But make sure your `wstring` is two-byte on your platform (that's true on Windows, which treats them as "always UTF-16", weirdly, but not elsewhere) – Asteroids With Wings Jul 08 '20 at 01:05
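
The last suggestion above (a narrow stream plus a cast), combined with the earlier `resize` remark, could look roughly like the sketch below. It is only an illustration under stated assumptions: `read_file_as_wchar_units` is a made-up name, the file is assumed to contain code units in the platform's native `wchar_t` width and byte order (UTF-16 little-endian on Windows), and a leading byte order mark, if any, is left in the result.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the thread): reads the raw bytes of a file
// straight into the storage of a std::wstring, with no character conversion.
std::wstring read_file_as_wchar_units(const std::string& path) {
    // A narrow stream, so tellg() is a plain byte offset.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) throw std::runtime_error("cannot open " + path);

    const std::streamoff bytes = in.tellg();
    if (bytes < 0) throw std::runtime_error("cannot determine size of " + path);
    in.seekg(0, std::ios::beg);

    // Construct with the final size (the effect of resize, not reserve) so the
    // elements exist before read() writes into them. A trailing partial unit,
    // if the byte count is not a multiple of sizeof(wchar_t), is dropped.
    std::wstring content(static_cast<std::size_t>(bytes) / sizeof(wchar_t), L'\0');

    const std::streamsize to_read =
        static_cast<std::streamsize>(content.size() * sizeof(wchar_t));
    if (to_read > 0 && !in.read(reinterpret_cast<char*>(&content[0]), to_read))
        throw std::runtime_error("short read from " + path);

    return content;
}
```

Calling `read_file_as_wchar_units("C:\\file.exe")` would then return one `wchar_t` per `sizeof(wchar_t)` bytes of the file. Because no `codecvt` conversion runs per character, this also avoids the per-character work a wide stream does, which may or may not be what made the original read slow.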

1 Answer

You're reading into a wstring. The size of wchar_t on your system is probably not one byte, so sizing the buffer from the file size in bytes is not correct.
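
To make the mismatch concrete, here is the arithmetic for the file size reported in the comments on the question (18850624 bytes). `units_in` is a hypothetical helper written only for this illustration; which unit size applies depends on your platform (wchar_t is typically 2 bytes on Windows and 4 bytes on most Unix-like systems).

```cpp
#include <cstddef>

// Hypothetical helper: how many whole code units of `unit_size` bytes
// fit into `byte_count` bytes.
constexpr std::size_t units_in(std::size_t byte_count, std::size_t unit_size) {
    return byte_count / unit_size;
}

// 18850624 is the byte count from the question's comments.
static_assert(units_in(18850624, 2) == 9425312,
              "2-byte units: half as many elements as bytes");
static_assert(units_in(18850624, 4) == 4712656,
              "4-byte units: a quarter as many elements as bytes");
```

In other words, a buffer of `fileSize` wchar_t elements is two to four times larger than the raw file contents would need if each element held one full code unit.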

I would use a more idiomatic approach instead of doing the manual chores:

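#include <exception>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

int main() {
    std::wifstream file;
    try {
        // Throw on open or read failure instead of checking flags by hand
        file.exceptions(std::ios::failbit | std::ios::badbit);
        file.open("C:\\file.exe", std::ios::binary);

        // Read until end of stream; no manual size calculation needed
        std::wstring const content(
                std::istreambuf_iterator<wchar_t>(file), {});

        std::cout << "Read " << content.size() << " characters\n";
    } catch(std::exception const& e) {
        // Narrow stream here too, to avoid mixing narrow and wide output
        std::cerr << "error reading file: " << e.what() << "\n";
    }
}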
sehe
  • Do you have reason to believe that the get cursor position is "incorrect" in some way for non-`char` streams? Could you provide more information regarding this phenomenon? – Asteroids With Wings Jul 07 '20 at 23:42
  • Mmm. Perhaps tellg() [should be correct](https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/testsuite/27_io/basic_istream/tellg/wchar_t/26211.cc) in terms of wchar_t units. However, I do know that I don't need to know, and the only other place where I spotted this code was, ironically, another person who [had trouble with it](https://stackoverflow.com/q/40844876/85371). I'm not sure I'm willing to pour more time into finding out more, as details will vary by platform as well. – sehe Jul 08 '20 at 00:03