1

I'm trying to read a file whose path contains Cyrillic characters, but ifstream::is_open() returns false. This is my code:

#include <fstream>
#include <iostream>
#include <string>

std::string ReadFile(const std::string &path) {
    std::string newLine, fileContent;
    std::ifstream in(path.c_str(), std::ios::in);

    if (!in.is_open()) {
        return std::string("isn't opened");
    }

    while (in.good()) {
        getline(in, newLine);
        fileContent += newLine;
    }

    in.close();

    return fileContent;
}

int main() {
    std::string path = "C:\\test\\документ.txt";
    std::string content = ReadFile(path);
    std::cout << content << std::endl;
    return 0;
}

The specified file exists.

I searched Google for a solution but found nothing useful.

Here are the links I looked at:

  • I don't need wstring
  • The same as the previous one
  • No answer there
  • Not about C++
  • No answer there either

P.S. I need to get the file's content in a string, not in a wstring.

These are the encoding settings of my IDE (CLion 2017.1):

[Screenshot: My encoding]

V. Panchenko
  • Maybe you might check `errno` – Basile Starynkevitch Apr 14 '17 at 06:02
  • 3
    If you are sure that the file actually exists, you probably have a file name encoding issue. `std::ifstream` constructor will accept either a `const char *` encoded in the system code page, or a `const wchar_t *`. Since you are sending a `const char *`, it seems that the string `"C:\\test\\документ.txt"` stored by your IDE is *not* encoded in the system code page. (Perhaps it's UTF-8?) You need to configure your IDE to use the system code page in source code, or figure out the encoding and convert to either the system code page or UTF-16. – user4815162342 Apr 14 '17 at 06:03
  • @BasileStarynkevitch, `errno` equals 2 (checked inside `if(!in.is_open())`) – V. Panchenko Apr 14 '17 at 06:06
  • I didn't even think about that, @user4815162342! Thank you, I will try it now! – V. Panchenko Apr 14 '17 at 06:07
  • @user4815162342, I found the encoding settings of my IDE and updated the question. Take a look, please. – V. Panchenko Apr 14 '17 at 06:16
  • What do you get if you print out the contents of the string, e.g.: `std::string path = "C:\\test\\документ.txt"; for (size_t i = 0; i < path.size(); i++) printf("%02x ", (unsigned char)path[i]);` – user4815162342 Apr 14 '17 at 12:58
  • Are you *sure* that the characters in the filename are all available in the system-default ANSI code page? And come to think of it I don't actually know whether `ifstream` uses the system-default ANSI code page for the filename or the C runtime locale. Might even be the multibyte code page for all I know! I think you should at least *try* converting the filename to a wide-character string and opening the file with that. – Harry Johnston Apr 14 '17 at 23:02
  • ... and that way, you can print out the UTF-16 code points in the string and compare to the [UTF-16 code points in the actual file name](https://superuser.com/q/1199536/96662). – Harry Johnston Apr 14 '17 at 23:27
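The byte dump suggested in the comments can be sketched as a small helper. `HexDump` is a hypothetical name, not something from the thread. The idea is that the bytes reveal how the IDE stored the literal: in UTF-8 the letter "д" is the two bytes d0 b4, while in windows-1251 it is the single byte e4.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: renders every byte of `text` as two hex digits,
// separated by spaces, so the encoding of a string literal can be inspected.
std::string HexDump(const std::string &text) {
    std::string out;
    char buf[4];
    for (unsigned char byte : text) { // unsigned char avoids sign extension for bytes >= 0x80
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(byte));
        if (!out.empty()) {
            out += ' ';
        }
        out += buf;
    }
    return out;
}
```

Calling `HexDump(path)` on the path from the question and comparing the result with the two byte patterns above should show which encoding the compiler actually saw.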

4 Answers

1

You'll need an up-to-date compiler or Boost. std::filesystem::path can handle these names, but it's new in the C++17 standard. Your compiler may still have it as std::experimental::filesystem::path, or else you'd use the third-party boost::filesystem::path. The interfaces are pretty comparable as the Boost version served as the inspiration.
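A minimal sketch of this approach, assuming a C++17 standard library and assuming the narrow string holds the path as UTF-8 (the question does not confirm that encoding). `ReadFileUtf8` is a hypothetical name. `std::filesystem::u8path` converts UTF-8 to the platform's native path encoding, and C++17 file streams accept a `std::filesystem::path` directly.

```cpp
#include <filesystem>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads a whole file whose path is given as UTF-8 bytes.
std::string ReadFileUtf8(const std::string &utf8Path) {
    // u8path is C++17; C++20 deprecates it in favour of path(std::u8string).
    const std::filesystem::path path = std::filesystem::u8path(utf8Path);

    std::ifstream in(path, std::ios::in);
    if (!in) {
        throw std::runtime_error("cannot open file");
    }

    // Copy the stream buffer in one go; unlike a getline loop this keeps newlines.
    std::ostringstream content;
    content << in.rdbuf();
    return content.str();
}
```

The result is still a plain `std::string`, so the caller never has to handle a `wstring`; only the path conversion happens inside the helper.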

MSalters
0

std::string is defined as std::basic_string<char>, so your Cyrillic characters are not stored as intended. At the very least, try using std::wstring to store your file path; you can still read the file's content into a std::string.
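If the path only exists as a narrow string, it has to be converted to wide characters first. Below is a portable sketch of that step, assuming the input is valid UTF-8 (an assumption; the bytes in the question may be in another code page). `Utf8ToUtf16` is a hypothetical helper; on Windows the `MultiByteToWideChar` API performs the same conversion.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes into UTF-16 code units.
// Simplification: overlong encodings and encoded surrogates are not rejected.
std::u16string Utf8ToUtf16(const std::string &utf8) {
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp = 0;
        std::size_t extra = 0; // number of continuation bytes that must follow
        if (lead < 0x80) {
            cp = lead;
        } else if ((lead & 0xE0) == 0xC0) {
            cp = lead & 0x1F; extra = 1;
        } else if ((lead & 0xF0) == 0xE0) {
            cp = lead & 0x0F; extra = 2;
        } else if ((lead & 0xF8) == 0xF0) {
            cp = lead & 0x07; extra = 3;
        } else {
            throw std::invalid_argument("invalid UTF-8 lead byte");
        }
        if (extra > utf8.size() - i - 1) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char next = static_cast<unsigned char>(utf8[i + k]);
            if ((next & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (next & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF) {
            throw std::invalid_argument("code point out of range");
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

On Windows, where `wchar_t` is 16 bits wide, the resulting code units can be copied into a `std::wstring` and passed to a wide-character file API.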

PVRT
0

First of all, set your project settings to use UTF-8 encoding instead of windows-1251. Until the standard library improves (which won't happen any time soon), you basically cannot rely on it for handling I/O properly. To make an input stream read from files on Windows, you need to write your own input stream buffer that opens files using 2-byte wide characters, or rely on a third-party implementation of such routines. Here is an incomplete (but sufficient for your example) implementation:

// assuming that usual Windows SDK macros such as _UNICODE, WIN32_LEAN_AND_MEAN are defined above
#include <Windows.h>

#include <string>
#include <iostream>
#include <system_error>
#include <memory>
#include <utility>
#include <cstdlib>
#include <cstdio>

static_assert(2 == sizeof(wchar_t), "wchar_t size must be 2 bytes");

using namespace ::std;

class MyStreamBuf final: public streambuf
{
    #pragma region Fields
    private: ::HANDLE const  m_file_handle;
    private: char            m_buffer; // typically buffer should be much bigger
    #pragma endregion

    public: explicit
    MyStreamBuf(wchar_t const * psz_file_path)
    :   m_file_handle(::CreateFileW(psz_file_path, FILE_GENERIC_READ, FILE_SHARE_READ, nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL))
    ,   m_buffer{}
    {
        if(INVALID_HANDLE_VALUE == m_file_handle)
        {
            auto const error_code{::GetLastError()};
            throw(system_error(static_cast< int >(error_code), system_category(), "::CreateFileW call failed"));
        }
    }

    public:
    ~MyStreamBuf(void)
    {
        auto const closed{::CloseHandle(m_file_handle)};
        if(FALSE == closed)
        {
            auto const error_code{::GetLastError()};
            //throw(::std::system_error(static_cast< int >(error_code), system_category(), "::CloseHandle call failed"));
            // throwing in destructor is kinda wrong
            // but if CloseHandle returned false then our program is in inconsistent state
            // and must be terminated anyway
            (void) error_code; // not used
            abort();
        }
    }

    private: auto
    underflow(void) -> int_type override
    {
        ::DWORD bytes_count_to_read{1};
        ::DWORD read_bytes_count{};
        {
            auto const succeeded{::ReadFile(m_file_handle, addressof(m_buffer), bytes_count_to_read, addressof(read_bytes_count), nullptr)};
            if(FALSE == succeeded)
            {
                auto const error_code{::GetLastError()};
                setg(nullptr, nullptr, nullptr);
                throw(system_error(static_cast< int >(error_code), system_category(), "::ReadFile call failed"));
            }
        }
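        if(0 == read_bytes_count)
        {
            setg(nullptr, nullptr, nullptr);
            return(traits_type::eof());
        }
        setg(addressof(m_buffer), addressof(m_buffer), addressof(m_buffer) + 1);
        // to_int_type keeps bytes >= 0x80 positive so that 0xFF is not mistaken for EOF
        return(traits_type::to_int_type(m_buffer));
    }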
};

string
MyReadFile(wchar_t const * psz_file_path)
{
    MyStreamBuf buffer(psz_file_path); // lives on the stack, so the file handle is closed on return
    istream in(addressof(buffer)); // note that we create normal stream
    string new_line;
    string file_content;
    while(in.good())
    {
        getline(in, new_line);
        file_content += new_line;
    }
    return file_content; // a plain return lets the compiler elide the copy
}

int
main(void)
{
    string content = MyReadFile(L"C:\\test\\документ.txt"); // note that path is a wide string
    cout << content << endl;
    return 0;
}
user7860670
  • Is all of this really necessary? AFAIK Windows has a `std::ifstream` constructor overloaded for `const wchar_t *`, which should be able to open `L"C:\\test\\документ.txt"` just fine. – user4815162342 Apr 14 '17 at 11:22
  • This overload is not standard and seems to be unavailable to the OP. But at some point one has to write custom streambufs anyway to gain the desired level of control over I/O, or even get rid of standard streams altogether. – user7860670 Apr 14 '17 at 11:39
  • The OP is not using a wide string for the file name, so the overload didn't get a chance to be used in the first place. The overload being "non-standard" is a non-issue, since `CreateFileW` is equally non-standard, as is including `<Windows.h>` etc. – user4815162342 Apr 14 '17 at 12:09
  • I assume that this extension is not available since OP isn't using MSVS. Your point about `CreateFileW ` being equally non-standard etc. is not valid because writing custom streambuffers using standard interface is a preferred way to add desired functionality into iostreams compared to inserting new method into guts of standard library. – user7860670 Apr 14 '17 at 12:25
  • Neither `CreateFileW` nor `ifstream::ifstream(const wchar_t *)` is specified by standard C++. Writing a custom stream buffer is a valid solution for some use cases, but is likely overkill for what the OP actually needs, which is to read the contents of a regular file. – user4815162342 Apr 14 '17 at 12:54
  • Even though both of them are not part of the standard we are free to use CreateFileW or any other platform-specific or third-party library while we can not just paste new iostream constructor into standard library. I kinda agree that writing own stream buf seems to be an overkill but that is what usually happens when one tries to cross standard library boundary just for a bit while trying to stay inside at the same time. – user7860670 Apr 14 '17 at 14:56
  • The `ifstream` constructor that accepts `wchar_t *` is [provided by Windows](https://msdn.microsoft.com/en-us/library/k7hz8258.aspx#basic_ifstream__basic_ifstream), nothing is "pasted into standard library". – user4815162342 Apr 14 '17 at 15:15
  • More precisely it is provided as part of MS implementation of C++ standard library, not by Windows, (unlike CreateFileW which is part of Windows itself). – user7860670 Apr 14 '17 at 15:41
0

Change your code to use wstring and save your file using a Unicode encoding (not UTF-8; use UCS-2, UTF-16 or something like that). MSVC has a non-standard overload specifically for this purpose, so that it can handle non-ASCII characters in file names:

#include <fstream>
#include <iostream>
#include <string>

std::string ReadFile(const std::wstring &path)
{
    std::string newLine, fileContent;
    std::ifstream in(path.c_str(), std::ios::in);

    if (!in)
        return std::string("isn't opened");

    while (getline(in, newLine))
        fileContent += newLine;

    return fileContent;
}

int main()
{
    std::wstring path = L"C:\\test\\документ.txt";
    std::string content = ReadFile(path);
    std::cout << content << std::endl;
}

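Also, note the corrected ReadFile code: the loop now tests the result of getline directly instead of calling in.good() before each read.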

Pavel P