
I need to read both UTF-8 strings (`std::string`) and UTF-16 strings (`std::u16string`) from a file opened with `std::ifstream`.

Reading the UTF-8 string is easy; I think I can just use something like `std::getline(stream, str, '\0')`.
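
A minimal sketch of that UTF-8 case, assuming each string is stored null-terminated and the stream is an ordinary `char` stream. `std::getline` consumes the `'\0'` delimiter without storing it, so the next read starts right after the terminator. The helper name `readUtf8Z` is hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads one null-terminated UTF-8 string from a char stream.
// std::getline extracts the '\0' delimiter but does not store it in the result,
// so the stream is left positioned at the start of the next field.
std::string readUtf8Z(std::istream& in)
{
    std::string str;
    // getline fails if nothing at all could be extracted (e.g. already at EOF).
    if (!std::getline(in, str, '\0'))
        throw std::runtime_error("failed to read UTF-8 string");
    return str;
}
```

For example, on a stream containing `"hello\0world\0"`, two calls return `"hello"` and then `"world"`. Note that a missing terminator at the very end of the file is not detected here; checking `in.eof()` after a successful call would catch that case.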

UTF-16 is where I'm stuck: I'm not sure how to actually read it. I could loop over the file, reading 2 bytes at a time until I hit a 0x0000 code unit, but I'm not sure that is the right or best way to do it.
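
Reading two bytes per code unit does work for this. One sketch, assuming the file stores UTF-16LE (little-endian) and using a hypothetical helper name `readUtf16Z`: it assembles each code unit from its two bytes explicitly, so the result does not depend on the host's byte order, and it treats a failed read before the terminator as an error instead of looping.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads UTF-16 code units two bytes at a time until a
// 0x0000 code unit. ASSUMES the data is little-endian (UTF-16LE); swap the
// roles of lo and hi below for big-endian data.
std::u16string readUtf16Z(std::istream& in)
{
    std::u16string str;
    char bytes[2];
    while (in.read(bytes, 2))
    {
        // Build the code unit explicitly instead of reinterpret_cast-ing,
        // so the host's endianness does not matter.
        const auto lo = static_cast<unsigned char>(bytes[0]);
        const auto hi = static_cast<unsigned char>(bytes[1]);
        const char16_t ch = static_cast<char16_t>(lo | (hi << 8));
        if (ch == u'\0')
            return str; // terminator reached and consumed
        str.push_back(ch);
    }
    // read() failed before a terminator was found (truncated data or EOF).
    throw std::runtime_error("unterminated UTF-16 string");
}
```

For example, with `std::istringstream in(std::string("h\0i\0\0\0", 6));`, `readUtf16Z(in)` returns `u"hi"` and leaves the stream just past the terminator.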

So, how can I read it?

-- edit --

For now, I'm doing it this way. Is this OK?

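```cpp
#include <codecvt> // std::codecvt_utf8_utf16 (deprecated since C++17)
#include <locale>  // std::wstring_convert (deprecated since C++17)
#include <string>

// Reads a null-terminated UTF-16 string from `binary` (the std::ifstream
// member, opened in binary mode) and returns it converted to UTF-8.
// Assumes the file's byte order matches the host's (e.g. UTF-16LE on x86).
std::string binaryReader::ru16str_n()
{
    std::u16string str;
    char16_t ch = 0;
    // Stop at the u'\0' terminator, or as soon as read() fails (e.g. EOF).
    // Without the stream check, a missing terminator would keep appending
    // the last character forever.
    while (binary.read(reinterpret_cast<char*>(&ch), sizeof ch) && ch != u'\0')
        str.push_back(ch);
    return std::wstring_convert<
        std::codecvt_utf8_utf16<char16_t>, char16_t>{}.to_bytes(str);
}
```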
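
The conversion step relies on `std::wstring_convert` and `std::codecvt_utf8_utf16`, both deprecated since C++17. If you would rather avoid them, a hand-written UTF-16 to UTF-8 conversion is short. This sketch (the name `utf16ToUtf8` is hypothetical) combines surrogate pairs into a single code point and, like `wstring_convert::to_bytes` without an error string, throws `std::range_error` on malformed input; substituting U+FFFD instead would be an equally valid design choice.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts UTF-16 to UTF-8 without <codecvt>.
// Unpaired surrogates are rejected with std::range_error.
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) // high surrogate: must be followed by a low one
        {
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::range_error("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i; // the low surrogate has been consumed
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)
            throw std::range_error("unpaired low surrogate");

        // Encode the code point as 1 to 4 UTF-8 bytes.
        if (cp < 0x80)
            out += static_cast<char>(cp);
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With it, the last statement of `ru16str_n` would become `return utf16ToUtf8(str);`.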
  • From https://en.cppreference.com/w/cpp/string/basic_string/getline it seems like `getline` can work with any character type `CharT`, so I think a u16 string getline call would simply be `std::getline(stream, str, u'\0')`. – mediocrevegetable1 Nov 05 '21 at 09:58
  • Are you sure the data in the file is null-terminated? This would be rather unusual. – Codo Nov 05 '21 at 09:59
  • Related: https://stackoverflow.com/q/50696864 – kiner_shah Nov 05 '21 at 09:59
  • @mediocrevegetable1 I tested getline with a u16string and got `no instance of overloaded function "std::getline" matches the argument list`, so it seems it can't read u16strings... – file-tracer Nov 05 '21 at 10:02
  • @Codo yes, it is: the u16strings are stored as null-terminated strings. – file-tracer Nov 05 '21 at 10:03
  • @kiner_shah that question is more about reading a u16string whose size is known, not a null-terminated one. – file-tracer Nov 05 '21 at 10:04
  • @file-tracer maybe your stream isn't opened as a `char16_t` stream (`std::basic_ifstream`)? If not, then I'm not sure :/ – mediocrevegetable1 Nov 05 '21 at 10:09
  • @mediocrevegetable1 yes, I opened it as `char` because I need to read the first part of the stream as `char`. Should I create a temp file, write the second part to it, and then read it as `char16_t`? Or is there a better way (maybe just read it all into a buffer and then work with it)? – file-tracer Nov 05 '21 at 10:19
  • You could create one stream for reading into the `std::string`, close it, and create another for reading into the `std::u16string`. If you want to read through the file only once, you could read everything up to a `\0` into a byte buffer and interpret it as a `std::string` or `std::u16string`. – mediocrevegetable1 Nov 05 '21 at 10:26
  • @file-tracer I am aware that `std::u16string` is null-terminated. It's a data type in main memory. But that doesn't mean it's the same in the file. Are you sure the strings in the file are null-terminated? What kind of file format is it? – Codo Nov 05 '21 at 10:39
  • @mediocrevegetable1 yes, I can do that; it would probably also remove the need to use a wifstream and to open and close the file. But is it good performance-wise? – file-tracer Nov 05 '21 at 10:56
  • @Codo yes, I'm sure. The file is a custom binary format that I need to read. It starts with a table of offsets to where each u16string is stored (the strings start right after the offset table itself), followed by the strings. – file-tracer Nov 05 '21 at 10:59
  • @mediocrevegetable1 I added an example; is it OK? – file-tracer Nov 05 '21 at 11:26
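
The buffer-based approach suggested in the comments can be sketched as follows, assuming string positions are byte offsets from the start of the file and the UTF-16 strings are little-endian; `readAllBytes`, `u8StringAt` and `u16StringAt` are hypothetical names. Note that a UTF-16 string ends at a two-byte 0x0000 code unit, not at the first `\0` byte, because every ASCII character in UTF-16LE already contains a zero byte.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: loads the whole file into memory in one pass.
std::vector<char> readAllBytes(const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    if (!file)
        throw std::runtime_error("cannot open " + path);
    return std::vector<char>(std::istreambuf_iterator<char>(file),
                             std::istreambuf_iterator<char>());
}

// Hypothetical helper: decodes a null-terminated UTF-8 string at byte `offset`.
std::string u8StringAt(const std::vector<char>& buf, std::size_t offset)
{
    std::string str;
    for (std::size_t i = offset; i < buf.size(); ++i)
    {
        if (buf[i] == '\0')
            return str;
        str.push_back(buf[i]);
    }
    throw std::runtime_error("unterminated UTF-8 string");
}

// Hypothetical helper: decodes a null-terminated UTF-16LE string at byte `offset`.
std::u16string u16StringAt(const std::vector<char>& buf, std::size_t offset)
{
    std::u16string str;
    for (std::size_t i = offset; i + 1 < buf.size(); i += 2)
    {
        const auto lo = static_cast<unsigned char>(buf[i]);
        const auto hi = static_cast<unsigned char>(buf[i + 1]);
        const char16_t ch = static_cast<char16_t>(lo | (hi << 8));
        if (ch == u'\0')
            return str;
        str.push_back(ch);
    }
    throw std::runtime_error("unterminated UTF-16 string");
}
```

Reading the file once and decoding from memory also avoids needing both a `char` and a `char16_t` stream over the same file; each offset from the table can simply be passed to `u16StringAt`.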
