I'm having a hard time parsing an XML file.
The file was saved with UTF-8 encoding.
Plain ASCII characters are read correctly, but Korean characters are not.
So I made a simple program to read a UTF-8 text file and print the content.
Text file (test.txt)
ABC가나다
Test Program
#include <fstream>
#include <iostream>
#include <string>
#include <iterator>
#include <streambuf>

// Format one byte as two uppercase hex digits.
// Returns a pointer to a static buffer, so each result must be used before the next call.
const char* hex(char c) {
    const char REF[] = "0123456789ABCDEF";
    static char output[3] = "XX";
    output[0] = REF[0x0f & (c >> 4)];  // high nibble
    output[1] = REF[0x0f & c];         // low nibble
    return output;
}

int main() {
    // Read the whole file into a string and dump its bytes.
    std::cout << "File(ifstream) : ";
    std::ifstream file("test.txt");
    std::string buffer((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());
    for (auto c : buffer) {
        std::cout << hex(c) << " ";
    }
    std::cout << std::endl;
    std::cout << buffer << std::endl;

    // Same text as a string literal, dumped the same way.
    std::string str = "ABC가나다";
    std::cout << "String literal : ";
    for (auto c : str) {
        std::cout << hex(c) << " ";
    }
    std::cout << std::endl;
    std::cout << str << std::endl;
    return 0;
}
Output
File(ifstream) : 41 42 43 EA B0 80 EB 82 98 EB 8B A4
ABC媛?섎떎
String literal : 41 42 43 B0 A1 B3 AA B4 D9
ABC가나다
The output shows that the characters are encoded differently in the string literal and in the file.
As far as I know, char strings in C++ are encoded in UTF-8, which is why we can display them with printf or cout. So the bytes should have been the same, but they are actually different.
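To check which of the two byte sequences is actually UTF-8, the bytes can be decoded by hand. The following is only a minimal sketch: decode_utf8 is a hypothetical helper written for this question, not a standard library function, and it assumes mostly well-formed input (it does not reject overlong encodings or surrogates).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decode a UTF-8 byte string into Unicode code points.
// Bytes that cannot start or continue a sequence are reported as U+FFFD.
std::vector<std::uint32_t> decode_utf8(const std::string& bytes) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }  // not a valid lead byte

        // Sequence is cut off by the end of the input.
        if (i + extra >= bytes.size()) { out.push_back(0xFFFD); break; }

        bool ok = true;
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char cont = static_cast<unsigned char>(bytes[i + k]);
            if ((cont & 0xC0) != 0x80) { ok = false; break; }  // must be 10xxxxxx
            cp = (cp << 6) | (cont & 0x3F);
        }
        if (!ok) { out.push_back(0xFFFD); ++i; continue; }

        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

Applied to the bytes read from the file (41 42 43 EA B0 80 EB 82 98 EB 8B A4), this gives U+0041, U+0042, U+0043, U+AC00, U+B098, U+B2E4, which are the code points of A, B, C, 가, 나, 다. The bytes of the string literal (B0 A1 B3 AA B4 D9) do not decode as UTF-8 at all, which suggests that the file is read correctly and that it is the literal (and probably the console) that uses a different encoding.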
Is there any way to read a UTF-8 text file using std::ifstream?
I succeeded in parsing the XML file using std::wifstream, following this article.
But most of the libraries I'm using support only const char* strings, so I'm looking for another way that uses std::ifstream.
I've also read this article, which says not to use wchar_t: treating a char string as a multi-byte string is sufficient.
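As I understand that advice, a plain std::string can hold UTF-8 as long as the code remembers that one character may span several bytes. A minimal sketch of what that means in practice, assuming the input is valid UTF-8 (utf8_length is a hypothetical helper name, not part of any library):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count code points in a UTF-8 encoded std::string.
// Continuation bytes have the form 10xxxxxx, so every other byte starts a
// new code point. Assumes the input is valid UTF-8.
std::size_t utf8_length(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

For the 12 bytes read from test.txt, size() reports 12 while utf8_length reports 6, one per character of ABC가나다.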