I'm working on a project in which I have to handle special characters.
I am working on Windows 10, but my solution also needs to work on Linux. I need to read a UTF-8 encoded text file, perform certain validations, and display the text of the file on the screen.
I am working with Dev-C++ 5.11.
I currently have no major problem reading the file with the special characters and displaying it on the console. My problem is obtaining a special character on its own so that I can validate it.
At the moment, the .txt file that I am trying to read contains the following:
Inicio
D1
Biatlón
S1
255
E1
Esprint 7,5 km (M); 100; 200
E2
Persecucion 10 km (M); 100; 200
ff
The character I'm having trouble with is 'ó'.
I am using the following code:
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <locale.h>
#include <string>
#include <windows.h>

#define CP_UTF8 65001

using std::cout;

int main() {
    std::ifstream file;
    std::string text;

    // Switch the console output code page to UTF-8.
    if (!SetConsoleOutputCP(CP_UTF8)) {
        std::cerr << "error: unable to set UTF-8 codepage.\n";
        return 1;
    }

    file.open("entryDisciplineESP.txt");
    int line = 0;
    if (file.fail()) {
        cout << "Error. \n";
        exit(1);
    }

    while (std::getline(file, text)) {
        if (line == 2) {
            // Third line ("Biatlón"): try to print the 'ó' on its own.
            cout << text[5] << "\n";
        }
        cout << text << "\n";
        line++;
    }

    cout << "\n";
    system("Pause");
    return 0;
}
I am getting the following from the console:
Inicio
D1
Biatlón
S1
255
E1
Esprint 7,5 km (M); 100; 200
E2
Persecucion 10 km (M); 100; 200
ff
My problem is that when I try to print the character 'ó' on its own, it is not displayed; a blank space is printed instead. I need to work with that character to do validations. For example, I need to verify that the text contains no digits or other characters such as '?'. I would also like to do other processing on the text to make the work easier.
How can I achieve this? I have read about converting the text from UTF-8 to UTF-16, but I have not managed to do it and I don't know whether it would work. Any suggestions?
I appreciate all help in advance.
EDIT 1.
Seeing that the general recommendation is to convert from UTF-8 to UTF-32 to do the validation work, I have managed to include the <codecvt> header (I am now using Dev-C++ 6.3) and I implemented the following recommended function for testing:
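#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Converts a UTF-8 encoded std::string to a std::wstring.
std::wstring utf8_to_ws(std::string const& utf8)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> cnv;
    std::wstring s = cnv.from_bytes(utf8);
    if (cnv.converted() < utf8.size())
        throw std::runtime_error("incomplete conversion");
    return s;
}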
I have updated the conditional so that it calls the function:
if (line == 2) {
    std::cout << text[5] << "\n";
    std::wstring a = utf8_to_ws(text);
    std::wcout << a << "\n";
}
Now I am getting the following output in the console:
Inicio
D1
Biatln
Biatlón
S1
255
E1
Esprint 7,5 km (M); 100; 200
E2
Persecucion 10 km (M); 100; 200
ff
For some reason the 'ó' character is still omitted. I would appreciate help solving this problem.