I need to compare data collected from various sources, some of which contain non-ASCII characters, specifically English letters with accents on them. An example is "Frédérik Gauthier": printing each non-ASCII char of that line along with its int value gives -61, -87, -61, -87. Looking at the int values, I noticed that each accented character is always a combination of two chars: the first has the value -61, which seems to indicate that the letter is accented, and the second identifies the letter, in this case -87 for the accented 'e'. My goal is to just "drop" the accent and use the plain English letter. Obviously, I can't rely on this behavior from system to system, so how do you handle this situation? std::string handles the characters without issue, but as soon as I get down to the char level, the problems start. Any guidance?
#include <algorithm>
#include <ctype.h>   // isascii (a POSIX extension, not standard C++)
#include <fstream>
#include <iostream>
#include <string>

int main(int argc, char** argv) {
    if (argc < 2) {
        std::cerr << "usage: prog <file>\n";
        return 1;
    }
    std::ifstream fin(argv[1]);
    if (!fin) {
        std::cerr << "cannot open " << argv[1] << '\n';
        return 1;
    }
    std::string line;
    while (std::getline(fin, line)) {
        // Replace the second byte of each accented letter with a plain letter.
        // The case labels are the byte values seen through a signed char;
        // the first byte of the pair is not checked here.
        std::for_each(line.begin(), line.end(), [](char& a) {
            if (isascii(a)) return;
            switch (static_cast<signed char>(a)) {
                case -68:  a = 'u'; break;
                case -74:  a = 'o'; break;
                case -83:  a = 'i'; break;
                case -85:  a = 'e'; break;
                case -87:  a = 'e'; break;  // the accented 'e' from the example
                case -91:  a = 'a'; break;
                case -92:  a = 'a'; break;
                case -95:  a = 'a'; break;
                case -120: a = 'n'; break;
            }
        });
        // Erase every byte that is still non-ASCII (for example the -61 byte).
        line.erase(std::remove_if(line.begin(), line.end(),
                                  [](char a) { return !isascii(a); }),
                   line.end());
        std::cout << line << std::endl;
        // Diagnostic: report any non-ASCII byte that survived
        // (nothing is expected here after the erase above).
        for (char a : line) {
            if (!isascii(a)) std::cout << a << " : " << int(a);
        }
    }
    return 0;
}
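For what it's worth, the -61/-87 pair matches the UTF-8 encoding of 'é' (bytes 0xC3 0xA9), so one direction might be to look at both bytes together, as unsigned values, instead of testing the second byte on its own. Below is a minimal sketch of that idea, assuming the input really is UTF-8. The names `fold_pair` and `fold_accents` are hypothetical, and the table only covers the letters from my listing, so it is nowhere near a general solution.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: maps one two-byte UTF-8 sequence to an ASCII letter.
// Returns '\0' when the pair is not in this deliberately tiny table.
char fold_pair(unsigned char lead, unsigned char next) {
    if (lead == 0xC3) {
        switch (next) {
            case 0xA1: case 0xA4: case 0xA5: return 'a';  // á ä å
            case 0xA9: case 0xAB:            return 'e';  // é ë
            case 0xAD:                       return 'i';  // í
            case 0xB6:                       return 'o';  // ö
            case 0xBC:                       return 'u';  // ü
        }
    }
    if (lead == 0xC5 && next == 0x88) return 'n';  // ň
    return '\0';
}

// Hypothetical helper: copies `in`, replacing every known two-byte pair
// with its ASCII letter. Bytes that are not recognised are kept unchanged.
std::string fold_accents(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        // Work on unsigned values so the result does not depend on
        // whether plain char is signed on the platform.
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {            // plain ASCII: copy through
            out += in[i];
            continue;
        }
        if (i + 1 < in.size()) {
            const char folded =
                fold_pair(c, static_cast<unsigned char>(in[i + 1]));
            if (folded != '\0') {  // known pair: emit letter, skip 2nd byte
                out += folded;
                ++i;
                continue;
            }
        }
        out += in[i];              // unknown byte: leave it alone
    }
    return out;
}
```

With this sketch, `fold_accents("Fr\xC3\xA9" "d\xC3\xA9" "rik Gauthier")` should give `Frederik Gauthier`. The literal is split so that the `d` after `\xA9` is not read as part of the hex escape. A hand-written table like this clearly does not scale, which is really the heart of my question.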