I'm looking for a way of converting a wstring into a plain string containing only ASCII characters. Any character that isn't present in ASCII (0-127) should be converted to the closest ASCII character. If there is no similar ASCII character, the character should be omitted.
To illustrate, let's assume the following wide string:
wstring text(L"A naïve man called 晨 was having piña colada and crème brûlée.");
The converted version I'm looking for is this (notice the absence of diacritics):
string("A naive man called was having pina colada and creme brulee.")
Edit:
Regarding the purpose: I'm writing an application that analyzes English texts. The input files are UTF-8 and may contain special characters. A part of my application uses a library written in C that only understands ASCII. So I need a way of "dumbing down" the text to ASCII without losing too much information.
Regarding the precise requirements: Any character that is a diacritic version of an ASCII character should be converted to that ASCII character; all other characters should be omitted. So ı, ĩ, and î should become i because they are all versions of the small Latin letter i. The character ɩ (iota), on the other hand, while visually similar, is not a version of the small Latin letter i and should thus be omitted.
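To make the requirement concrete, here is a rough sketch of the behaviour I have in mind, using a hand-written lookup table. The function name to_ascii and the table contents are made up for illustration and cover only the handful of characters mentioned in this question; a real solution would need a far more complete mapping, which is exactly the part I'm asking about.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper (not from any library): keeps ASCII characters,
// replaces a few known accented letters with their base letter,
// and drops everything else.
std::string to_ascii(const std::wstring& input) {
    // Illustrative, deliberately incomplete table of "versions" of ASCII letters.
    static const std::unordered_map<wchar_t, char> base_letter = {
        {L'\u00E8', 'e'},  // e with grave
        {L'\u00E9', 'e'},  // e with acute
        {L'\u00EE', 'i'},  // i with circumflex
        {L'\u00EF', 'i'},  // i with diaeresis
        {L'\u0129', 'i'},  // i with tilde
        {L'\u0131', 'i'},  // dotless i
        {L'\u00F1', 'n'},  // n with tilde
        {L'\u00FB', 'u'},  // u with circumflex
    };

    std::string output;
    output.reserve(input.size());
    for (wchar_t ch : input) {
        if (static_cast<unsigned long>(ch) < 128) {
            // Already ASCII: copy unchanged.
            output.push_back(static_cast<char>(ch));
        } else {
            auto it = base_letter.find(ch);
            if (it != base_letter.end()) {
                // Known accented letter: emit its ASCII base letter.
                output.push_back(it->second);
            }
            // Anything else (e.g. U+6668 or U+0269 iota) is omitted.
        }
    }
    return output;
}
```

Applied to the example string above, this drops 晨 entirely, so the two spaces that surrounded it end up next to each other in the result.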