I have found this `WideStringToString()` function to convert a Unicode string to an ANSI string. I need to convert a string like `àèéìòù` to `aeeiou`, so all accents should be removed. I think it could be done with that function, but which codepage should I use?

Asked by Walter Schrabmair, edited by Remy Lebeau
- You could perhaps convert from `TEncoding.Unicode` to `TEncoding.ASCII`. The latter will most definitely not contain any accents. See the help for [TEncoding](http://docwiki.embarcadero.com/Libraries/Rio/en/System.SysUtils.TEncoding). – Rudy Velthuis Feb 23 '19 at 15:32
- @RudyVelthuis except that the accents will likely get converted to `?` instead of their ASCII counterparts. `TEncoding` is not good at performing **transliteration**. – Remy Lebeau Feb 23 '19 at 18:17
- @Remy: It seems to work for the accents in my example code below. But obviously not for foreign characters like epsilon. It is not Google Translate, of course. – Rudy Velthuis Feb 23 '19 at 18:18
- Maybe this helps: https://stackoverflow.com/questions/1891196/convert-hi-ansi-chars-to-ascii-equivalent-%c3%a9-e – Uli Gerhardt Feb 23 '19 at 18:52
- @UliGerhardt: note that the accepted answer uses `WideCharToMultiByte`, which `TEncoding` also uses (on Windows). – Rudy Velthuis Feb 23 '19 at 21:22
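The question asks which code page to pass. The following is a minimal sketch that calls `WideCharToMultiByte` directly with code page 20127 (US-ASCII), as a `WideStringToString`-style helper would. It assumes Windows. `ToAsciiBestFit` and `USASCIICodePage` are hypothetical names introduced here.

Passing 0 for `dwFlags` leaves the default best-fit mapping enabled, which is what should let accented Latin letters collapse to their base letters. The `lpUsedDefaultChar` output reports whether any character fell back to `?`. Whether this behaves exactly like `TEncoding.ASCII` rests on the assumption that `TEncoding.ASCII` wraps code page 20127 with default flags.

```pascal
program AsciiBestFitDemo;
{$APPTYPE CONSOLE}
uses
  Winapi.Windows, System.SysUtils;

const
  // Windows code page identifier for US-ASCII (constant name is hypothetical).
  USASCIICodePage = 20127;

// Hypothetical helper: converts S to 7-bit ASCII using Windows best-fit
// mapping. AllMapped is False if any character had to be replaced by the
// default character ('?').
function ToAsciiBestFit(const S: string; out AllMapped: Boolean): AnsiString;
var
  Len: Integer;
  UsedDefault: BOOL;
begin
  Result := '';
  AllMapped := True;
  if S = '' then
    Exit;
  // First call: query the required buffer size in bytes.
  Len := WideCharToMultiByte(USASCIICodePage, 0, PWideChar(S), Length(S),
    nil, 0, nil, nil);
  if Len = 0 then
    RaiseLastOSError;
  SetLength(Result, Len);
  UsedDefault := False;
  // Second call: convert. dwFlags = 0 keeps the default best-fit behavior.
  if WideCharToMultiByte(USASCIICodePage, 0, PWideChar(S), Length(S),
    PAnsiChar(Result), Len, nil, @UsedDefault) = 0 then
    RaiseLastOSError;
  AllMapped := not UsedDefault;
end;

var
  AllMapped: Boolean;
begin
  Writeln(ToAsciiBestFit('àèéìòù', AllMapped), ' ', AllMapped);
  Writeln(ToAsciiBestFit('ε', AllMapped), ' ', AllMapped);
end.
```

The extra `AllMapped` result lets a caller detect characters such as ε that have no ASCII best fit, instead of silently receiving `?`.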
1 Answer
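The current way to do this is to use `System.SysUtils.TEncoding`. An example:

```pascal
program RemoveAccentsDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;

// Round-trips Src through the ASCII encoding. On Windows, TEncoding uses
// WideCharToMultiByte, whose best-fit mapping turns many accented Latin
// letters into their base letters; characters with no ASCII equivalent become '?'.
function RemoveAccents(const Src: string): string;
var
  Bytes: TBytes;
begin
  Bytes := TEncoding.ASCII.GetBytes(Src);
  Result := TEncoding.ASCII.GetString(Bytes);
end;

procedure Test;
begin
  Writeln(RemoveAccents('Ŧĥε qùíçķ ƀřǭŵņ fôx ǰűmpεď ōvêŗ ţħě łáƶÿ ďơǥ'));
  Writeln(RemoveAccents('àèéìòù'));
end;

begin
  Test;
end.
```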
For some unknown reason this couldn't convert the epsilon (ε), so the output is:

```
Th? quick brown fox jump?d over the lazy dog
aeeiou
```

Answer by Rudy Velthuis
- I tested with `NormalizeString` and it does not normalize ε either. I looked [here](https://www.unicode.org/charts/beta/normalization/chart_Greek.html) to see if that is expected, but I didn't understand anything from that chart. – Sertac Akyuz Feb 23 '19 at 19:49
- @Sertac: I think it says that epsilon is never composed, i.e. it is always a single code point. But several of these look like an epsilon, so it is pretty confusing. – Rudy Velthuis Feb 23 '19 at 19:56
- Thanks a lot for your advice! Epsilon will not occur in my data, so this is a suitable solution! – Walter Schrabmair Feb 24 '19 at 08:01
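Sertac's comment mentions `NormalizeString`. A different technique from the one in the answer is to decompose the string to Unicode Normalization Form D (NFD) and then drop the combining marks. This is a minimal sketch under several assumptions:

- It requires Windows Vista or later.
- It declares `NormalizeString` itself as an import from `Normaliz.dll` rather than relying on any RTL declaration.
- `NormalizeNFD` and `RemoveAccentsNFD` are hypothetical helper names.
- It only strips marks in the basic Combining Diacritical Marks block (U+0300 to U+036F).

As a result, letters without a canonical decomposition, such as ε, ł or Ŧ, pass through unchanged. Unlike the `TEncoding.ASCII` approach, the result is not guaranteed to be ASCII.

```pascal
program NfdDemo;
{$APPTYPE CONSOLE}
uses
  Winapi.Windows, System.SysUtils;

const
  NormalizationD = 2; // NORM_FORM value for canonical decomposition (NFD)

// Declared here instead of relying on an RTL import (an assumption of this sketch).
function NormalizeString(NormForm: Integer; lpSrcString: PWideChar;
  cwSrcLength: Integer; lpDstString: PWideChar; cwDstLength: Integer): Integer;
  stdcall; external 'Normaliz.dll';

// Hypothetical helper: returns S in Unicode Normalization Form D.
function NormalizeNFD(const S: string): string;
var
  Len: Integer;
begin
  if S = '' then
    Exit('');
  // A zero-length destination asks for an estimate of the required size.
  Len := NormalizeString(NormalizationD, PChar(S), Length(S), nil, 0);
  if Len <= 0 then
    RaiseLastOSError;
  repeat
    SetLength(Result, Len);
    Len := NormalizeString(NormalizationD, PChar(S), Length(S),
      PChar(Result), Length(Result));
    if Len > 0 then
    begin
      SetLength(Result, Len); // trim to the actual output length
      Exit;
    end;
    if GetLastError <> ERROR_INSUFFICIENT_BUFFER then
      RaiseLastOSError;
    // On this error the absolute value of the result is a new size estimate;
    // grow the buffer at least geometrically so the loop always makes progress.
    Len := Abs(Len);
    if Len <= Length(Result) then
      Len := Length(Result) * 2;
  until False;
end;

// Hypothetical helper: decomposes S, then drops combining diacritical marks.
function RemoveAccentsNFD(const S: string): string;
var
  C: Char;
begin
  Result := '';
  for C in NormalizeNFD(S) do
    if (Ord(C) < $0300) or (Ord(C) > $036F) then
      Result := Result + C;
end;

begin
  Writeln(RemoveAccentsNFD('àèéìòù'));
end.
```

For `àèéìòù`, NFD splits each letter into its base letter plus U+0300 or U+0301, so only the base letters `aeeiou` remain after the filter.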