
A string I am receiving is UTF-8 encoded and contains special characters like Å¡, Ä‘, Ä etc. I am using StringReplace() to convert it to plain text, but each call can only replace one kind of character. PHP has a similar function for replacing strings (see how to replace special characters with the ones they're based on in PHP?), but it accepts arrays:

<?php
  $vOriginalString = "¿Dónde está el niño que vive aquí? En el témpano o en el iglú. ÁFRICA, MÉXICO, ÍNDICE, CANCIÓN y NÚMERO.";

  $vSomeSpecialChars = array("á", "é", "í", "ó", "ú", "Á", "É", "Í", "Ó", "Ú", "ñ", "Ñ");
  $vReplacementChars = array("a", "e", "i", "o", "u", "A", "E", "I", "O", "U", "n", "N");

  $vReplacedString = str_replace($vSomeSpecialChars, $vReplacementChars, $vOriginalString);

  echo $vReplacedString; // outputs '¿Donde esta el nino que vive aqui? En el tempano o en el iglu. AFRICA, MEXICO, INDICE, CANCION y NUMERO.'
?>

How can I do this in Delphi? StringReplace doesn't support arrays.

Thalvik
  • The string is UTF-8 encoded **and** contains "special characters"? What's a "special character"? Check out [this answer](http://stackoverflow.com/questions/6552477/replace-worldwide-diacritics-characters/6552564#6552564) too -- if you have access to `iconv`. – Kerrek SB Jul 06 '11 at 17:57
  • If you want this for comparison, then use [CompareString](http://msdn.microsoft.com/en-us/library/dd317759(v=vs.85).aspx) with at least `NORM_IGNORENONSPACE` in `dwCmpFlags`. – NGLN Jul 06 '11 at 20:45
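NGLN's comparison suggestion could look roughly like the sketch below. It assumes Delphi 2009 or later on Windows (so `CompareString` from the `Windows` unit maps to `CompareStringW`); `SameIgnoringAccents` is a hypothetical helper name, and the constant is declared locally with the SDK value of `CSTR_EQUAL`. Note that this only compares strings; it does not produce an accent-free string.

```pascal
program CompareIgnoringAccents;

{$APPTYPE CONSOLE}

uses
  Windows;

const
  CmpEqual = 2; // value of CSTR_EQUAL in the Windows SDK

// Hypothetical helper: True if A and B are equal once nonspacing marks
// (accents) are ignored, using the user's default locale rules.
function SameIgnoringAccents(const A, B: string): Boolean;
begin
  // A length of -1 means "null-terminated"; PChar(string) is always terminated.
  Result := CompareString(LOCALE_USER_DEFAULT, NORM_IGNORENONSPACE,
    PChar(A), -1, PChar(B), -1) = CmpEqual;
end;

begin
  // #$00F3 is 'ó', written as a code point so the source file's encoding does not matter.
  Writeln(SameIgnoringAccents('canci'#$00F3'n', 'cancion'));
  Writeln(SameIgnoringAccents('canci'#$00F3'n', 'camion'));
end.
```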

2 Answers

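function str_replace(const oldChars, newChars: array of Char; const str: string): string;
var
  i: Integer;
begin
  // oldChars[i] is replaced by newChars[i], so both arrays must be the same length
  Assert(Length(oldChars)=Length(newChars));
  Result := str;
  // One StringReplace (SysUtils) pass over the whole string per character pair
  for i := 0 to high(oldChars) do
    Result := StringReplace(Result, oldChars[i], newChars[i], [rfReplaceAll])
end;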

If you are concerned about all the needless heap allocations caused by calling StringReplace once per character, you could instead make a single pass over the string and replace characters in place:

function str_replace(const oldChars, newChars: array of Char; const str: string): string;
var
  i, j: Integer;
begin
  Assert(Length(oldChars)=Length(newChars));
  Result := str;
  // Single pass over the string; each character is looked up in oldChars
  for i := 1 to Length(Result) do
    for j := 0 to high(oldChars) do
      if Result[i]=oldChars[j] then
      begin
        // The first write makes Result unique (copy-on-write), so str is left unchanged
        Result[i] := newChars[j];
        break;
      end;
end;

Call it like this:

newStr := str_replace(
  ['á','é','í'],
  ['a','e','i'], 
  oldStr
);
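Putting it together, a complete console program mirroring the PHP example from the question might look like the sketch below. It assumes Delphi 2009 or later (so `Char` is a UTF-16 code unit) and that the .dpr file is saved as UTF-8 with a BOM; without the BOM the compiler reads the accented literals in the system ANSI code page.

```pascal
program ReplaceAccents;

{$APPTYPE CONSOLE}

// Same in-place routine as in the answer, repeated so the program compiles on its own.
function str_replace(const oldChars, newChars: array of Char; const str: string): string;
var
  i, j: Integer;
begin
  Assert(Length(oldChars) = Length(newChars));
  Result := str;
  for i := 1 to Length(Result) do
    for j := 0 to High(oldChars) do
      if Result[i] = oldChars[j] then
      begin
        Result[i] := newChars[j];
        Break;
      end;
end;

var
  OriginalString, ReplacedString: string;
begin
  OriginalString := '¿Dónde está el niño que vive aquí? ÁFRICA, MÉXICO, ÍNDICE, CANCIÓN y NÚMERO.';
  // Same mapping as $vSomeSpecialChars / $vReplacementChars in the PHP version
  ReplacedString := str_replace(
    ['á', 'é', 'í', 'ó', 'ú', 'Á', 'É', 'Í', 'Ó', 'Ú', 'ñ', 'Ñ'],
    ['a', 'e', 'i', 'o', 'u', 'A', 'E', 'I', 'O', 'U', 'n', 'N'],
    OriginalString);
  Writeln(ReplacedString);
end.
```

Writeln converts the string to the console's code page, so characters such as '¿' may not display correctly even though the string itself is fine.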
David Heffernan
  • @Thalvik To save you some work, [here's](http://www.nldelphi.com/Forum/showthread.php?p=295849#post295849) a complete array (among others I'm sure). – NGLN Jul 06 '11 at 20:50

Getting rid of your accents is called Normalization.

Since you are using Unicode, you probably want to handle far more than the short list of accented characters in your question. What you are looking for is Unicode Normalization Form D (NFD) or KD (NFKD), which you can do in Windows and therefore in Delphi.

This answer should get you going on the theoretical side.

This Delphi code and this answer should get you started on the implementation.
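A minimal sketch of the NFD approach, assuming Delphi 2009 or later (so `string` is UTF-16) and Windows Vista or later, where `NormalizeString` is exported by Normaliz.dll. The import declaration is written by hand here and `RemoveDiacritics` is a hypothetical name. After decomposition it simply drops code points in the Combining Diacritical Marks block (U+0300..U+036F), which is a simplification; a stricter version would test the Unicode character category instead. Using `NormalizationKD` (value 6) instead would give NFKD.

```pascal
program StripDiacritics;

{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  NormalizationD = 2; // NORM_FORM value for canonical decomposition (NFD)

// Hand-written import from Normaliz.dll (Windows Vista and later).
function NormalizeString(NormForm: Integer; lpSrcString: PWideChar;
  cwSrcLength: Integer; lpDstString: PWideChar;
  cwDstLength: Integer): Integer; stdcall; external 'Normaliz.dll';

// Decompose S to NFD, then drop combining diacritical marks (U+0300..U+036F).
function RemoveDiacritics(const S: string): string;
var
  Decomposed: string;
  Len, I, J: Integer;
begin
  Result := '';
  if S = '' then
    Exit;
  // A zero-length destination makes NormalizeString return an estimated size.
  // The estimate can occasionally be too small; this sketch does not retry.
  Len := NormalizeString(NormalizationD, PWideChar(S), Length(S), nil, 0);
  if Len <= 0 then
    RaiseLastOSError;
  SetLength(Decomposed, Len);
  Len := NormalizeString(NormalizationD, PWideChar(S), Length(S),
    PWideChar(Decomposed), Len);
  if Len <= 0 then
    RaiseLastOSError;
  // Copy every UTF-16 code unit except the combining marks.
  SetLength(Result, Len);
  J := 0;
  for I := 1 to Len do
    if (Ord(Decomposed[I]) < $0300) or (Ord(Decomposed[I]) > $036F) then
    begin
      Inc(J);
      Result[J] := Decomposed[I];
    end;
  SetLength(Result, J);
end;

begin
  // #$00F1 = 'ñ', #$00F3 = 'ó', #$0161 = 'š', written as code points
  // so the source file's encoding does not matter.
  Writeln(RemoveDiacritics('ni'#$00F1'o acci'#$00F3'n '#$0161'to'));
end.
```

As Kerrek SB points out in the comments, this strips marks after normalization rather than transliterating: letters with no canonical decomposition, such as đ, ø, æ or ß, come through unchanged and still need an explicit table like the one in the other answer.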

Jeroen Wiert Pluimers
  • This sounds like the right approach. I just naively answered the question as asked. – David Heffernan Jul 06 '11 at 17:58
  • Sorry, "getting rid of accents" is **not** normalization -- it's just getting rid of accents! Normalization doesn't change the semantics of the character, it just chooses between "base plus diacritic" and "legacy Latin-1" form (and some other forms if appropriate) in a consistent fashion so that two normalized strings compare equal if they're semantically equal. The OP's goal appears to be *transliteration* to ASCII-only characters. – Kerrek SB Jul 06 '11 at 18:00
  • @Kerrek: I inferred Normalization because the OP links to the PHP solution mentioning Normalization. – Jeroen Wiert Pluimers Jul 07 '11 at 19:39
  • @Jeroen: Even the linked accepted SO answer is wrong, as pointed out in its comments; the [PHP normalize function](http://www.php.net/manual/en/normalizer.normalize.php) does exactly what the Unicode standard says and what I said. It does *not* transliterate ä to a! – Kerrek SB Jul 07 '11 at 19:42
  • @Jeroen: Well, I would prefer `iconv`ing to ASCII//TRANSLIT and then regexing `\w` out, I think that's a bit simpler and more foolproof... – Kerrek SB Jul 07 '11 at 19:48
  • @Kerrek: there is no iconv in Delphi. – Jeroen Wiert Pluimers Jul 07 '11 at 20:53
  • @Jeroen: Shame. No way to pull in a C library? Oh well, in that case you'll have to roll your own transliterator, indeed. – Kerrek SB Jul 07 '11 at 21:01
  • @Kerrek: importing LIB files in Delphi is a pain; importing OBJ files can be done, but is hard. The easiest is to import DLLs. The same goes for .NET, BTW: importing OBJ/LIB files there is virtually impossible too. – Jeroen Wiert Pluimers Jul 08 '11 at 06:44