Unicode to ASCII with character translations for umlauts

Question

I have a client that sends Unicode input files and demands only ASCII-encoded files in return; why is unimportant.

Does anyone know of a routine to translate a Unicode string to the closest approximation of an ASCII string? I'm looking to replace common Unicode characters like 'ä' with a best ASCII representation.

For example: 'ä' -> 'a'

The data resides in SQL Server; however, I can also work in C#, either as a downstream mechanism or as a CLR procedure.

possible duplicate of [How to remove accents and all chars <> a..z in sql-server?](http://stackoverflow.com/questions/4024072/how-to-remove-accents-and-all-chars-a-z-in-sql-server) — Rhys Jones, Jan 28 '15 at 16:03
“Closest approximation” is both culture-dependent and subjective (e.g. mapping “ä” to “a” or “ae” or maybe something else). Besides, asking for a routine is off-topic at SO. — Jukka K. Korpela, Jan 28 '15 at 16:28
@JukkaK.Korpela No, they are not. They are extended ASCII. He is clearly talking about ASCII with the example 'ä' -> 'a'. — paparazzo, Jan 28 '15 at 18:15
1) The question was not to remove but to remap to a "best fit" value - removal is easy. 2) I did not ask someone to write a procedure. I asked if there was a standard procedure (or perhaps any standard at all). — Andrew, Jan 03 '19 at 14:42

score 0 · Accepted Answer · answered Jan 28 '15 at 16:03

Just loop through the string. For each character do a switch:

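switch (inputCharacter)
{
    case 'ä':
        outputString = "ae";
        break;
    case 'ö':
        outputString = "oe";
        break;
    // ...
    default:
        // Anything without a case passes through unchanged.
        outputString = inputCharacter.ToString();
        break;
}

(These replacements are common in German when only ASCII is available.)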

Then combine all outputStrings with a StringBuilder.
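
Put together, the loop, the switch and the StringBuilder might look like the sketch below. The names `GermanAscii` and `Transliterate` are made up for illustration, the cases cover only the German umlauts and ß, and replacing every other non-ASCII character with '?' is an arbitrary placeholder choice, not something the answer prescribes.

```csharp
using System.Text;

public static class GermanAscii   // hypothetical name, for illustration only
{
    public static string Transliterate(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            switch (c)
            {
                case 'ä': sb.Append("ae"); break;
                case 'ö': sb.Append("oe"); break;
                case 'ü': sb.Append("ue"); break;
                case 'Ä': sb.Append("Ae"); break;
                case 'Ö': sb.Append("Oe"); break;
                case 'Ü': sb.Append("Ue"); break;
                case 'ß': sb.Append("ss"); break;
                default:
                    // Keep ASCII as-is; '?' for unmapped characters is an assumption.
                    sb.Append(c <= 127 ? c : '?');
                    break;
            }
        }
        return sb.ToString();
    }
}
```

This assumes the input uses precomposed characters (a single 'ä' code unit rather than 'a' followed by a combining diaeresis); calling `input.Normalize()` first would be one way to ensure that.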

score 0 · Answer 2 · answered Jan 28 '15 at 18:03

I think you really mean extended ASCII to ASCII.
Just use a simple dictionary:

Dictionary<char, char> trans = new Dictionary<char, char>() {...};
StringBuilder sb = new StringBuilder();
foreach (char c in input)        // input is the string to convert
{
    if (c <= 127)
        sb.Append(c);            // already plain ASCII
    else
        sb.Append(trans[c]);     // throws KeyNotFoundException if c has no entry
}
string ascii = sb.ToString();
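
A variant of the dictionary idea that does not throw on unmapped characters could look like the sketch below. It is only one possible take, under several assumptions: the names `AsciiFolder`, `Overrides` and `ToAscii` are invented here, the dictionary entries are examples, and '?' is an arbitrary placeholder. It also swaps in Unicode decomposition (`string.Normalize` with `NormalizationForm.FormD`) for accented letters, so that 'ä' becomes 'a' as in the question without needing an entry for every accented character.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class AsciiFolder   // hypothetical name, for illustration only
{
    // Explicit mappings for characters that do not decompose; example entries only.
    private static readonly Dictionary<char, string> Overrides = new Dictionary<char, string>
    {
        { 'ß', "ss" }, { 'æ', "ae" }, { 'Æ', "AE" }, { 'ø', "o" }, { 'Ø', "O" }
    };

    public static string ToAscii(string input)
    {
        var sb = new StringBuilder(input.Length);
        // FormD splits 'ä' into 'a' followed by a combining diaeresis (U+0308).
        foreach (char c in input.Normalize(NormalizationForm.FormD))
        {
            if (c <= 127)
                sb.Append(c);                       // already plain ASCII
            else if (Overrides.TryGetValue(c, out string mapped))
                sb.Append(mapped);                  // explicit replacement
            else if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append('?');                     // placeholder for anything unmapped (an assumption)
            // Non-spacing marks (the detached accents) are simply dropped.
        }
        return sb.ToString();
    }
}
```

Mapping to a string rather than a char leaves room for replacements such as "ss" or "ae" that are longer than one character.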

answered Jan 28 '15 at 18:03 by paparazzo

I'd never heard the term "extended ASCII" as a specific term before; I always thought it was simply part of the phrase, "The extended ASCII table". I know that's playing big-time semantics, but... eh! And results at the [Google link you posted](https://www.google.com/#q=ascii+extended) seem to bear out its use as a self-contained term. – Andrew Barber Jan 29 '15 at 18:22
But historically that is exactly how the game was played. Pure ASCII, back in the days of expensive memory and bits, was 2^7, and players extended it to 2^8 (a whole byte). Windows 1252 code page was one of the first. – paparazzo Jan 29 '15 at 19:44
Well, yeah; I absolutely know what you mean by it... I just don't know why I'd never thought of it as that term. But... seems silly now! – Andrew Barber Jan 29 '15 at 19:49
