
I have a client that sends Unicode input files and demands only ASCII-encoded files in return; why is unimportant.

Does anyone know of a routine to translate a Unicode string to its closest ASCII approximation? I'm looking to replace common Unicode characters like 'ä' with their best ASCII representation.

For example: 'ä' -> 'a'

The data resides in SQL Server; however, I can also work in C#, either as a downstream step or as a CLR procedure.

Andrew
  • possible duplicate of [How to remove accents and all chars <> a..z in sql-server?](http://stackoverflow.com/questions/4024072/how-to-remove-accents-and-all-chars-a-z-in-sql-server) – Rhys Jones Jan 28 '15 at 16:03
  • “Closest approximation” is both culture-dependent and subjective (e.g. mapping “ä” to “a” or “ae” or maybe something else). Besides, asking for a routine is off-topic at SO. – Jukka K. Korpela Jan 28 '15 at 16:28
  • @JukkaK.Korpela Neither ä nor Æ are ASCII – paparazzo Jan 28 '15 at 17:39
  • @Blam, so what? Both “a” and “ae” are ASCII. – Jukka K. Korpela Jan 28 '15 at 18:10
  • @JukkaK.Korpela No, they are not. They are extended ASCII. He is clearly talking about ASCII with the example 'ä' -> 'a'. – paparazzo Jan 28 '15 at 18:15
  • 1) The question was not to remove but to remap to a "best fit" value - removal is easy. 2) I did not ask someone to write a procedure. I asked if there was a standard procedure (or perhaps any standard at all). – Andrew Jan 03 '19 at 14:42

2 Answers


Just loop through the string. For each character do a switch:

switch (inputCharacter)
{
    case 'ä':
        outputString = "ae";
        break;
    case 'ö':
        outputString = "oe";
        break;
    ...
}

(These transliterations are common in German when only ASCII is available.)

Then combine the outputString of every character with a StringBuilder.
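
For reference, a minimal sketch of the loop described above, written out so it compiles. The class name `GermanAsciiFolder`, the method names, and the `'?'` placeholder for unmapped characters are assumptions made for illustration; only the two mappings shown in the answer are included.

```csharp
using System;
using System.Text;

// Hypothetical helper; the name and the '?' fallback are illustrative assumptions.
static class GermanAsciiFolder
{
    // Returns the ASCII replacement for a single character.
    static string Map(char c)
    {
        switch (c)
        {
            case 'ä': return "ae";
            case 'ö': return "oe";
            // Further mappings would be added here.
            default:
                // ASCII characters pass through unchanged;
                // anything else that is unmapped becomes '?' (assumed policy).
                return c <= 127 ? c.ToString() : "?";
        }
    }

    // Loops over the input and collects the replacements in a StringBuilder.
    public static string ToAscii(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            sb.Append(Map(c));
        }
        return sb.ToString();
    }
}
```

Calling `GermanAsciiFolder.ToAscii("Käse")` would give `Kaese` under these assumptions. Returning a string rather than a char from `Map` is what allows one input character to expand to two output characters.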

DrKoch

I think you really mean extended ASCII to ASCII.
A simple dictionary will do:

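// Requires: using System.Collections.Generic; using System.Text;
Dictionary<char, char> trans = new Dictionary<char, char>() {...};
StringBuilder sb = new StringBuilder();
foreach (char c in input)       // input: the string to convert (name assumed)
{
    if (c <= 127)
        sb.Append(c);           // already ASCII, keep as-is
    else
        sb.Append(trans[c]);    // throws KeyNotFoundException if c is not in trans
}
string ascii = sb.ToString();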
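
The dictionary has to be filled in by hand. As an alternative sketch (not part of the answer above), .NET can decompose an accented letter into its base letter plus combining marks via `string.Normalize(NormalizationForm.FormD)`, and the marks can then be dropped. The class name `AccentStripper` and the `'?'` placeholder for characters that remain non-ASCII are assumptions made for illustration.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical helper; the name and the '?' fallback are illustrative assumptions.
static class AccentStripper
{
    public static string ToAscii(string input)
    {
        // FormD splits e.g. "ä" into "a" followed by U+0308 (combining diaeresis).
        string decomposed = input.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Skip the combining accents produced by the decomposition.
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;
            // Characters with no decomposition (such as 'ø') are still non-ASCII
            // at this point; they become '?' here (assumed policy).
            sb.Append(c <= 127 ? c : '?');
        }
        return sb.ToString();
    }
}
```

This matches the 'ä' -> 'a' example from the question, but it will not produce language-specific forms such as "ae"; for those, a hand-written mapping is still needed.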
paparazzo
  • I'd never heard the term "extended ASCII" as a specific term before; I always thought it was simply part of the phrase, "The extended ASCII table". I know that's playing big-time semantics, but... eh! And results at the [Google link you posted](https://www.google.com/#q=ascii+extended) seem to bear out its use as a self-contained term. – Andrew Barber Jan 29 '15 at 18:22
  • But historically that is exactly how the game was played. Pure ASCII, back in the days of expensive memory and bits, was 2^7, and players extended it to 2^8 (an even number of bytes). The Windows-1252 code page was one of the first. – paparazzo Jan 29 '15 at 19:44
  • Well, yeah; I absolutely know what you mean by it... I just don't know why I'd never thought of it as that term. But... seems silly now! – Andrew Barber Jan 29 '15 at 19:49