
I found out that when I save this distorted string ("Ã„uÃŸerungen Ã¼ben") as an ANSI text file, open it in Firefox, and choose "Unicode" in the Firefox menu, it turns into readable German ("Äußerungen üben").

The same thing is possible with my text editor (Notepad++).

Is there any way to achieve this with JavaScript? E.g. the following would be nice:

var output = makeReadable("Ã„uÃŸerungen Ã¼ben");

Unfortunately, I get these kinds of distorted strings from an external source which doesn't care about UTF-8 and provides all data as ANSI.

PS: Saving the file as UTF-8 and setting the charset as UTF-8 in the META Tag has no effect.

Edit:

I have now solved it by making a list of all common UTF-8/ANSI distortions (more than 1300) and writing a function that replaces each wrong character combination with the right character. It works fine :-) .
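
The lookup-table approach described in the edit could look roughly like the sketch below. The three table entries and the names REPLACEMENTS and replaceDistortions are illustrative assumptions, not the actual list of 1300+ entries, and the entries assume that UTF-8 bytes were misread as Windows-1252.

```javascript
// Hypothetical excerpt of a replacement table (distorted sequence -> intended character).
// Keys are written as escapes so the file's own encoding cannot distort them.
const REPLACEMENTS = {
  "\u00C3\u201E": "\u00C4", // "Ã„" -> "Ä"
  "\u00C3\u0178": "\u00DF", // "ÃŸ" -> "ß"
  "\u00C3\u00BC": "\u00FC", // "Ã¼" -> "ü"
};

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// One alternation of all keys, longest first so longer sequences win.
const DISTORTION_PATTERN = new RegExp(
  Object.keys(REPLACEMENTS)
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join("|"),
  "g"
);

// Replace every known distorted sequence with its intended character.
function replaceDistortions(text) {
  return text.replace(DISTORTION_PATTERN, (match) => REPLACEMENTS[match]);
}
```

A table like this only repairs the sequences it lists, so any character missing from the list stays distorted.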

xampper

1 Answer


I think the encoding of the "distorted string" in your question got munged further by posting it here. But a quick Google search for "javascript convert from utf-8" returns this blog post as the top hit: http://ecmanaut.blogspot.com/2006/07/encoding-decoding-utf8-in-javascript.html

So it turns out that encoding and decoding UTF-8 in JavaScript is really easy. This works great for me:

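var original = "Äußerungen üben";
// Encode to UTF-8, keeping one byte per character:
var utf8 = unescape(encodeURIComponent(original));
// console.log(utf8); // something like "ÃuÃerungen Ã¼ben" (two of the bytes are invisible control characters)
// Decode the UTF-8 bytes back into the original string:
var output = decodeURIComponent(escape(utf8));
console.log(output); // "Äußerungen üben"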
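
Since escape and unescape are deprecated, the same round trip can also be sketched with TextEncoder and TextDecoder, assuming an environment that provides them (current browsers and recent Node.js versions). The function name utf8RoundTrip is made up for this illustration.

```javascript
// Encode a string to UTF-8, view the bytes as one character each
// (the "distorted" form), then decode the bytes back again.
function utf8RoundTrip(original) {
  // UTF-8 bytes of the original string.
  const bytes = new TextEncoder().encode(original);

  // Each byte shown as the character with the same code (Latin-1 view).
  const misread = Array.from(bytes, (b) => String.fromCharCode(b)).join("");

  // Turn the one-character-per-byte string back into bytes and decode as UTF-8.
  const restoredBytes = Uint8Array.from(misread, (ch) => ch.charCodeAt(0));
  const decoded = new TextDecoder("utf-8").decode(restoredBytes);

  return { misread, decoded };
}

const result = utf8RoundTrip("\u00C4u\u00DFerungen \u00FCben"); // "Äußerungen üben"
console.log(result.decoded === "\u00C4u\u00DFerungen \u00FCben"); // true
```
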
Dan Korn
  • I should also point out that a Google search for "javascript convert from utf-8 site:stackoverflow.com" returns a top hit of this Stack Overflow post with basically the same question and answer: http://stackoverflow.com/questions/13356493/decode-utf-8-with-javascript So the question could be marked as a duplicate. – Dan Korn May 28 '14 at 23:32
  • I already know that source, since I have been researching this for hours. But it does not work with all strings. – xampper May 28 '14 at 23:33
  • The distorted string is still exactly the same even after posting it here, it didn't get munged further. With your method I get an error message: "URIError: malformed URI sequence" – xampper May 28 '14 at 23:34
  • All I can do is answer the question asked, with the sample data you provided. The code I posted works for that data. Can you provide an example of a string for which my solution doesn't work? – Dan Korn May 28 '14 at 23:35
  • Exactly the string above ("Ã„uÃŸerungen Ã¼ben"), or this one: Ù‡Ø°Ø§ Ù†Øµ Ø¹Ø±Ø¨ÙŠ. They are being correctly converted via the Firefox menu. – xampper May 28 '14 at 23:40
  • This web page here on Stack Overflow does not know the proper encoding of the "distorted string" that you copied-and-pasted. If I take the original German string and convert it to UTF-8, I get this: "ÃuÃerungen Ã¼ben", although copy-and-pasting that here probably isn't working right either. The upshot is that, if the function isn't working, then what you have is not valid UTF-8 to start with. – Dan Korn May 28 '14 at 23:40
  • If you can find any valid UCS-2 (UTF-16, less than 0xFFFF) string, put it in the first line of my example, and it doesn't convert correctly to UTF-8 and back, then I will agree that the code doesn't work. Otherwise, all evidence shows that it absolutely does work. – Dan Korn May 28 '14 at 23:42
  • Here is a screenshot, so you can compare whether the string has changed by posting it here: http://img5.fotos-hochladen.net/uploads/picx4qaum0hi8.png Please note that your version of the string is different. – xampper May 28 '14 at 23:45
  • I'm not sure what the screen shot proves. The only way to know for sure what's in the file is to look at it as binary. Otherwise, the text editor is doing some kind of conversion from the binary code points, to a character in some encoding, and then displaying glyphs in a font. Lots of unknowns there. – Dan Korn May 28 '14 at 23:50
  • There may also be differences in the JavaScript engines we're using, as well as in the way that the strings are being input to and output from JavaScript. – Dan Korn May 28 '14 at 23:54
  • This comment on the blog post may be helpful: http://ecmanaut.blogspot.com/2006/07/encoding-decoding-utf8-in-javascript.html#comment-124886232 – Dan Korn May 28 '14 at 23:58
  • Then I must keep trying and doing further research. Looks like more hours... Thank you for trying to help me with this problem, Dan. – xampper May 29 '14 at 00:04
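
One plausible explanation for the URIError reported in the comments, assuming the source's "ANSI" means Windows-1252: that code page maps bytes 0x80 to 0x9F to characters such as „ (U+201E) and Ÿ (U+0178), escape() encodes those as %uXXXX, and decodeURIComponent() cannot parse that form. A minimal sketch of makeReadable under that assumption maps each character back to its Windows-1252 byte first and then decodes the bytes as UTF-8. The name CP1252_HIGH is made up for this illustration, and TextDecoder is assumed to be available.

```javascript
// Characters that Windows-1252 assigns to bytes 0x80..0x9F, in byte order.
// The five bytes it leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) are
// kept as the control characters with the same code.
const CP1252_HIGH =
  "\u20AC\u0081\u201A\u0192\u201E\u2026\u2020\u2021" +
  "\u02C6\u2030\u0160\u2039\u0152\u008D\u017D\u008F" +
  "\u0090\u2018\u2019\u201C\u201D\u2022\u2013\u2014" +
  "\u02DC\u2122\u0161\u203A\u0153\u009D\u017E\u0178";

// Assumption: the input is UTF-8 text that was misread as Windows-1252 (or Latin-1).
function makeReadable(distorted) {
  const bytes = new Uint8Array(distorted.length);
  for (let i = 0; i < distorted.length; i++) {
    const ch = distorted[i];
    const code = ch.charCodeAt(0);
    const high = CP1252_HIGH.indexOf(ch);
    if (high !== -1) {
      bytes[i] = 0x80 + high; // Windows-1252 special character
    } else if (code <= 0xff) {
      bytes[i] = code; // ASCII and Latin-1 map directly to bytes
    } else {
      throw new RangeError("Not a Windows-1252 character at index " + i);
    }
  }
  // fatal: true makes invalid UTF-8 throw instead of yielding U+FFFD.
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

var output = makeReadable("\u00C3\u201Eu\u00C3\u0178erungen \u00C3\u00BCben"); // "Ã„uÃŸerungen Ã¼ben"
console.log(output === "\u00C4u\u00DFerungen \u00FCben"); // true
```

The sketch throws on input that is not a misread UTF-8 string, so already-correct text containing non-ASCII characters should not be passed through it blindly.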