
I have a file which contains the letter ö. Except that it doesn't. When I open the file in gedit, I see:

\u00f6

I tried to convert the file, applying code that I found on other threads:

$ file blb.txt 
blb.txt: ASCII text
$ iconv -f ISO-8859-15 -t UTF-8 blb.txt > blb_tmp.txt
$ file blb_tmp.txt 
blb_tmp.txt: ASCII text

What am I missing?

EDIT

I found this solution:

$ echo -e "$(cat blb.txt)" > blb_tmp.txt
$ file blb_tmp.txt
blb_tmp.txt: UTF-8 Unicode text

The -e "enables interpretation of backslash escapes".

Still not sure why iconv didn't make it happen. I'm guessing it's something like "iconv only changes the encoding, it doesn't interpret". I'm not sure yet what the difference is, though. Why did the Unicode people make this world such a mess? :D
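The difference between "changing the encoding" and "interpreting" can be checked directly. The following is only a sketch under assumptions: `sample.txt` and `sample_iconv.txt` are hypothetical file names standing in for blb.txt, and the file is assumed to hold the six literal characters `\u00f6` rather than the letter ö.

```shell
# sample.txt is a hypothetical stand-in for blb.txt.
# %s prints its argument verbatim, so this writes backslash, u, 0, 0, f, 6.
printf '%s\n' '\u00f6' > sample.txt

# Seven bytes: the six ASCII characters plus the trailing newline.
wc -c < sample.txt

# iconv maps bytes from one encoding to another. ASCII bytes are the same
# in ISO-8859-15 and in UTF-8, so nothing changes in the output.
iconv -f ISO-8859-15 -t UTF-8 sample.txt > sample_iconv.txt

# cmp succeeds only if both files are byte-for-byte identical.
cmp sample.txt sample_iconv.txt && echo "iconv left the file unchanged"
```

Since every byte in the file is plain ASCII, `file` reports "ASCII text" both before and after the conversion. Turning the escape sequence into an actual ö is a separate step that has to parse the `\u` notation, which is what `echo -e` does in the EDIT above.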

Remy Lebeau
user3182532
  • You might want to learn a little bit more about [UTF-8](https://en.wikipedia.org/wiki/UTF-8), because it basically is an extended ASCII format. – Some programmer dude Sep 07 '17 at 12:37
  • Also, can you please explain how this question isn't off-topic according to ["What topics can I ask about here?"](http://stackoverflow.com/help/on-topic) and ["What types of questions should I avoid asking?"](http://stackoverflow.com/help/dont-ask) How is this related to programming? – Some programmer dude Sep 07 '17 at 12:38
  • Are you sure it contains the letter *ö* and not those six ASCII letters? – pacholik Sep 07 '17 at 12:45
  • @pacholik, you're probably right. But I thought iconv would convert that? – user3182532 Sep 07 '17 at 12:49
  • Nope. `\u00f6` is not *ö* in *ISO-8859-15*. It is a Unicode escape sequence used in some programming languages. – pacholik Sep 07 '17 at 12:53
  • https://stackoverflow.com/q/8795702/1028589 – pacholik Sep 07 '17 at 12:56
  • https://stackoverflow.com/questions/14820429/how-do-i-decodestring-escape-in-python3 – Josh Lee Sep 08 '17 at 21:37
  • Alas, `echo` changes \n to space every time. – Polluks Jul 06 '21 at 08:36
  • "I'm guessing it's something like "iconv only changes the encoding, it doesn't interpret"." Yes, that's exactly it. "Not sure yet, what the difference is though." The issue is that the file contents *actually represent* a backslash, a lowercase u, etc., rather than simply using different bytes to represent the o-with-diaeresis than `gedit` expects. "Why did the Unicode people make this world such a mess? :D" Please read https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/. – Karl Knechtel Aug 05 '22 at 02:21
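On the line-break concern raised in the comments, one possible alternative to `echo -e "$(cat ...)"` is to decode the file one line at a time with `printf '%b'`. This is a hedged sketch, not a tested fix for the original file: `escaped.txt` and `decoded.txt` are hypothetical names, and the expansion of `\uXXXX` is assumed to need bash 4.2 or later together with a UTF-8 locale (in other shells or locales the escape may be left as it is).

```shell
# escaped.txt is a hypothetical input with \uXXXX escapes on two lines.
printf '%s\n' 'K\u00f6ln' 'Z\u00fcrich' > escaped.txt

# read -r keeps the backslashes intact; IFS= keeps leading/trailing blanks.
# %b asks printf to expand backslash escapes found in the argument, and the
# explicit \n in the format writes exactly one line per input line.
while IFS= read -r line; do
    printf '%b\n' "$line"
done < escaped.txt > decoded.txt

# The line count is the same as in the input file.
wc -l < decoded.txt
```

Note that `%b` expands every backslash escape it recognizes, not only `\u`, so a literal `\n` or `\t` in the input would also be turned into a newline or tab.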
