I have a bunch of text files that are encoded in ISO-8851-2 (they contain some Polish characters). Is there a command-line tool for Linux/Mac that I could run from a shell script to convert these to a saner UTF-8?
- Most likely ISO-885**9**-2. ISO 8851 speaks about butter. – Melebius Feb 08 '16 at 10:37
- Possible duplicate of [Best way to convert text files between character sets?](https://stackoverflow.com/questions/64860/best-way-to-convert-text-files-between-character-sets) – MultiplyByZer0 Mar 23 '19 at 20:03
3 Answers
Use `iconv`, for example like this (the files in the question are ISO-8859-2, i.e. Latin-2):

iconv -f ISO-8859-2 -t UTF-8 input.txt > output.txt
Some more information:

- You may want to specify `UTF-8//TRANSLIT` instead of plain `UTF-8`. To quote the manpage: "If the string `//TRANSLIT` is appended to to-encoding, characters being converted are transliterated when needed and possible. This means that when a character cannot be represented in the target character set, it can be approximated through one or several similar looking characters. Characters that are outside of the target character set and cannot be transliterated are replaced with a question mark (?) in the output."
- For a full list of encoding codes accepted by `iconv`, execute `iconv -l`.
- The example above makes use of shell redirection. Make sure you are not using a shell that mangles encodings on redirection – that is, do not use PowerShell for this.
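Since the question is about converting a whole batch of files from a shell script, here is a minimal sketch building on the command above (the `*.txt` glob and the `.utf8` output suffix are assumptions, not part of the original answer):

```sh
# Convert every .txt file in the current directory from ISO-8859-2 to UTF-8,
# writing the result next to the original with a .utf8 suffix.
# Use -t UTF-8//TRANSLIT instead if you prefer approximation over errors.
for f in *.txt; do
    iconv -f ISO-8859-2 -t UTF-8 "$f" > "$f.utf8"
done
```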

recode latin2..utf8 myfile.txt

This will overwrite `myfile.txt` with the new version. You can also use `recode` without a filename, as a pipe.
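As a small sketch of the pipe form mentioned above (the file names are only placeholders), `recode` acts as a filter reading standard input and writing standard output when no file argument is given:

```sh
# recode works as a stdin/stdout filter when no filename is supplied
recode latin2..utf8 < input.txt > output.txt
```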

- Way more efficient than the accepted answer, because `iconv` won't overwrite its input file in place, even using `-o` or output redirects. – Julien Nov 15 '10 at 11:42
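If you still want an in-place result with `iconv`, a common workaround (a sketch, not taken from either answer; `myfile.txt` is just an example name) is to convert into a temporary file and then move it over the original:

```sh
# iconv cannot write to its own input file, so go through a temp file
iconv -f ISO-8859-2 -t UTF-8 myfile.txt > myfile.txt.tmp && mv myfile.txt.tmp myfile.txt
```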