Charset conversion from XXX to utf-8, command line

Question

I have a bunch of text files encoded in ISO-8859-2 (they contain some Polish characters). Is there a command-line tool for Linux/Mac that I could run from a shell script to convert them to a saner UTF-8?

Possible duplicate of [Best way to convert text files between character sets?](https://stackoverflow.com/questions/64860/best-way-to-convert-text-files-between-character-sets) — MultiplyByZer0, Mar 23 '19 at 20:03

score 29 · Accepted Answer · edited Oct 28 '18 at 01:30

Use iconv, for example like this:

iconv -f ISO-8859-2 -t UTF-8 input.txt > output.txt

Some more information:

You may want to specify UTF-8//TRANSLIT instead of plain UTF-8. To quote the manpage:

"If the string //TRANSLIT is appended to to-encoding, characters being converted are transliterated when needed and possible. This means that when a character cannot be represented in the target character set, it can be approximated through one or several similar looking characters. Characters that are outside of the target character set and cannot be transliterated are replaced with a question mark (?) in the output."

For a full list of the encoding names accepted by iconv, run iconv -l.

The example above uses shell redirection. Make sure you are not using a shell that mangles encodings on redirection; that is, do not use PowerShell for this.
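
Since the question asks about a whole batch of files, here is a minimal sketch of how the iconv call could be wrapped in a loop. The function name convert_to_utf8 is made up for illustration, and the source encoding is assumed to be ISO-8859-2 as in the question. It converts into a temporary file first, because redirecting iconv's output onto its own input file would truncate it.

```shell
#!/bin/sh
# convert_to_utf8 is a hypothetical helper name, not part of iconv.
# It converts each file given as an argument from ISO-8859-2 to UTF-8 in place.
convert_to_utf8() {
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        # Convert into a temporary file; touch the original only on success.
        if iconv -f ISO-8859-2 -t UTF-8 "$f" > "$tmp"; then
            # Copy the contents back so the file keeps its permissions.
            cat "$tmp" > "$f"
        else
            echo "conversion failed, left unchanged: $f" >&2
        fi
        rm -f "$tmp"
    done
}

# Example call (the glob is a placeholder for your own files):
# convert_to_utf8 *.txt
```

Run it only once per file: iconv cannot tell that a file is already UTF-8, so a second pass would double-encode the non-ASCII characters.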

score 10 · Answer 2 · answered Apr 27 '10 at 15:32

recode latin2..utf8 myfile.txt

This will overwrite myfile.txt with the new version. You can also use recode without a filename as a pipe.
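
The pipe form mentioned above could look roughly like this. It is only a sketch: to_utf8_stream is a made-up wrapper name, and it assumes recode is installed and that the input is ISO-8859-2.

```shell
#!/bin/sh
# to_utf8_stream is a hypothetical wrapper name, not part of recode.
# With no file argument, recode reads stdin and writes the result to stdout.
to_utf8_stream() {
    recode latin2..utf8
}

# Example call (file names are placeholders):
# to_utf8_stream < input.txt > output.txt
```

Used this way the original file is left untouched, which is handy if you want to check the result before replacing anything.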

answered Apr 27 '10 at 15:32 by legoscia

Way more efficient than accepted answer, because iconv won't replace the same file, even using -o or output redirects. – Julien Nov 15 '10 at 11:42

score 3 · Answer 3 · answered Apr 27 '10 at 15:23

GNU 'libiconv' should be able to do the job.

answered Apr 27 '10 at 15:23 by Jonathan Leffler

Thanks! I knew it'd be easier than I thought! – Marcin Apr 27 '10 at 15:25
