I have an Arabic file encoded in ISO8859-15. How can I convert it into UTF8?
I used iconv but it doesn't work for me.

iconv -f ISO-8859-15 -t UTF-8 Myfile.txt

I wanted to attach the file, but I don't know how.

StackzOfZtuff
Hakim
  • 6
    Does `iconv` print an error message, or does it convert incorrectly? (Incidentally, you might *accept* more of the answers you have received to earlier questions. The answerers would appreciate this.) – thb Jul 03 '12 at 18:33
  • No it doesn't print an error. I mean it converts the file incorrectly. I checked the encoding of the file, and found it ISO-8859-15. – Hakim Jul 03 '12 at 18:36
  • 1
    How did you determine it to be ISO-8859-15? – pizza Jul 03 '12 at 18:40
  • I opened the file and chose Save As. In the dialog that appeared, the encoding of the file was shown as ISO-8859-15. Is there another way to determine the encoding of the file? – Hakim Jul 03 '12 at 18:43
  • ISO-8859-15 is a single-byte character set. Unless the most significant bit of a byte is set, that byte looks exactly like its UTF-8 version. – pizza Jul 03 '12 at 18:45
  • So how can I change the encoding of the file to see its characters correctly? When I open the file, its characters aren't shown correctly and I can't read it... – Hakim Jul 03 '12 at 18:51
  • 1
    Plain text files don't include any information about the encoding used; it's just a sequence of bytes. The program that opens the file is responsible for inferring the encoding. That said, perhaps there is a special string you can include in the file that your program can use as a hint. – chepner Jul 03 '12 at 19:29
  • Whatever tool you use to view the file is most likely sensitive to your locale. You should make sure your locale matches the encoding you think the file has before attempting to view it. – pizza Jul 03 '12 at 19:47
  • 8
    ISO 8859-15 cannot represent Arabic text. Perhaps you mean 8859-6 or some legacy encoding? See, for a start, http://en.wikipedia.org/wiki/ISO/IEC_8859-6 – tripleee Aug 31 '12 at 12:14
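
To illustrate the point in the last comment, here is a minimal sketch of why `iconv` reports no error yet produces unreadable output. It assumes the file is really ISO-8859-6, which the thread does not confirm; the sample file and the variable names are made up for the demonstration.

```shell
# Hypothetical demo file: the Arabic word "سلام" stored as ISO-8859-6 bytes.
workdir=$(mktemp -d)
printf '\323\344\307\345\n' > "$workdir/arabic-8859-6.txt"

# Wrong guess: every byte is also valid ISO-8859-15, so iconv succeeds
# but maps the bytes to accented Latin letters.
wrong=$(iconv -f ISO-8859-15 -t UTF-8 "$workdir/arabic-8859-6.txt")

# Right guess: the same bytes decoded as ISO-8859-6 give Arabic text.
right=$(iconv -f ISO-8859-6 -t UTF-8 "$workdir/arabic-8859-6.txt")

printf 'as ISO-8859-15: %s\nas ISO-8859-6:  %s\n' "$wrong" "$right"
```

Both conversions exit successfully; only the second one yields the Arabic word, while the first yields "ÓäÇå". A single-byte source encoding can almost never be rejected by `iconv`, so a clean exit status says nothing about whether the `-f` value was correct.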

8 Answers

54

Could it be that your file is not ISO-8859-15 encoded? You should be able to check with the file command:

file YourFile.txt

Also, you can use iconv without providing the encoding of the original file:

iconv -t UTF-8 YourFile.txt
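
As a comment below points out, when `-f` is omitted `iconv` takes the source encoding from the current locale rather than detecting it from the file. The following sketch shows the difference; the sample file is hypothetical, and the outcome of the first `iconv` call depends on your locale.

```shell
# What iconv assumes when -f is omitted: the current locale's charmap.
locale charmap

# Hypothetical sample: "café" in ISO-8859-1 (a lone 0xE9 byte is not valid UTF-8).
sample=$(mktemp)
printf 'caf\351\n' > "$sample"

# Without -f, success depends on the locale; in a UTF-8 locale this usually fails.
if iconv -t UTF-8 "$sample" > /dev/null 2>&1; then
  echo "locale default accepted the bytes (this does not prove the guess was right)"
else
  echo "locale default rejected the bytes; pass -f explicitly"
fi

# With -f, the result does not depend on the locale.
explicit=$(iconv -f ISO-8859-1 -t UTF-8 "$sample")
echo "$explicit"
```

If omitting `-f` happened to work for you, the locale's encoding probably matched the file's encoding by coincidence.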
HighKing
  • 1
    How would the file command be able to tell you which encoding is appropriate to understand the file's content? – Thorsten Staerk Aug 18 '15 at 21:11
  • 7
    @ThorstenStaerk I don't think it does. The man page says this: "If no from-encoding is given, the default is derived from the current locale's character encoding." So I believe HighKing's comment about not providing the encoding of the original file is wrong. – Stéphane Jun 11 '16 at 00:53
  • The file utility does not always guess the correct encoding. You need to judge manually whether the content is readable by opening the file with different encodings. – code4j May 31 '17 at 18:40
  • It's worth a try without specifying the source encoding. It worked for me. – Nicolas Jun 02 '19 at 15:37
36

I found this to work for me:

iconv -f ISO-8859-14 Agreement.txt -t UTF-8 -o agreement.txt
Colin Keenan
  • 1
    Running `file myfile.txt` reported `ISO-8859`, so I tried your command with that name (without the `-14`). It failed with "ISO-8859 is not supported". After I used the full name `ISO-8859-14`, it worked. – Spike Oct 05 '16 at 10:01
  • 3
    I have usually seen ISO-8859-1. – Sergio Abreu Feb 22 '17 at 03:03
13

I have Ubuntu 14 and the other answers were not working for me. This did:

iconv -f ISO-8859-1 -t UTF-8 in.tex -o out.tex

I found this command here

simon
aburbanol
  • That's the same as OP aside from you seem to be starting with a more common character set. Annoyingly Google seems to have ignored the 5 at the end of ISO-8859-15 in the title when searching for ISO-8859-1 – mjaggard Dec 02 '21 at 08:44
9

We had this problem. To solve it:

Create a script file called to-utf8.sh

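#!/bin/bash
# Convert one file to UTF-8 in place, using the encoding guessed by file(1).
TO="UTF-8"; FILE="$1"
# -b omits the file name; --mime-encoding prints only the charset (e.g. iso-8859-1).
FROM=$(file -b --mime-encoding "$FILE")
if [[ "$FROM" = "binary" ]]; then
  echo "Skipping binary $FILE..."
  exit 0
fi
# Write to a temporary file first: iconv cannot safely overwrite its own input.
iconv -f "$FROM" -t "$TO" -o "$FILE.tmp" "$FILE"; ERROR=$?
if [[ $ERROR -eq 0 ]]; then
  echo "Converted $FILE"
  mv -f "$FILE.tmp" "$FILE"
else
  echo "Error on $FILE"
  rm -f "$FILE.tmp"
fi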

Set the executable bit

chmod +x to-utf8.sh

Do a conversion

./to-utf8.sh MyFile.txt

If you want to convert all files under a folder, do

find /your/folder/here -type f -print0 | xargs -0 -n 1 ./to-utf8.sh
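
Before converting a whole folder in place, it may be worth skipping files that are already valid UTF-8. One way to check this without relying on `file` is to ask `iconv` to convert from UTF-8 to UTF-8 and look at its exit status. This is only a sketch: `is_utf8` is a hypothetical helper name and the two sample files are made up.

```shell
# is_utf8 is a hypothetical helper, not part of iconv or of to-utf8.sh.
# It succeeds only if every byte sequence in the file is valid UTF-8.
is_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

dir=$(mktemp -d)
printf 'caf\303\251\n' > "$dir/already-utf8.txt"   # valid UTF-8
printf 'caf\351\n'     > "$dir/latin1.txt"         # ISO-8859-1, invalid as UTF-8

for f in "$dir"/*.txt; do
  if is_utf8 "$f"; then
    echo "skip (already UTF-8): $f"
  else
    echo "needs conversion: $f"
  fi
done
```

Pure ASCII files also pass this check, which is fine because ASCII is already valid UTF-8.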

Hope it helps.

Charles Santos
  • Really useful to convert a bunch of files to UTF8 in place.. since iconv will not work if the input and output file "are the same". Thankyou! – Gionata May 04 '21 at 09:19
  • Using this script on a java project I get that ".class" are not supported.. and they are not binary from the file command. Anyway no need to convert those file... – Gionata May 04 '21 at 09:25
5

I had the same problem, but I found the answer on this page. It works for me, so you can try it.

iconv -f cp936 -t utf-8 
nanoguo
3

In my case, the file command reported the wrong encoding, so I tried converting from every possible encoding and found the right one that way.

Run this script and check the result file.

# Try every encoding iconv knows; grep keeps only output containing the expected text.
for i in $(iconv -l)
do
   echo "$i"
   iconv -f "$i" -t UTF-8 yourfile | grep "hint to tell converted success or not"
done &>/tmp/converted
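
A narrower variant of the same idea, as a sketch: instead of trying every encoding, loop over a short list of likely candidates and keep those whose output contains a word you expect to find in the file. The sample file, the expected word and the candidate list are all assumptions made for the example.

```shell
# Hypothetical input: a file whose true encoding is ISO-8859-6,
# and a word we expect to see once it is decoded correctly.
input=$(mktemp)
printf '\323\344\307\345\n' > "$input"
expected=$(printf '\330\263\331\204\330\247\331\205')   # "سلام" as UTF-8 bytes

matches=""
# Short candidate list (an assumption; extend it with names from `iconv -l`).
for enc in ISO-8859-1 ISO-8859-15 ISO-8859-6 CP1256; do
  # Skip encodings that are unsupported or that reject the bytes outright.
  out=$(iconv -f "$enc" -t UTF-8 "$input" 2>/dev/null) || continue
  case $out in
    *"$expected"*)
      echo "match: $enc"
      matches="$matches $enc"
      ;;
  esac
done
```

Several related encodings can share the same byte values for some letters, so a longer or more distinctive expected string reduces false matches.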
wintermeyer
Li.Gui
2

You can use ISO-8859-9 encoding:

iconv -f ISO-8859-9 Agreement.txt -t UTF-8 -o agreement.txt
Nuri Akman
0

iconv just writes the converted text to stdout. You have to pass -o OUTPUTFILE.txt as a parameter or redirect stdout to a file. (iconv -f x -t z filename.txt > OUTPUTFILE.txt or, in some iconv versions, iconv -f x -t z < filename.txt > OUTPUTFILE.txt)

Synopsis

iconv -f encoding -t encoding inputfile

Description

The iconv program converts the encoding of characters in inputfile from one coded character set to another. 
**The result is written to standard output unless otherwise specified by the --output option.**

--from-code, -f encoding

Convert characters from encoding

--to-code, -t encoding

Convert characters to encoding

--list

List known coded character sets

--output, -o file

Specify output file (instead of stdout)

--verbose

Print progress information.
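
The two ways of capturing the output can be compared with a short sketch. The file names are temporary files created for the demonstration, and `-o` is assumed to be available, which may not hold for every iconv implementation.

```shell
# Hypothetical sample: "café" in ISO-8859-1.
src=$(mktemp); out1=$(mktemp); out2=$(mktemp)
printf 'caf\351\n' > "$src"

# Option 1: redirect standard output to a file.
iconv -f ISO-8859-1 -t UTF-8 "$src" > "$out1"

# Option 2: let iconv write the file itself with -o.
iconv -f ISO-8859-1 -t UTF-8 -o "$out2" "$src"

cmp -s "$out1" "$out2" && echo "both methods produced the same bytes"

# To convert in place, go through a temporary file. Redirecting straight
# onto the input (iconv ... file > file) truncates it before iconv reads it.
tmp=$(mktemp)
iconv -f ISO-8859-1 -t UTF-8 "$src" > "$tmp" && mv "$tmp" "$src"
```

The `&&` before `mv` ensures the original is replaced only when the conversion succeeded.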
Piotr Siupa