
I am trying to read a text file that contains special characters, such as: الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ

I'm using:

fileToSearch = "test_encoding.txt"
with open(fileToSearch, 'r', encoding='utf-8') as file:
    for line in file:
        print(line)

But Python crashes with this message:

Traceback (most recent call last):
  File "test.py", line 9, in <module>
    print(line)
  File "C:\Users\atheelm\AppData\Local\Programs\Python\Python35-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-1: character maps to <undefined>

I have Python 3.5.1 and I'm using Windows.

I'm running this command:

py test.py > out.txt
Pek
Atheel Massalha
  • You need to change 'encoding' to something that includes those characters – bendl Dec 04 '17 at 14:48
  • Well, your print is failing. You could fix that by adding # -*- coding: utf-8 -*- to the start of the script. UTF-8 should support Farsi characters, AFAIK. More details in this thread: https://stackoverflow.com/questions/39528462/python-3-print-function-with-farsi-arabic-characters – BoboDarph Dec 04 '17 at 14:55
  • See the edit: I'm printing the output to a file. It still crashes. – Atheel Massalha Dec 04 '17 at 15:08
  • What is the actual binary content of your file? Are you sure the file is encoded with utf-8? – Tom Dalton Dec 04 '17 at 15:34
  • @BoboDarph the source-code encoding declaration does **not** affect the encoding of the STDOUT stream (which `print` uses by default). Please don't further spread this misconception. Thanks! – lenz Dec 04 '17 at 18:35
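
As lenz's comment points out, the failure happens when `print` encodes text for STDOUT, not when the file is read. Below is a minimal sketch of that point, assuming the redirected STDOUT defaults to cp1252 as the traceback suggests. The helper names `can_encode` and `print_utf8` are hypothetical and do not come from the thread.

```python
import io

ARABIC = "الْحَمْدُ لِلَّهِ"


def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def print_utf8(text, binary_stream):
    """Print text to a binary stream through a UTF-8 text wrapper.

    Hypothetical helper: it bypasses the stream's default text encoding
    (cp1252 in the traceback) by encoding explicitly as UTF-8.
    """
    wrapper = io.TextIOWrapper(binary_stream, encoding="utf-8", newline="\n")
    print(text, file=wrapper)
    wrapper.flush()
    wrapper.detach()  # leave binary_stream open for the caller
```

In a script this could be called as `print_utf8(line, sys.stdout.buffer)`, so that `py test.py > out.txt` receives UTF-8 bytes. Setting the `PYTHONIOENCODING` environment variable to `utf-8` before running the script is another option that needs no code changes.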

1 Answer


Use two different files, one for input and one for output, and write the output with io.open instead of printing it:

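import io

fileToSearch = "test_encoding.txt"
out_file = "out.txt"  # assumed name; the original snippet does not define out_file

# Read every line of the UTF-8 input file into memory.
lines = ["Init"]
with io.open(fileToSearch, 'r', encoding='utf-8') as file:
    counter = 1
    for line in file:
        # Strip the trailing newline so the write below does not double it.
        lines.insert(counter, line.rstrip('\n'))
        counter = counter + 1

# Write the lines to the output file as UTF-8 instead of printing them.
with io.open(out_file, 'w', encoding='utf-8') as file:
    for item in lines:
        file.write("%s\n" % item)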
Atheel Massalha
  • It's good you found a solution yourself. An unrelated hint: have a look at the built-in function `enumerate`, which frees you from taking care of incrementing `counter`: You simply write `for counter, line in enumerate(file):`. – lenz Dec 04 '17 at 18:32
  • And, btw, there's no need for the `io` module here; in Python 3, `io.open` is the same as built-in `open`. – lenz Dec 04 '17 at 18:37
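
The two hints in the comments above can be combined into a short sketch: built-in `open` with an explicit encoding, and `enumerate` in place of a manual counter. The function name `copy_numbered` and the line-number prefix are illustrative assumptions, not part of the answer.

```python
def copy_numbered(src_path, dst_path):
    """Copy a UTF-8 text file, prefixing each line with its 1-based number.

    Returns the number of lines copied (0 for an empty file).
    """
    count = 0
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        # enumerate supplies the counter, so no manual increment is needed.
        for count, line in enumerate(src, start=1):
            dst.write("%d: %s\n" % (count, line.rstrip("\n")))
    return count
```

Because both files are opened with `encoding="utf-8"`, the console or redirect encoding never comes into play.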