
I am trying to open an Excel file (I have tried both the xlsx and the csv format) with Python pandas, and I get encoding errors. I have also tried passing different encoding arguments, but that did not solve the issue.

Please help me understand and solve the issue.

This is the code:

import pandas as pd
excel_file = 'Task1/Data_task1.xlsx'
data =  pd.read_excel(excel_file, encoding='utf-8', errors = 'ignore')
print(data)

Error:

File "c:\Users\nivas\Desktop\Srinivas\Internship\Dealroomo\Task1\task1.py", line 4, in <module>
    print(data)
  File "C:\Users\nivas\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 3140-3145: character maps to <undefined>
normanius
  • Does this answer your question? [UnicodeEncodeError: 'charmap' codec can't encode characters](https://stackoverflow.com/questions/27092833/unicodeencodeerror-charmap-codec-cant-encode-characters) – Dishin H Goyani Dec 30 '19 at 12:27
  • No @DishinHGoyani I tried it, but it gives the same error – srinivas muralidharan Dec 30 '19 at 12:29
  • You need to pass the correct value for `encoding`. Since it is an Excel file, maybe `encoding='iso8859-1'` can help. – accdias Dec 30 '19 at 12:32
  • It is still the same, @accdias – srinivas muralidharan Dec 30 '19 at 12:33
  • `cp1252` and `cp1251` are common as well. You need to figure out what encoding is used on your Excel file. Take a look [here](https://stackoverflow.com/questions/8509339/what-is-the-most-common-encoding-of-each-language). – accdias Dec 30 '19 at 12:36
  • The error shows that the problem is in `print()`, not in `read_excel()`, so the cause is the Windows terminal (console/cmd.exe), which uses a legacy code page as its default encoding (`cp1252` according to the traceback), so `print()` tries to convert the displayed data to that code page. Some people change the default encoding in the Windows registry. Search for `windows registry encoding 65001`. – furas Dec 30 '19 at 12:51
  • [Change default code page of Windows console to UTF-8](https://superuser.com/questions/269818/change-default-code-page-of-windows-console-to-utf-8) – furas Dec 30 '19 at 12:54
  • Possible duplicate of https://stackoverflow.com/questions/10611455/what-is-character-encoding-and-why-should-i-bother-with-it – tripleee Dec 30 '19 at 13:21
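
The point made in the comments (the failure happens in `print()`, when the text is encoded for the console, not while reading the file) can be illustrated with a small sketch. The helper name `console_safe` and the sample string are made up for illustration; the only assumption is that the console uses `cp1252`, as the traceback suggests.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the target encoding cannot represent replaced by '?'."""
    # Use the given encoding, else the console's own encoding, else UTF-8.
    target = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    # errors="replace" substitutes '?' instead of raising UnicodeEncodeError.
    return text.encode(target, errors="replace").decode(target)


# Hypothetical sample: the two CJK characters do not exist in cp1252.
sample = "Tokyo \u6771\u4eac"
print(console_safe(sample, "cp1252"))
```

Applied to the question, something like `print(console_safe(str(data)))` should avoid the exception at the cost of showing `?` for unsupported characters. On Python 3.7 or later, calling `sys.stdout.reconfigure(encoding="utf-8")` before printing, or setting the `PYTHONIOENCODING` environment variable to `utf-8`, are alternatives that may keep the characters intact if the terminal can display them.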

1 Answer


In my experience, text from Excel and Python do not play well together, and many times the encoding simply does not work; I do not know why.

Two possible solutions:

  1. Convert the file to CSV (.txt/.csv format) and see if you can encode it manually inside Excel.

  2. Run the program on Ubuntu Linux, using LibreOffice instead of Excel. Again, you will need to convert the file to .csv. However, LibreOffice seems to handle the encoding much better than Excel. For whatever reason, Excel can refuse to convert or remove the unusual characters that raise Unicode errors in Python.
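
If the exported CSV ends up in a legacy encoding, it can also be re-saved as UTF-8 from Python instead of inside a spreadsheet program. This is only a rough sketch: the function name `reencode_to_utf8` is made up, and the default source encoding `cp1252` is an assumption that has to be replaced with whatever encoding the file really uses.

```python
from pathlib import Path


def reencode_to_utf8(src, dst, src_encoding="cp1252"):
    """Read a text file in src_encoding and write the same text to dst as UTF-8."""
    # src_encoding is an assumption; a wrong value gives garbled text or an error.
    text = Path(src).read_text(encoding=src_encoding)
    Path(dst).write_text(text, encoding="utf-8")
    return dst


# Hypothetical usage (the file names are placeholders):
# reencode_to_utf8("export.csv", "export_utf8.csv")
# data = pd.read_csv("export_utf8.csv", encoding="utf-8")
```

Note that this only changes how the file is stored. Printing the data to a `cp1252` console can still fail for characters that code page does not contain.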

Best of luck

Sam Dean