0

I am new to Python pandas and am working on a small application in which I want to read an Excel file containing data in Hindi.

The issue I am facing is that pandas is not able to read the Hindi words and replaces them with arbitrary '?' symbols.

I have tried setting the encoding to utf-8, but that does not work either.

My Excel data: (screenshot not included)

Python code:

import pandas as pd

df = pd.read_csv("Vegaretable_List.csv", encoding='utf-8')

Output:

['?? ' '??? ' '???? ' '????? ' '????']

Any help would be appreciated. Thanks in advance.

Avinash

3 Answers

2

The problem shouldn't occur if the file is read in using the same encoding it was created with.
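If the original encoding is unknown, one option is to try a few candidates and keep the first one that decodes the whole file without an error. This is only a minimal sketch: the helper name `guess_encoding` and the candidate list are assumptions made for illustration, not part of pandas, and a clean decode does not prove the guess is right.

```python
def guess_encoding(path, candidates=("utf-8-sig", "utf-16")):
    """Return the first candidate that decodes the whole file, or None."""
    for encoding in candidates:
        try:
            with open(path, encoding=encoding) as handle:
                handle.read()  # force a full decode of the file
            return encoding
        except UnicodeError:
            continue  # this candidate does not fit, try the next one
    return None
```

The returned name can then be passed as the `encoding` argument of `pd.read_csv`.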

If you get "???", it means the CSV or Excel file was saved with a different encoding.
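One plausible way the "?" characters arise is that the file was saved in an encoding that has no Devanagari letters, so the text was already replaced at save time. The sketch below uses `cp1252` purely as an assumed example of such an encoding; the actual encoding of the file in the question is not known.

```python
text = "आलू"  # a Hindi word made of three code points

# cp1252 cannot represent Devanagari, so with errors="replace"
# every character is turned into "?" while encoding.
lossy = text.encode("cp1252", errors="replace")
print(lossy)  # b'???'

# The original characters are gone from the bytes, so no choice of
# encoding at read time can bring them back.
print(lossy.decode("utf-8"))  # ???

# UTF-8 can represent the text, so this round trip is lossless.
assert text.encode("utf-8").decode("utf-8") == text
```

If the "?" characters are literally stored in the file like this, the fix has to happen when the file is saved, not when it is read.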

Here is a table of the standard encodings.

You could also open the file in a suitable program and save it as UTF-8 so that your code can read it.
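The re-save can also be done in Python once the source encoding is known. A minimal sketch, assuming the text in the file is still intact; the helper name `reencode_to_utf8` and its parameters are hypothetical:

```python
def reencode_to_utf8(src_path, dst_path, src_encoding):
    """Copy a text file, decoding with src_encoding and writing UTF-8."""
    # newline="" keeps the original line endings untouched
    with open(src_path, encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

After this, the copy at `dst_path` can be read with `pd.read_csv(dst_path, encoding='utf-8')`.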

Trenton McKinney
Alfredo Maussa
0

Do not create a CSV file; use an Excel file in .xlsx format instead. pandas will then read the Hindi text. I did this and it worked.

import pandas as pd

dataset = pd.read_excel("Data.xlsx")

Here, Data.xlsx contains all the Hindi text that you gave.

Best of luck

Flair
-1

Assuming that your Excel/CSV file has content similar to this:

मिशल
बहादुर
मेरी
जेन
जॉन
स्मिथ

The encoding type is correct. It's just that you have to iterate through the data to get it back.

For .CSV

import csv

with open('customers.csv', 'r', encoding='utf-8') as file:
    data = csv.reader(file)
    for row in data:
        print(row)

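For .XLSX

An .xlsx file is a zipped binary format, so it cannot be opened as UTF-8 text; use a library such as openpyxl instead.

from openpyxl import load_workbook

# Load the workbook and print each row of the active sheet as a tuple of cell values
workbook = load_workbook('customers.xlsx', read_only=True)
for row in workbook.active.iter_rows(values_only=True):
    print(row)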
Klein -_0