using tablib and django-import-export to import data, getting error with excel, but with csv it's working

Question

At first trying to read excel/csv data to check it before inserting into database, While csv working fine can read data, but xlsx and xlx showing following error

UnicodeDecodeError at /academy/add_advisor 'utf-8' codec can't decode byte 0xa1 in position 10: invalid start byte

Snippet of my code:

from tablib import Dataset
this_file = request.FILES['bulk_file']

dataset = Dataset()
imported_data = dataset.load(this_file.read().decode("utf-8"),format='xlsx')

for data in dataset:
    print(data[0], data[1], data[2], data[3], data[4])

According to error message error coming from this line

imported_data = dataset.load(this_file.read().decode("utf-8"),format='xlsx')

The excel file i am trying to import, i downloaded it from google drive excel as xlsx file(microsoft excel). Also downloaded one from onedrive(microsoft) xlsx file still getting same error.

Few more ways i have tried with are

imported_data = dataset.load(this_file.read().decode("ISO-8859-1"),format='xlsx')
imported_data = dataset.load(this_file.read().strip().decode("ISO-8859-1"),format='xlsx')
imported_data = dataset.load(this_file.read().strip().decode("CP1252"),format='xlsx')
imported_data = dataset.load(this_file.read().strip().decode("windows-1252"),format='xlsx')
imported_data = dataset.load(this_file.read().strip().decode("Latin-1"),format='xlsx')

But no luck :( Please share if there has any better way i can try :)(: Thank you for read :)

It looks to me that it looks for a stream, not for a string, what if you only usee `load(the_file, format='xlsx')`? — Willem Van Onsem, Aug 01 '18 at 09:16
Hi, tnx for you reply, Actually if you check load is a method of dataset, using that to load the data from excel to dataset object . After that will loop with dataset object(variable) to get each line of data — sheikhsalman08, Aug 01 '18 at 09:23
yes but the documentation (http://docs.python-tablib.org/en/master/api/#tablib.Dataset.load) specifies an `in_stream`. A file handler is a stream: you can use it for example to binary read it. So you only need to pass the file handler I think. Or perhaps a binary string, but decoding makes no sense, since an `.xlsx` is actually a zipped collection of XML files. — Willem Van Onsem, Aug 01 '18 at 09:25
please give an answer below i want to accept this answer. Was working from last 7+ hrs on it. Thanks again — sheikhsalman08, Aug 01 '18 at 09:35

score 1 · Accepted Answer · answered Aug 01 '18 at 10:48

Well if your the_file is an .xlsx file, then it is probably no good idea to decode the stream manually: an .xlsx is a zipped stream of XML files. So that means it does not really follow a standard string encoding.

But the dataset.load(..) also suggests it does not per see needs a string, it needs data that it aims to load. In case of an .xlsx file, that is binary data, so we can use .read() to read the content of the file into a binary string, and thus let the dataset do the correct processing, like:

imported_data = dataset.load(this_file.read(),format='xlsx')

using tablib and django-import-export to import data, getting error with excel, but with csv it's working

1 Answers1