0

At first trying to read excel/csv data to check it before inserting into database, While csv working fine can read data, but xlsx and xlx showing following error

UnicodeDecodeError at /academy/add_advisor 'utf-8' codec can't decode byte 0xa1 in position 10: invalid start byte

Snippet of my code:

from tablib import Dataset
this_file = request.FILES['bulk_file']

dataset = Dataset()
imported_data = dataset.load(this_file.read().decode("utf-8"),format='xlsx')

for data in dataset:
    print(data[0], data[1], data[2], data[3], data[4])

According to error message error coming from this line

imported_data = dataset.load(this_file.read().decode("utf-8"),format='xlsx')

The excel file i am trying to import, i downloaded it from google drive excel as xlsx file(microsoft excel). Also downloaded one from onedrive(microsoft) xlsx file still getting same error.

Few more ways i have tried with are

imported_data = dataset.load(this_file.read().decode("ISO-8859-1"),format='xlsx')
imported_data = dataset.load(this_file.read().strip().decode("ISO-8859-1"),format='xlsx')
imported_data = dataset.load(this_file.read().strip().decode("CP1252"),format='xlsx')
imported_data = dataset.load(this_file.read().strip().decode("windows-1252"),format='xlsx')
imported_data = dataset.load(this_file.read().strip().decode("Latin-1"),format='xlsx')

But no luck :( Please share if there has any better way i can try :)(: Thank you for read :)

sheikhsalman08
  • 387
  • 1
  • 9
  • 18
  • It looks to me that it looks for a stream, not for a string, what if you only usee `load(the_file, format='xlsx')`? – Willem Van Onsem Aug 01 '18 at 09:16
  • Hi, tnx for you reply, Actually if you check load is a method of dataset, using that to load the data from excel to dataset object . After that will loop with dataset object(variable) to get each line of data – sheikhsalman08 Aug 01 '18 at 09:23
  • yes but the documentation (http://docs.python-tablib.org/en/master/api/#tablib.Dataset.load) specifies an `in_stream`. A file handler is a stream: you can use it for example to binary read it. So you only need to pass the file handler I think. Or perhaps a binary string, but decoding makes no sense, since an `.xlsx` is actually a zipped collection of XML files. – Willem Van Onsem Aug 01 '18 at 09:25
  • looks like after removing .decode("utf-8") it works. – sheikhsalman08 Aug 01 '18 at 09:32
  • Yes it is working, – sheikhsalman08 Aug 01 '18 at 09:33
  • please give an answer below i want to accept this answer. Was working from last 7+ hrs on it. Thanks again – sheikhsalman08 Aug 01 '18 at 09:35

1 Answers1

1

Well if your the_file is an .xlsx file, then it is probably no good idea to decode the stream manually: an .xlsx is a zipped stream of XML files. So that means it does not really follow a standard string encoding.

But the dataset.load(..) also suggests it does not per see needs a string, it needs data that it aims to load. In case of an .xlsx file, that is binary data, so we can use .read() to read the content of the file into a binary string, and thus let the dataset do the correct processing, like:

imported_data = dataset.load(this_file.read(),format='xlsx')
Willem Van Onsem
  • 443,496
  • 30
  • 428
  • 555