
I have an Excel 2016 file, Sb_test.xlsx, which I want to convert to a .csv file. However, the error

XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'<## NASC'

occurred at the line wb = xlrd.open_workbook(r"D:/Sb_test.xlsx") of the code below:

import csv
import xlrd

def csv_from_excel():
    # Print the xlrd version and install location (suggested on a Google forum)
    print(xlrd.__VERSION__, xlrd.__file__)
    wb = xlrd.open_workbook(r"D:/Sb_test.xlsx")  # XLRDError is raised here
    sh = wb.sheet_by_name('Basic_Classification')
    # newline='' keeps the csv module from writing blank rows on Windows
    with open('Sb_01_csv.csv', 'w', newline='') as your_csv_file:
        wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
        for rownum in range(sh.nrows):
            wr.writerow(sh.row_values(rownum))

csv_from_excel()

While looking for a solution, I found that I might be using an older version of xlrd, but no: it is 1.2.0 (the most recent one).

And here an accepted answer suggests opening it using a text editor, which in my case looks like this:

[image: the file opened in a text editor]

Once I realized that

... that is definitely not Excel .xls format

what am I supposed to do now to convert the file (whatever type it is) to CSV format?
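A minimal sketch of checking what the file really is without a text editor, by reading its first bytes. It relies on two well-known signatures: a genuine .xlsx is a ZIP archive and a legacy .xls is an OLE2 compound file. The function name `sniff_container` and its labels are hypothetical, chosen only for illustration. It identifies the container and nothing more; it cannot unwrap a file that is neither.

```python
# Leading bytes ("magic numbers") of the containers Excel itself writes.
ZIP_MAGIC = b"PK\x03\x04"  # .xlsx workbooks are ZIP archives
# Legacy .xls files; password-encrypted .xlsx files also use this container.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def sniff_container(path):
    """Return (label, head): a rough container label and the first 8 bytes.

    The labels are illustrative only, not part of any library.
    """
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(ZIP_MAGIC):
        return "zip", head
    if head.startswith(OLE2_MAGIC):
        return "ole2", head
    return "unknown", head


if __name__ == "__main__":
    import sys

    # Usage: python sniff.py D:/Sb_test.xlsx
    for p in sys.argv[1:]:
        print(p, *sniff_container(p))
```

A file whose first bytes are `<## NASC`, as in the error above, falls into the "unknown" case: it is wrapped in something else, and a spreadsheet reader will not open it until it has been exported in a plain format.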

All I want is a CSV file so that I can try some machine learning stuff on it.

Thanks for your help.

bit_scientist
  • Is the file encrypted? – user2263572 Jan 29 '20 at 01:41
  • @user2263572, since its initial bytes start with `NASCA ..`, yes. And is it because Jupyter Notebook uses a server and I have an encrypted file which is not allowed to be accessed? – bit_scientist Jan 29 '20 at 01:48
  • Can you just open it with Excel (enter the password) and save as filetype -> .csv? Then use the Python csv module to read in the file? – user2263572 Jan 29 '20 at 01:53
  • @user2263572 I did as you said. After some `UnicodeDecodeError` and `ParserError` errors occurred, I ended up using `encoding = "latin-1", sep='\t', header=None`. But when I try to print the `pd.head()`, its output isn't readable. Any more ideas? – bit_scientist Jan 29 '20 at 04:07
  • Did you save as CSV or TSV in Excel? Also, post the code along with the error messages. You should also be able to open the CSV file in a text editor to see if it looks correct. – user2263572 Jan 29 '20 at 14:24
  • @user2263572, yes, I saved it as a CSV file, and I solved the errors mentioned above (UnicodeDecodeError and ParserError). I tried opening it with Notepad++ and with Notepad: the former shows unreadable content (`<## NASCA DRM`) and the latter shows the content as it is. – bit_scientist Jan 30 '20 at 01:28
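
Rather than guessing `encoding` and `sep` by trial and error as in the comments above, the exported file can be inspected first. The sketch below assumes the export is genuinely plain text. The helper names `guess_encoding` and `load_rows` are hypothetical, and the encoding check is a rough heuristic (byte-order mark, then UTF-8, then a latin-1 fallback that never fails but may be wrong). It will not help with a file that still begins with `<## NASCA DRM`, since that is not plain CSV text.

```python
import csv
import io


def guess_encoding(raw):
    """Rough heuristic, not a real detector: BOM first, then UTF-8."""
    if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"  # UTF-16 with a byte-order mark
    try:
        raw.decode("utf-8-sig")
        return "utf-8-sig"
    except UnicodeDecodeError:
        # latin-1 maps every byte to a character, so it cannot fail,
        # which also means it can silently produce the wrong text.
        return "latin-1"


def load_rows(path):
    """Return (encoding, delimiter, rows) for a delimited text file."""
    with open(path, "rb") as f:
        raw = f.read()
    encoding = guess_encoding(raw)
    text = raw.decode(encoding)
    # Let the csv module pick among a few common delimiters;
    # Sniffer raises csv.Error if it cannot decide.
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=",;\t")
    rows = list(csv.reader(io.StringIO(text, newline=""), dialect))
    return encoding, dialect.delimiter, rows
```

The detected encoding and delimiter can then be passed to whichever reader is used afterwards, instead of hard-coding `latin-1` and `'\t'`.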

0 Answers