2

Im trying to find if a .xlsx file contains a @. I have used pandas, which work great, unless if the excel sheet have the first column empty, then it fails.. any ideas how to rewrite the code to handle/skip empty columns?

the code:

df = pandas.read_excel(open(path,'rb'), sheetname=0)
out = 'False'
for col in df.columns:
    if df[col].str.contains('@').any():
        out = 'True'
        break

This is the error i'm getting:

    df = pandas.read_excel(open(path,'rb'), sheetname=0)
  File "/anaconda3/lib/python3.6/site-packages/pandas/io/excel.py", line 203, in read_excel
    io = ExcelFile(io, engine=engine)
  File "/anaconda3/lib/python3.6/site-packages/pandas/io/excel.py", line 258, in __init__
    self.book = xlrd.open_workbook(file_contents=data)
  File "/anaconda3/lib/python3.6/site-packages/xlrd/__init__.py", line 162, in open_workbook
    ragged_rows=ragged_rows,
  File "/anaconda3/lib/python3.6/site-packages/xlrd/book.py", line 91, in open_workbook_xls
    biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
  File "/anaconda3/lib/python3.6/site-packages/xlrd/book.py", line 1271, in getbof
    bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
  File "/anaconda3/lib/python3.6/site-packages/xlrd/book.py", line 1265, in bof_error
    raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\x17Microso'
Bidstrup
  • 1,597
  • 2
  • 16
  • 32
  • 1
    I guess that the Excel you are working with is already opened, before letting the Python script run? Check this - https://stackoverflow.com/questions/16504975/error-unsupported-format-or-corrupt-file-expected-bof-record – Vityata Jan 10 '18 at 14:39
  • 1
    FYI - In general, whenever you are getting an error like this, the lowest line is the most important one, which is actually generating the error, the rest is just the stack of the error-handling – Vityata Jan 10 '18 at 14:47
  • 1
    Thanks, that was excellent information :-) – Bidstrup Jan 10 '18 at 14:57
  • Cant figure out the how to make the exception, if a user have the excel open.. it should just skip that file, or make out = 'Maybe' – Bidstrup Jan 11 '18 at 08:22
  • See the first two answers here and build a similar function with Python - https://stackoverflow.com/questions/9373082/detect-whether-excel-workbook-is-already-open – Vityata Jan 11 '18 at 10:37
  • Im on a mac, and when opening a file it automaticly makes a temp file called ~$ in the begining.. its this file sit tries to read but fails in :) – Bidstrup Jan 11 '18 at 21:11

2 Answers2

1

If you want to check whether at least one cell equals to specific character/string:

def excel_has_str(filename, search='@'):
    return pd.read_excel(filename).astype(str).eq(search).any().any()

If you want to check whether at least one cell contains specific character/string:

def excel_contains_str(filename, search='@'):
    return pd.read_excel(filename) \
             .astype(str) \
             .apply(lambda x: x.str.contains(search)) \
             .any() \
             .any()

It'll automatically take care of empty strings and empty columns...

MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
1

This might help. As the link says, probably is an HTML file with xlsx extension or it is open already by Excel. Also you could try to load it this way and see what happens:

pd.read_excel(path_of_file, sheetname=0)
Ivan Gonzalez
  • 446
  • 1
  • 5
  • 14