2

I am trying to open Excel file in python. (Excel itself can open it without any problem).

But trying to read it in python I got error: "ValueError: invalid literal for int() with base 10: '' "

How to cure it ? Or is there any way round ? May be some other package to read Excel files ?

It happens with both xlrd.open_workbook and pd.read_excel on my comp.


The strange thing is: if I open it with Excel and then save it by Excel, then such a saved version - can be opened by python without problems. So it might be that files were saved by some old-bad Excel. But I have many of them so handy open-save would be painful.


Here is detailed error message

----> 4 xlrd.open_workbook(dirName + fn )

C:\Anaconda3\lib\site-packages\xlrd\__init__.py in open_workbook(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
    420                 formatting_info=formatting_info,
    421                 on_demand=on_demand,
--> 422                 ragged_rows=ragged_rows,
    423                 )
    424             return bk

C:\Anaconda3\lib\site-packages\xlrd\xlsx.py in open_workbook_2007_xml(zf, component_names, logfile, verbosity, use_mmap, formatting_info, on_demand, ragged_rows)
    831         x12sheet = X12Sheet(sheet, logfile, verbosity)
    832         heading = "Sheet %r (sheetx=%d) from %r" % (sheet.name, sheetx, fname)
--> 833         x12sheet.process_stream(zflo, heading)
    834         del zflo
    835 

C:\Anaconda3\lib\site-packages\xlrd\xlsx.py in own_process_stream(self, stream, heading)
    546         for event, elem in ET.iterparse(stream):
    547             if elem.tag == row_tag:
--> 548                 self_do_row(elem)
    549                 elem.clear() # destroy all child elements (cells)
    550             elif elem.tag == U_SSML12 + "dimension":

C:\Anaconda3\lib\site-packages\xlrd\xlsx.py in do_row(self, row_elem)
    663                 if explicit_row_number and cell_name[charx:] != row_number:
    664                     raise Exception('cell name %r but row number is %r' % (cell_name, row_number))
--> 665             xf_index = int(cell_elem.get('s', '0'))
    666             cell_type = cell_elem.get('t', 'n')
    667             tvalue = None

ValueError: invalid literal for int() with base 10: ''
Alexander Chervov
  • 616
  • 3
  • 7
  • 23

2 Answers2

0

Try

xlrd.open_workbook(dirName +str(fn)+".xlsx")

Or

xlrd.open_workbook(dirName +str(fn)+".xls")

ltd9938
  • 1,444
  • 1
  • 15
  • 29
Avinash
  • 41
  • 3
  • it does not help. If I rename my file to ".xls" it gives same error – Alexander Chervov Nov 13 '18 at 14:00
  • then the column in the file contains blank values along with the integer to debug more details are required. Please refer https://stackoverflow.com/questions/1841565/valueerror-invalid-literal-for-int-with-base-10 – Avinash Nov 13 '18 at 14:20
  • But it typical for Excel to contain such thing and python typically can read it. Moreover open by Excel + save -> then python load works - but it does not change such thing – Alexander Chervov Nov 13 '18 at 14:31
  • I am not sure the linked question gives a clue, at least not tot me – Alexander Chervov Nov 13 '18 at 14:32
0

I'm having the same problem and I found this: https://github.com/python-excel/xlrd/pull/346/files

I don't know why it wasn't commited and why you shouldn't do this... BUT the answer on the forums as this change were refused were "use openpyxl" so, I just made this change because no, I don't want to and I don't have time and this seems a good solution to me. Maybe it's helpfull to someone else and maybe someone can clarify what's up with this.

PS.: the error tells you where you should change this: 'C:\Anaconda3\lib\site-packages\xlrd\xlsx.py' in the question's case

Kostynha
  • 145
  • 2
  • 10