10

how do i get the values of a spreadsheet as they are formatted? im working on spreadsheets with a currency format

this for example:

ITEM NAME UNIT PRICE
item1     USD 99
item2     SGD 45

but the terms 'USD' and 'SGD' were added using the formatting capabilities of excel, and is not seen by the read_excel function of pandas. i would get the values, but not the currency name. i could only work on the spreadsheets as it is, and given that i have various spreadsheets with about 6-7 sheets each, i was hoping to have a pandas (or python)-level solution rather than an excel-level solution.

thanks guys.

to Daniel, this is how i implemented the 'xlrd' engine, which didn't seem to do anything.

excel = pd.ExcelFile('itemlist.xlsx', sheetname=None)
master = pd.DataFrame(None)

for sheet in excel.sheet_names:
    df = pd.read_excel(excel,sheet,header=2, engine='xlrd')
    master=master.append(df)
carlo
  • 193
  • 1
  • 2
  • 14
  • pandas uses internally `xlrd` which can parse formatting information, too. – Daniel Jun 26 '16 at 12:39
  • @Daniel edited my post to show how i implemented the xlrd engine, although i seem to be misssing something – carlo Jun 26 '16 at 13:39

3 Answers3

11

There's not any great way to do this. pandas has no knowledge of the number formats, and xlrd doesn't seem to be able to read formats from a .xlsx file - see here

You could use openpyxl to accomplish this, it at least has access to the number formats, but it looks like you'd have to basically implement all the parsing logic yourself.

In [26]: from openpyxl import load_workbook

In [27]: wb = load_workbook('temp.xlsx')

In [28]: ws = wb.worksheets[0]

In [29]: ws.cell("B2")  # numeric value = 4, formatted as "USD 4"
Out[29]: <Cell Sheet1.B2>

In [30]: ws.cell("B2").value
Out[30]: 4

In [31]: ws.cell("B2").number_format
Out[31]: '"USD "#'
Community
  • 1
  • 1
chrisb
  • 49,833
  • 8
  • 70
  • 70
1

to anyone having the same problem, i did it by separating each sheet into CSVs, using this answer

It preserves the currency details as displayed in the spreadsheet, and while a little cumbersome, it did the job anyway.

Community
  • 1
  • 1
carlo
  • 193
  • 1
  • 2
  • 14
  • 1
    This answer helped me a lot! I ended up using `unoconv` on the command line to batch convert all the excel files ( over 1k! ) into CSV files before processing them with `python`. – EA Bubnoff Oct 07 '22 at 21:30
1

Honestly speaking, xlrd allows you to access cell format info.

book = xlrd.open_workbook("tmp/test.xls", formatting_info=True)  
xf_index = book.sheet_by_index(0).cell_xf_index(5,7)  
xf = book.xf_list[xf_index]  
fmt = book.format_map[xf.format_key]  
print(fmt.format_str, fmt.type)  
Roman Suprotkin
  • 310
  • 3
  • 5
  • This works for '.xls' file formats only, and not for '.xlsx' file formats – Malgo Nov 26 '21 at 04:15
  • @Malgo you are right. There is no library that works with both formats. You need to implement it separately for every format. I just showed how it could be implemented for the '.xls' format. – Roman Suprotkin Nov 30 '21 at 06:43