1

I have a workbook that I am modifying with openpyxl.

After I run the code, I can see the formula being written in the excel file but it prints an empty df. (if I open the excel file and manually click save, the printed df has the right data).

Can anyone explain how to allow the read_excel to read the value in that cell instead of the formula?

from openpyxl import load_workbook
import pandas as pd


file_path = "Test.xlsx"
workbook = load_workbook(file_path)
workbook["Temp"]["A1"] = f"=SUM(2+2)"
workbook.save(file_path)
df = pd.read_excel(file_path, header=None)
print(df)  # empty df??
Ell
  • 51
  • 6
  • I'm not able to reproduce the problem, but it's generally a good idea to close the file with `workbook.close()` before reading it through a different mechanism. – sj95126 Jul 29 '22 at 00:36
  • You'll need some kind of application like Excel to evaluate the formula. – Charlie Clark Jul 29 '22 at 09:16

1 Answers1

0

I locate the problem here by reading the source code

pandas.io.excel._openpyxl.OpenpyxlReader._convert_cell


    def _convert_cell(self, cell, convert_float: bool) -> Scalar:
        ......
        # Here, cell.value leads to "" and I guess pandas has no parsing formula
        if cell.value is None:
            return ""  # compat with xlrd
        .....
        return cell.value
Xin
  • 22
  • 4
  • so there is no way to fix it? – Ell Jul 29 '22 at 17:40
  • Yes, not for the time being, or you can give feedback to them – Xin Jul 30 '22 at 06:07
  • I managed to find a work around for it. Since I know the formulas will always be at the last 2 rows, I use openpyxl to read the values from the worksheet `workbook[worksheet_name].values` (which contains the formulas rather than the computed value), then I just convert it to a dataframe and drop the un-needed rows – Ell Jul 30 '22 at 14:01