How to force pandas to evaluate formulas of xlsx and not read them as NaN?

Question

Here is what I am trying to do:

Read an Excel file with formulas into a pandas DataFrame
Change the values in some cells
Re-evaluate the cell values according to the formulas
Finally, save the result as HTML

Here's my code

import pandas as pd
import numpy as np
from openpyxl import load_workbook

wb = load_workbook(filename = 'initial.xlsx')
sheet_names = wb.sheetnames  # get_sheet_names() is deprecated in openpyxl
name = sheet_names[0]
sheet_ranges = wb[name]
df = pd.DataFrame(sheet_ranges.values)
df['x']['C'] = 10
df.to_excel("processing.xlsx", header=False, index=False)

df1 = pd.read_excel("processing.xlsx", sheet_name=0, header=0)  # keyword is sheet_name, not sheetname
df1.to_html('output.html')

With this code I get a fully functioning processing.xlsx, but when I read it back to convert it to HTML, every formula cell in the DataFrame comes out as NaN.

Question: How to force pandas to evaluate formulas of xlsx and not read them as NaN?

PS: If I try to convert initial.xlsx into html it works fine so something must be wrong in to_excel output.
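
One way to check that suspicion is sketched below, using a small in-memory workbook in place of the real files (the cell addresses and values are made up for illustration). openpyxl's load_workbook accepts a data_only flag that returns the last value Excel stored for each formula cell instead of the formula text. A file written by openpyxl that has never been opened and saved in Excel has no stored values, so those cells come back as None, which would explain the NaN in pandas.

```python
from io import BytesIO
from openpyxl import Workbook, load_workbook

# Build a tiny workbook with one formula; it is never opened in Excel.
wb = Workbook()
ws = wb.active
ws["A1"] = 2
ws["B1"] = 3
ws["C1"] = "=A1+B1"

buffer = BytesIO()
wb.save(buffer)

# Default load: formula cells hold the formula text.
buffer.seek(0)
formula_text = load_workbook(buffer).active["C1"].value

# data_only=True: formula cells hold the stored result, if there is one.
buffer.seek(0)
cached_value = load_workbook(buffer, data_only=True).active["C1"].value

print(formula_text)   # the formula string
print(cached_value)   # None, because nothing ever calculated it
```

If the same check on processing.xlsx shows None for the formula cells while initial.xlsx shows numbers, the formulas survived the round trip and only their calculated values are missing.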

score 1 · Accepted Answer · edited Oct 06 '22 at 12:41

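I found the problem and a messy workaround.

Problem: the file saved through openpyxl keeps the formulas but not their calculated values (processing.xlsx came out about 10 KB smaller than the original). pandas reads only those stored values, so every formula cell shows up as NaN.

Workaround: open and re-save the workbook with xlwings, which drives Excel itself, so the formulas get calculated and their results are written into the file.

PS: This is not the best solution, so other answers are welcome.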

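import pandas as pd
import xlwings as xw

def df_from_excel(path):
    # Open and re-save the workbook in Excel so the formula results are stored in the file
    app = xw.App(visible=False)
    book = app.books.open(path)
    book.save()
    app.kill()
    # Formula cells now have stored values for pandas to read
    return pd.read_excel(path, header=0)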

df1 = df_from_excel("1.xlsx")
print(df1.head())
df1.to_html(r'public\output.html')  # raw string so the backslash is not treated as an escape
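
If the DataFrame round trip is only there to change a cell, a lighter variant might be to edit the cell with openpyxl directly and save the workbook. The following is a minimal sketch, not a tested replacement for the code above: set_cell is a hypothetical helper, the address A1 is made up, and an in-memory workbook stands in for initial.xlsx.

```python
from io import BytesIO
from openpyxl import Workbook, load_workbook

def set_cell(source, target, address, value):
    """Change one cell on the first sheet and save to target."""
    wb = load_workbook(source)   # default mode keeps formula text
    ws = wb.worksheets[0]
    ws[address] = value
    wb.save(target)

# Stand-in for initial.xlsx (hypothetical contents).
original = BytesIO()
wb = Workbook()
ws = wb.active
ws["A1"] = 1
ws["B1"] = "=A1*2"
wb.save(original)
original.seek(0)

edited = BytesIO()
set_cell(original, edited, "A1", 10)

# Reload to confirm the edit landed and the formula is still a formula.
edited.seek(0)
check = load_workbook(edited).active
print(check["A1"].value, check["B1"].value)
```

The saved file still contains no calculated values, so the Excel re-save step in df_from_excel would still be needed before pd.read_excel returns numbers for the formula cells.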
