Suppose the first sheet of my data.xlsx contains some computed columns.
I'm trying to pull out a pd.DataFrame of that sheet that holds the computed values.
But try as I may, I cannot achieve this.
Fails:
# > pip install openpyxl
import pandas as pd
df_nodal = pd.read_excel('data.xlsx', 'firstSheetName')
# NOTE: Adding `, engine='openpyxl'` makes no difference
df_nodal.head()
This gives NaN in all calculated fields.
xl = pd.ExcelFile(f'data.xlsx')
df = xl.parse('firstSheetName')
df.head()
Same.
Following "How to read xlsx as pandas dataframe with formulas as strings":
from openpyxl import load_workbook
wb = load_workbook(filename = f'data.xlsx')
ws = wb['mySheetName']
df = pd.DataFrame(ws.values)
df.head()
Now this gives the formulae (=H2, =H3, etc.) in the cells.
An attempt to 'type-convert' these columns failed:
df[12][2:].astype(float)
# ValueError: could not convert string to float: '=H3'
"How to force pandas to evaluate formulas of xlsx and not read them as NaN?" might offer a solution, which involves saving and reloading the .xlsx. However, I can't get it working; the syntax given there appears invalid.
import pandas as pd
import xlwings as xw

def df_from_excel(path):
    # Open the workbook in Excel and save it so the computed values get stored
    book = xw.Book(path)
    book.save()
    return pd.read_excel(path, header=0)

df = df_from_excel('nodal0.xlsx')
This gives XlwingsError: Make sure to have "appscript" and "psutil", dependencies of xlwings, installed.
And pip install appscript psutil says they're both already installed.
Note: Same idea here: Pandas read_excel with formulas and get values
I'm trying to find a way for it to render into a dataframe, which will then contain numeric values.
Is there any way to do it?
EDIT:
Here's what I'm dealing with:
The raw .xlsx is shown below. I've double-clicked a calculated cell, revealing the underlying =H2.
Notice the corresponding cell of the dataframe (generated from this .xlsx) is showing NaN.
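In case it helps with diagnosis: as far as I understand, a formula cell in an .xlsx stores the formula in an `<f>` element and, if the file was last saved by an application that calculated it, the cached result in a `<v>` element. Below is a minimal sketch for checking whether those cached results exist at all. It uses only the standard library. The helper name count_formula_cells is my own, and the default sheet path xl/worksheets/sheet1.xml is an assumption (typical for a first sheet, but not guaranteed).

```python
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by worksheet XML inside an .xlsx archive.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def count_formula_cells(xlsx, sheet_path="xl/worksheets/sheet1.xml"):
    """Return (formula_cells, formula_cells_with_cached_value).

    `xlsx` may be a path or a binary file-like object.
    `sheet_path` is an assumption: the first sheet is usually stored there.
    """
    with zipfile.ZipFile(xlsx) as archive:
        root = ET.fromstring(archive.read(sheet_path))

    total = cached = 0
    for cell in root.iterfind(".//m:c", NS):
        if cell.find("m:f", NS) is None:
            continue  # plain value cell, no formula
        total += 1
        value = cell.find("m:v", NS)
        # A cached result is present only if <v> exists and is non-empty.
        if value is not None and (value.text or "").strip():
            cached += 1
    return total, cached
```

Calling count_formula_cells('data.xlsx') returns a pair of counts. If the second number is 0, the file holds no cached results, which would be consistent with every calculated field coming back as NaN.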