Why would Pandas return blank cells after the dataframe is saved by Openpyxl?

Question

I have a large data set that I pull into pandas with read-excel. I use the data to create a new column then write the new column to Excel with openpyxl. The issue is that if I read that file again, the original data will be read as blank values. The only way around this I have found so far is to re-write the original array. Why does this happen?

Update 1: This is true even if I change the pd.read_excel(engine='openpyxl')

Update 2: To clarify here are examples:

test.xlsm

import pandas as pd
import openpyxl as xl
from openpyxl.utils.dataframe import dataframe_to_rows

df = pd.read_excel('test.xlsm', engine='openpyxl')

print(df)

df['Col_4'] = df['Col_3'].apply(lambda x: x*2)

wb = xl.load_workbook('test.xlsm')
ws = wb.active
df = df['Col_4'].to_frame()
rowsdf = dataframe_to_rows(df, index = False, header = False)
for row_indexdf, rowdf in enumerate(rowsdf, 2):
    for column_indexdf, valuedf in enumerate(rowdf, 4):
        ws.cell(row=row_indexdf, column = column_indexdf, value = valuedf)

wb.save('test.xlsm')

First time running:

   Col_1  Col_2  Col_3
0      1      6     11
1      2      7     12
2      3      8     13

Second time running:

   Col_1  Col_2  Col_3  Unnamed: 3
0      1    NaN    NaN          22
1      2    NaN    NaN          24
2      3    NaN    NaN          26

Why are the second two columns now blank? I did not write over them...I do not think.

Why do you start at index 2 and 4 where you use `enumerate`? — BeRT2me, Apr 12 '22 at 00:18
Because I am trying to have it write into column D in Excel starting in D2 so the data lines up. — Trippingbillies41, Apr 12 '22 at 00:23
if you only need to write the data to col4, why are you iterating on all the columns here - "for column_indexdf, valuedf in enumerate(rowdf, 4):" ? — Ashish Samarth, Apr 12 '22 at 03:21
Figured out the issue, formula's aren't evaluated until excel is opened. Answer can be seen [here](https://stackoverflow.com/a/59618659/11865956) — BeRT2me, Apr 12 '22 at 04:44

score 0 · Accepted Answer · answered Apr 12 '22 at 16:43

0

Neither openpyxl nor pandas evaluate excel formulas. Opening a file in pandas is equivalent to opening the file with openpyxl in data_only=True mode.

So here, when you save the file that you've written, a bunch of NaNs appear because excel never had the chance to evaluate the changed sheet.

Interesting enough, if you open the file with openpyxl and further investigate one of the NaN cells, the expected formula is still there!

answered Apr 12 '22 at 16:43

BeRT2me

12,699
2
13
31

And how to evaluate this formula and read values? Note: I don't have excel installed and I only want to read value – Pritesh Gohil Jul 10 '23 at 19:26
1

There is no value until it is evaluated by Excel/an Excel equivalent. – BeRT2me Jul 12 '23 at 20:47

Why would Pandas return blank cells after the dataframe is saved by Openpyxl?

1 Answers1