I have a large data set that I pull into pandas with read-excel. I use the data to create a new column then write the new column to Excel with openpyxl. The issue is that if I read that file again, the original data will be read as blank values. The only way around this I have found so far is to re-write the original array. Why does this happen?
Update 1: This is true even if I change the pd.read_excel(engine='openpyxl')
Update 2: To clarify here are examples:
test.xlsm
import pandas as pd
import openpyxl as xl
from openpyxl.utils.dataframe import dataframe_to_rows
df = pd.read_excel('test.xlsm', engine='openpyxl')
print(df)
df['Col_4'] = df['Col_3'].apply(lambda x: x*2)
wb = xl.load_workbook('test.xlsm')
ws = wb.active
df = df['Col_4'].to_frame()
rowsdf = dataframe_to_rows(df, index = False, header = False)
for row_indexdf, rowdf in enumerate(rowsdf, 2):
for column_indexdf, valuedf in enumerate(rowdf, 4):
ws.cell(row=row_indexdf, column = column_indexdf, value = valuedf)
wb.save('test.xlsm')
First time running:
Col_1 Col_2 Col_3
0 1 6 11
1 2 7 12
2 3 8 13
Second time running:
Col_1 Col_2 Col_3 Unnamed: 3
0 1 NaN NaN 22
1 2 NaN NaN 24
2 3 NaN NaN 26
Why are the second two columns now blank? I did not write over them...I do not think.