I have an excel file with several sheets, say, Data 1
, Data 2
and Pivots
.
The sheets Data 1
and Data 2
have one table each.
The sheet Pivots
has only pivot tables whose data sources are the tables on Data 1
and Data 2
.
What I'd like to do is to rewrite the data sheets with data in two dataframes, say df1
and df2
respectively, while keeping the pivot tables linked to the same sheets. The idea is to run a script, replace Data 1
and Data 2
and refresh the pivot tables to get updated data.
pd.ExcelWriter(xlsx_file)
won't work because it replaces the file.
The approach below was adapted from this answer.
import pandas as pd
from openpyxl import load_workbook
book=load_workbook(xlsx_file)
writer = pd.ExcelWriter(xlsx_file, engine = "openpyxl")
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
writer.sheets.pop("Pivots", None)
data_sheets = ["Data 1", "Data 2"]
for (k, df) in enumerate([df1, df2]):
df.to_excel(writer, data_sheets[k], index=False)
writer.save()
This failed because the sheet Pivots
indeed kept the data in the cells, but the pivot table was gone, along with all its formatting. It was just hard values.
I also perused this question and this question, but couldn't make it work.
How do I go about doing this simple task?
I uploaded an example file which can be download here. For your convenience, here are two dataframes to replace the data sheets:
df1 = pd.DataFrame({"Category": ["A", "B", "C", "D", "A"], "Value": [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({"SKU": ["AB", "BB", "CB", "DB", "AB"], "No of Items": [3, 2, 7, 4, 12]})
As asked by a user below, I'm leaving here my failed try at his solution proposal (the pivots and all their formatting are gone, just their hardvalues remain).
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows
df1 = pd.DataFrame([["A", 1], ["B", 2], ["C", 3], ["D", 4], ["A", 5]], columns=["Category", "Value"])
df2 = pd.DataFrame([["AB", 3], ["BB", 2], ["CB", 7], ["DB", 4], ["AB", 12]], columns=["SKU", "No of Items"])
wb = load_workbook("xlsx_file.xlsx")
sheets = ["Data 1", "Data 2"]
for (idx, df) in enumerate([df1, df2]):
ws = wb.get_sheet_by_name(sheets[idx])
rows = dataframe_to_rows(df)
for (r_idx, row) in enumerate(rows):
if r_idx != 0:
for (c_idx, value) in enumerate(row[1:]):
ws.cell(row=r_idx+1, column=c_idx+1, value=value)
wb.save("xlsx_file.xlsx")