4

I have an excel file with several sheets, say, Data 1, Data 2 and Pivots.

The sheets Data 1 and Data 2 have one table each. The sheet Pivots has only pivot tables whose data sources are the tables on Data 1 and Data 2.

What I'd like to do is to rewrite the data sheets with data in two dataframes, say df1 and df2 respectively, while keeping the pivot tables linked to the same sheets. The idea is to run a script, replace Data 1 and Data 2 and refresh the pivot tables to get updated data.

pd.ExcelWriter(xlsx_file) won't work because it replaces the file. The approach below was adapted from this answer.

import pandas as pd
from openpyxl import load_workbook

book=load_workbook(xlsx_file)
writer = pd.ExcelWriter(xlsx_file, engine = "openpyxl")
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
writer.sheets.pop("Pivots", None)
data_sheets = ["Data 1", "Data 2"]

for (k, df) in enumerate([df1, df2]):
    df.to_excel(writer, data_sheets[k], index=False)
writer.save()

This failed because the sheet Pivots indeed kept the data in the cells, but the pivot table was gone, along with all its formatting. It was just hard values.

I also perused this question and this question, but couldn't make it work.

How do I go about doing this simple task?


I uploaded an example file which can be download here. For your convenience, here are two dataframes to replace the data sheets:

df1 = pd.DataFrame({"Category": ["A", "B", "C", "D", "A"], "Value": [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({"SKU": ["AB", "BB", "CB", "DB", "AB"], "No of Items": [3, 2, 7, 4, 12]})

As asked by a user below, I'm leaving here my failed try at his solution proposal (the pivots and all their formatting are gone, just their hardvalues remain).

import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows

df1 = pd.DataFrame([["A", 1], ["B", 2], ["C", 3], ["D", 4], ["A", 5]], columns=["Category", "Value"])
df2 = pd.DataFrame([["AB", 3], ["BB", 2], ["CB", 7], ["DB", 4], ["AB", 12]], columns=["SKU", "No of Items"])

wb = load_workbook("xlsx_file.xlsx")
sheets = ["Data 1", "Data 2"]

for (idx, df) in enumerate([df1, df2]):
    ws = wb.get_sheet_by_name(sheets[idx])
    rows = dataframe_to_rows(df)
    for (r_idx, row) in enumerate(rows):
        if r_idx != 0:
            for (c_idx, value) in enumerate(row[1:]):
                ws.cell(row=r_idx+1, column=c_idx+1, value=value)

wb.save("xlsx_file.xlsx")
Git Gud
  • 217
  • 3
  • 10

1 Answers1

0

The pandas side of things knows nothing about pivots so you should avoid using to_excel() and use the utilities that openpyxl provides for going from a dataframe to a worksheet and back.

You may need to change the definition of the pivot tables in which case you are largely on your own: openpyxl should preserve the structure but provides no additional functions for managing them. You will have to rely on the specification of pivot tables in ECMA 376 / ISO 29500

Charlie Clark
  • 18,477
  • 4
  • 49
  • 55
  • I tried using [this](https://stackoverflow.com/a/36664027/7454646) answer of yours, but it still broke the pivot table. Any idea of what I might be doing wrong? – Git Gud Mar 16 '18 at 11:47
  • A link to the excel file is in the question. The code I used is there too. Thank you very much. – Git Gud Mar 16 '18 at 19:00