2

I have a process that creates a dataframe of almost 1,000 rows that runs each week. I would like to be able to append to an existing sheet without having to re-read the spreadsheet because that will take a long time as the file grows. I saw this answer here: Append existing excel sheet with new dataframe using python pandas. Unfortunately, it doesn't seem to be working correctly for me. Here is some dummy code that I am trying to append to that existing file. It causes two issues at present - first, it does not append, but rather overwrites the data. Secondly, when I go to open the file, even after the program runs, it only allows me to open it in read-only mode. I have confirmed I am using pandas 1.4 as well.

import pandas as pd


data = {'Name': ['Tom', 'Joseph', 'Krish', 'John'], 'Age': [20, 21, 19, 18]}
df = pd.DataFrame(data)


filename = "Testing Append Process.xlsx"
writer = pd.ExcelWriter(filename, engine="openpyxl", mode="a", if_sheet_exists="overlay")

df.to_excel(writer, index=False)
writer.save()
Shawn Schreier
  • 780
  • 2
  • 10
  • 20
  • You can't append to an existing sheet without having to re-read the spreadsheet. That isn't how `df.to_excel()` works. – BoarGules Apr 22 '22 at 14:12
  • If that is the case, can you explain how to accomplish what I'm trying to do, or what part of the if_sheet_exists I am misunderstanding? I essentially don't want to read the whole data set into memory each time, but if there is a simpler or faster way to do it, that would be great. – Shawn Schreier Apr 22 '22 at 14:18
  • 1
    Excel files are rather complicated zipfiles, so appending to them is not as simple as you might wish. If you want to do more sophisticated manipulation of Excel files you might look at module `openpyxl`. But this will involve more programming than you may want. – BoarGules Apr 22 '22 at 14:25

3 Answers3

2

Please notes that Testing Append Process.xlsx file has to be created before running this code.

from openpyxl import load_workbook
import pandas as pd

data = {'Name': ['Tom', 'Joseph', 'Krish', 'John'], 'Age': [20, 21, 19, 18]}
df = pd.DataFrame(data)

filename = "Testing Append Process.xlsx"
workbook = load_workbook(filename)
writer = pd.ExcelWriter(filename, engine='openpyxl')
writer.book = workbook
writer.sheets = {ws.title: ws for ws in workbook.worksheets}

df.to_excel(writer, startrow=writer.sheets['Sheet1'].max_row, index = False, header= False)

writer.close()

Returns the following if you will run the code twice.

enter image description here

gremur
  • 1,645
  • 2
  • 7
  • 20
  • This solution now gives a FutureWarning message for writer.book attribute. How would you adjust your code to satisfy the future removal of writer.book? – Sean Richards Jan 10 '23 at 16:14
  • @SeanRichards. Thank you for the comment. Please take a look at the my alternative solution bellow. – gremur Jan 13 '23 at 09:43
1

The alternative solution above now gives a FutureWarning message for writer.book attribute. So here is the new solution.

from openpyxl import load_workbook

data = {'Name': ['Tom', 'Joseph', 'Krish', 'John'], 'Age': [20, 21, 19, 18]}

filename = "Testing Append Process.xlsx"
workbook = load_workbook(filename)
worksheet = workbook.active

for record in list(zip(*data.values())):
    worksheet.append(record)
    
workbook.save(filename)

Returns the following if you will run the code twice.

enter image description here

gremur
  • 1,645
  • 2
  • 7
  • 20
  • Do you also have a solution if the data is a dataframe? – Henry Jan 13 '23 at 17:40
  • 1
    @Henry, well pandas dataframe can be converted to dict [pandas.DataFrame.to_dict](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_dict.html) in order to use the solution above – gremur Jan 13 '23 at 17:50
0

Here is a way to limit how much of the spreadsheet is read, but with openpyxl.

Getting setup: openpyxl documentation

max_rows = max((c.row for c in active_ws['A'] if c.value is not None)) + 1

This will return the max amount of rows within the document... add one to it, and you will have your starting point to start placing the data you need to add.

Here is an example of placing that data with openpyxl:

data = {'Name': ['Tom', 'Joseph', 'Krish', 'John'], 'Age': [20, 21, 19, 18]}

count = 0 
for x in range(4):
    name = data['Name'][x]
    age = data['Age'][x]

    active_ws.cell(row=max_row+count, column=1).value = name
    active_ws.cell(row=max_row+count, column=2).value = name

    count += 1

Edit: The max rows var can be adjusted to any column you want checked. (This one checks column "A")... The column cannot have an empty cells until your data is done otherwise, it will give you an incorrect "max rows".

BMarshyyy
  • 71
  • 3