What I need the code to do:
- Copy all rows under Columns A - D from 'NewData.xlsx' and update all rows under columns A - D in a copy of 'template.xlsx' called 'updated.xlsx'.
What the code actually does:
- It (successfully!) creates a new template sheet called 'NEW_DATA' in an updated template file called 'updated.xlsx' and writes the data.... ALL IN ONE CELL.
(I need to transfer a large quantity of GPS data into an existing table for work - in case anyone is curious why I am doing this.)
Unlike previous questions I've read, I do not want to modify the number of columns or rows when updating the data, I do not want to paste the data into a new tab, I do not want to copy the whole worksheet or xlsx file and I don't want to append the data below existing data.
import openpyxl
import pandas as pd
# create variable df containing updated data in excel
DataAsXlsx = r'C:\Users\...\NewData.xlsx'
xl_workbook = pd.ExcelFile(DataAsXlsx) # Load the excel workbook
df = xl_workbook.parse("Sheet") # Parse the sheet into a dataframe
#Reads template xlsx, creates template sheet 'NEW_DATA'
template = openpyxl.load_workbook(r'C:\Users\...\template.xlsx')
template.sheetnames
sheet1 = template.worksheets[0]
sheet1.title = 'NEW_DATA'
sheet1 = template['NEW_DATA']
#^^^everything above this line works^^^
#Code below attempts to copy rows AND columns from NewData.xlsx and paste to sheet 'NEW_DATA' in updated.xlsx
for row in range(1, sheet1.max_row+1):
cell = sheet1.cell(row=row, column=1)
if cell.value is not None:
cell.value = str(df)
#This pastes ALL DATA into ColA of sheet 'NEW_DATA' in updated.xlsx
template.save('updated.xlsx')
Here is what NewData.xlsx looks like in Excel:
For the purposes of debugging, template.xlsx can be any existing excel file.
I've already read : Update rows and column using openpyxl from python and it helped with iterating over the template file BUT it uses hard coded data "(c)" and this logic does not transfer to what I need.
I've read almost every question on here regarding pandas and openpyxl and also read the docs. I am at a loss for what to do next.
UPDATE
per Charlie's feedback, I did the following:
from openpyxl import load_workbook
wb1 = load_workbook(r'C:\Users\...\NewData.xlsx')
wb2 = load_workbook(r'C:\Users\...\template.xlsx')
ws1 = wb1['Sheet']
ws2 = wb2.get_active_sheet()
for row in ws1.iter_rows(max_col=4):
values = (c.value for c in row)
ws2.append(values)
ws2.save('updated.xlsx')
And this appends the data at the bottom of the existing data set (it should replace the data in COL A - D) any Suggestions would help - I am so close!!
FINAL UPDATE
HOORAY - this works!!!
import pandas as pd
#use pandas to access the new data
DataAsXlsx = pd.read_excel(r'C:\Users\...\NewData.xlsx', sheet_name='Sheet1')
#this reads the template file
template = r'C:\Users\...\template.xlsx'
df = pd.read_excel(template)
#this creates a new document named FinalAutomatedDataSheet.xlsx
writer = pd.ExcelWriter(r'C:\Users\....\FinalAutomatedDataSheet.xlsx')
#this line overlays the template file data onto FinalAutomatedDataSheet.xlsx
df.to_excel(writer, startcol=0,startrow=0, index=False)
#This line writes the new data to FinalAutomatedDataSheet.xlsx
#NOTE: you can SPECIFY COLUMN and ROW indices below!!:
DataAsXlsx.to_excel(writer, startcol=0,startrow=0, index=False)
writer.save()