2

What I need the code to do:

  • Copy all rows under Columns A - D from 'NewData.xlsx' and update all rows under columns A - D in a copy of 'template.xlsx' called 'updated.xlsx'.

What the code actually does:

  • It (successfully!) creates a new template sheet called 'NEW_DATA' in an updated template file called 'updated.xlsx' and writes the data.... ALL IN ONE CELL.

(I need to transfer a large quantity of GPS data into an existing table for work - in case anyone is curious why I am doing this.)

Unlike previous questions I've read, I do not want to modify the number of columns or rows when updating the data, I do not want to paste the data into a new tab, I do not want to copy the whole worksheet or xlsx file and I don't want to append the data below existing data.

import openpyxl
import pandas as pd

# create variable df containing updated data in excel
DataAsXlsx = r'C:\Users\...\NewData.xlsx'
xl_workbook = pd.ExcelFile(DataAsXlsx)  # Load the excel workbook
df = xl_workbook.parse("Sheet")  # Parse the sheet into a dataframe

#Reads template xlsx, creates template sheet 'NEW_DATA'
template = openpyxl.load_workbook(r'C:\Users\...\template.xlsx')
template.sheetnames
sheet1 = template.worksheets[0]
sheet1.title = 'NEW_DATA'
sheet1 = template['NEW_DATA']

#^^^everything above this line works^^^


#Code below attempts to copy rows AND columns from NewData.xlsx and paste to sheet 'NEW_DATA' in updated.xlsx

for row in range(1, sheet1.max_row+1): 
   cell = sheet1.cell(row=row, column=1)
   if cell.value is not None:
        cell.value = str(df)

#This pastes ALL DATA into ColA of sheet 'NEW_DATA' in updated.xlsx

template.save('updated.xlsx')

Here is what NewData.xlsx looks like in Excel:

what NewData.xlsx looks like in excel

For the purposes of debugging, template.xlsx can be any existing excel file.

I've already read : Update rows and column using openpyxl from python and it helped with iterating over the template file BUT it uses hard coded data "(c)" and this logic does not transfer to what I need.

I've read almost every question on here regarding pandas and openpyxl and also read the docs. I am at a loss for what to do next.

UPDATE

per Charlie's feedback, I did the following:

from openpyxl import load_workbook

wb1 = load_workbook(r'C:\Users\...\NewData.xlsx')
wb2 = load_workbook(r'C:\Users\...\template.xlsx')
ws1 = wb1['Sheet']
ws2 = wb2.get_active_sheet() 

for row in ws1.iter_rows(max_col=4):
        values = (c.value for c in row)
        ws2.append(values)
ws2.save('updated.xlsx')

And this appends the data at the bottom of the existing data set (it should replace the data in COL A - D) any Suggestions would help - I am so close!!

FINAL UPDATE

HOORAY - this works!!!

import pandas as pd

#use pandas to access the new data 
DataAsXlsx = pd.read_excel(r'C:\Users\...\NewData.xlsx', sheet_name='Sheet1')

#this reads the template file
template = r'C:\Users\...\template.xlsx'
df = pd.read_excel(template)

#this creates a new document named FinalAutomatedDataSheet.xlsx
writer = pd.ExcelWriter(r'C:\Users\....\FinalAutomatedDataSheet.xlsx') 

#this line overlays the template file data onto FinalAutomatedDataSheet.xlsx
df.to_excel(writer, startcol=0,startrow=0, index=False)

#This line writes the new data to FinalAutomatedDataSheet.xlsx
#NOTE: you can SPECIFY COLUMN and ROW indices below!!:
DataAsXlsx.to_excel(writer, startcol=0,startrow=0, index=False)

writer.save()
Needa Mirza Lin
  • 45
  • 1
  • 10
  • Here `cell.value = str(df)` you should specify the column and row of the `df` too – Joe Aug 08 '18 at 05:25
  • Can you provide an example on how to specify the column and row of the 'df'? This is what I cannot find an example for in the docs or anywhere on StackOverflow. I am just trying to copy data from columns A-D from NewData.xlsx to columns A - D in Updated.xlsx. There has to be a simple way to do this. I assume it'll be something like: `df.loc[row_indexer,column_indexer]` source: https://pandas.pydata.org/pandas-docs/stable/indexing.html – Needa Mirza Lin Aug 08 '18 at 16:39

1 Answers1

2

Your current code attempts to stick and entire dataframe into a cell.

If you're just copying between worksheets then I would suggest you use openpyxl's read-only mode to read the data.

from openpyxl import load_workbook
wb1 = load_workbook(read_only=True)
wb2 = load_workbook(template)
ws1 = wb1['Sheet']
ws2 = wb2.create_sheet("NEW_DATA") # it's not quite clear what you want
for row in ws1.iter_rows(max_col=4):
    values = (c.value for c in row)
    ws2.append(values)
Charlie Clark
  • 18,477
  • 4
  • 49
  • 55
  • I am copying data from one excel document to another, not between worksheets of the same document. My apologies if this was not clear. The new excel document 'updated.xlsx' is a copy of 'template.xlsx'. The line `ws2 = wb2.create_sheet("NEW_DATA")` creates a sheet in updated.xlsx. This part of the code works great. I agree that the for loop is not adding data for each cell, but for the whole dataframe. When I tried to implement your solution to see what it does, I got a syntax error for the 'for loop' line. Is it possible to provide another example that iterates over each row and column? – Needa Mirza Lin Aug 08 '18 at 16:31
  • Maybe these answers can help https://stackoverflow.com/questions/51739488/how-to-copy-a-range-from-one-sheet-to-another-as-values-using-openpyxl-in-python/51739562#51739562 – Joe Aug 08 '18 at 17:21
  • @joe I tried the following from the link you provided: `for row in df['A1':'Z100']: for cell in row: sheet1[cell.coordinate].value = cell.value template.save('updated.xlsx')` ...and got the following error: *TypeError: cannot do slice indexing on with these indexers [A1] of .* I want to iterate over the `range(len(df))` until there are no more cells. I would rate my Python skills as a solid beginner so I apologize if the solution is obvious. Let me know if I need to add this to the question so its easier to read. – Needa Mirza Lin Aug 08 '18 at 20:43
  • Also note that the data is mixed-type containing: strings, symbols and numbers. I think this also might be contributing to the issue? – Needa Mirza Lin Aug 08 '18 at 20:50
  • Thanks! It is still appending data at the bottom - not replacing it. Any suggestion on fixing it to replace, not append, the data? I tried other commands for .append() and nothing seems to work (something like .update() or .replace() but both throw an error) – Needa Mirza Lin Aug 09 '18 at 16:53
  • If you want to replace cells then you will have to use `ws.cell()` but this sort of defeats the purpose of the template and wasn't clear in your post. Better to delete the rows in the template so that you can use append. – Charlie Clark Aug 09 '18 at 17:50
  • Thanks for the feedback. Does ws.cell() override the existing data? or does it only write to empty cells? The template document has existing GPS data that will be updated periodically and only columns A-D need to be written over by NewData.xlsx. Currently the data is copied and pasted manually which is error-prone and tedious. I am trying to automate that process. – Needa Mirza Lin Aug 09 '18 at 20:30
  • also when I tried: `ws2.cell(values)` I got the following message: TypeError: cell() takes at least 3 arguments (2 given). i want to copy all data and the exampe `d = ws.cell(row=4, column=2, value=10)` is too specific from https://openpyxl.readthedocs.io/en/stable/tutorial.html – Needa Mirza Lin Aug 09 '18 at 20:32
  • You need to work your way through the tutorial to familiarise yourself with how the library works. – Charlie Clark Aug 10 '18 at 08:09
  • Figured it out! Thank you for the tips along the way - I accepted your response as the answer. I updated the question with the solution that worked for anyone else that may need it. – Needa Mirza Lin Aug 10 '18 at 23:48