How do I get this loop to work correctly when writing pandas df to xlsx?

Question

I have used this code, which is kind of working. Right now in the smaller 'rep_list' as it executes the first rep in the list which is CP is adds it, but then when it goes to AM it overwrites the CP. SO right now when I run this code it only actually saves the last person in the loop. If I run the code with just "CP" and then just "AM" it appends it as it should. Is it something wrong with the for loop? or is it an issue with the workbook itself?

import pandas as pd
import datetime
from openpyxl import load_workbook

now = datetime.datetime.now()
currentDate = now.strftime("%Y-%m-%d")
call_report = pd.read_excel("Ending 2016-07-30.xlsx", "raw_data")

#rep_list = ["CP", "AM", "JB", "TT", "KE"]
rep_list = ["CP", "AM"]

def call_log_reader(rep_name):
    rep_log = currentDate + "-" + rep_name + ".csv"
    df = pd.read_csv(rep_log)
    df = df.drop(['From Name', 'From Number', 'To Name / Reference', 'To Number', 'Billing Code', 'Original Dialed Number',
     'First Hunt Group', 'Last Hunt Group'], axis=1)
    df['rep'] = rep_name

    book = load_workbook('Ending 2016-07-30.xlsx')
    writer = pd.ExcelWriter('Ending 2016-07-30.xlsx', engine='openpyxl')
    writer.book = book
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
    df.to_excel(writer, "raw_data", index=False)
    writer.save()
    ## I tried adding this : writer.close() hoping it would close the book and then force it to reopen for the next rep in the loop but it doesn't seem to work.

for rep in rep_list:
    call_log_reader(rep)

Thank you so much!

EDIT:

Gaurav Dhama gave a great answer that worked excellent. He pointed out that there is a bit of a limitation with the Pandas excelwriter (refer to this link) and proposed a solution in which each rep gets their own sheet in the end. This worked, however after I thought on it I opted against the additional sheets and came up with this solution knowing the limitation existed. Basically, I appended a CSV instead of the actual XLSX file, and then at the end opened that CSV and appended the one big list into the XLSX file. Either one works, just depends on what you're final product looks like.

import pandas as pd
import datetime
from openpyxl import load_workbook

now = datetime.datetime.now()
currentDate = now.strftime("%Y-%m-%d")
call_report = "Ending 2016-07-30.xlsx"
#rep_list = ["CP", "AM", "JB", "TT", "KE"]
rep_list = ["CP", "AM"]
csv_to_xl_files = []
merged_csv = currentDate + "-master.csv"

def call_log_reader(rep_name):
    rep_log = currentDate + "-" + rep_name + ".csv"
    df = pd.read_csv(rep_log)
    df = df.drop(['TimestampDetail', 'Billing Code', 'From Name', 'From Number', 'To Name / Reference', 'To Number',
              'Original Dialed Number', 'First Hunt Group', 'Last Hunt Group'], axis=1)
    df['rep'] = rep_name
    #print (df.head(3))
    df.to_csv(merged_csv, mode='a', index=False, header=False)
    csv_to_xl_files.append(rep_log)

book = load_workbook(call_report)
writer = pd.ExcelWriter(call_report, engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

for rep in rep_list:
    call_log_reader(rep)

master_df = pd.read_csv(merged_csv)
master_df.to_excel(writer, "raw_data", index=False)
writer.save()

#this csv_to_xl_files list isn't finished yet, basically I'm going to use it to delete the files from the directory as I don't need them once the script is run.
print (csv_to_xl_files)

Gaurav Dhama · Accepted Answer · 2016-08-03T15:50:39.303

1

Try using the following:

import pandas as pd
import datetime
from openpyxl import load_workbook

now = datetime.datetime.now()
currentDate = now.strftime("%Y-%m-%d")
call_report = pd.read_excel("Ending 2016-07-30.xlsx", "raw_data")

#rep_list = ["CP", "AM", "JB", "TT", "KE"]
rep_list = ["CP", "AM"]

def call_log_reader(rep_name):
    rep_log = currentDate + "-" + rep_name + ".csv"
    df = pd.read_csv(rep_log)
    df = df.drop(['From Name', 'From Number', 'To Name / Reference', 'To Number', 'Billing Code', 'Original Dialed Number',
     'First Hunt Group', 'Last Hunt Group'], axis=1)
    df['rep'] = rep_name
    df.to_excel(writer, "raw_data"+rep, index=False)
    return df

book = load_workbook('Ending 2016-07-30.xlsx')
writer = pd.ExcelWriter('Ending 2016-07-30.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

for rep in rep_list:
    call_log_reader(rep)

writer.save()

edited Aug 03 '16 at 15:50

answered Aug 02 '16 at 23:00

Gaurav Dhama

1,346
8
19

I tried it a few different ways just now, if I put the above block of code before I call the "For" loop it will run without any errors but the file just has a blank sheet for 'raw_data'. If I put the block of code after the for loop OR in the for loop after the function is called it throws a NameError: global name 'writer' is not defined and won't complete – Mxracer888 Aug 02 '16 at 23:09
I have edited my answer and have tested a similar code. This should work for you. Also in case it doesnt work, please post a sample of "Ending 2016-07-30.xlsx" from which you are reading the variable call_report – Gaurav Dhama Aug 02 '16 at 23:20
Ahhh, there is a issue in pandas excelwriter, you cannot write different dataframes in the same sheet, they will keep overwriting one after the other. Refer this [link](https://github.com/pydata/pandas/issues/3441). I have edited the code again to generate multiple worksheets. – Gaurav Dhama Aug 02 '16 at 23:28
Interesting. I did notice if I ran the code with one rep. then ran it again with another and so on it would work, which I was hoping to essentially replicate in doing the writer.close() I will edit this and test it out tomorrow on the project computer! – Mxracer888 Aug 03 '16 at 04:14
Alright, I tried that and it mostly worked. I had to move the "df.to_excel" statement into the function and out of the for loop otherwise it had an error saying "NameError: name 'df' is not defined (even with the return statement). At any rate, update that and I'll mark it as accepted answer :) – Mxracer888 Aug 03 '16 at 15:46

score 0 · Answer 2 · answered Aug 03 '16 at 07:07

0

If you use openpyxl 2.4 then you can work with Pandas dataframes in directly in openpyxl.

answered Aug 03 '16 at 07:07

Charlie Clark

18,477
4
49
55

How do I get this loop to work correctly when writing pandas df to xlsx?

2 Answers2