1

I have a for loop that loops through a set of students in a dataframe and gets back their ID and Attendance:

for i in FinalDataset.index:
    student = FinalDataset.loc[i,'students']
    AttendanceSoFar = FinalDataset.loc[i,'OverallAttCurrent']

    data = [[student, AttendanceSoFar]]
    InsertionData = pd.DataFrame(data, columns=['student', 'AttendanceSoFar'])
    InsertionData.to_excel("Result.xlsx")

I then put these details in a dataframe called InsertionData and create an excel spreadsheet with that data in. However currently it creates multiple spreadsheets. I just want to add a new row each time it loops. Does anyone know how to do this?

P.S - In my actual code I am also performing multiple calculations and adding new variables, which is why I can't just delete columns in the current dataframe and use that.

  • The `to_excel` is in the `for` loop so you're creating as many file as you're iterating. You only need to add data in the for loop and take out the `to_excel` from it. – Alexandre B. Apr 13 '20 at 10:01
  • Thanks @Alexandre B. So, how to I add each row into the insertiondata dataframe? At the moment it just replaces each row with the new one –  Apr 13 '20 at 10:05
  • I do not really understand what is the purpose of the for loop ? Filtering columns ? If it is, have a look at [Selecting multiple columns in a pandas dataframe](https://stackoverflow.com/questions/11285613/selecting-multiple-columns-in-a-pandas-dataframe) – Alexandre B. Apr 13 '20 at 10:09
  • @Alexandre B. In my actual code I am doing some machine learning so it is very complicated. However I just wanted to know simply how I can iterate through a for loop and add each output into a dataframe/excel file without overwriting. –  Apr 13 '20 at 10:15

1 Answers1

0

The problem is well explained in this tutorial: How to add rows in a DataFrame using dataframe.append() & loc[] , iloc[] or Add one row to pandas DataFrame

Here is a simple example from your snippet:

# Create output dataframe
OutputData = pd.DataFrame(columns=['student', 'AttendanceSoFar'])
for i in FinalDataset.index:
    # computations ...
    # Select student and OverallAttCurrent
    student = FinalDataset.loc[i, 'students']
    AttendanceSoFar = FinalDataset.loc[i, 'OverallAttCurrent']

    # Add the data in the output
    OutputData = OutputData.append(
        {"student": student, "AttendanceSoFar": AttendanceSoFar}, ignore_index=True)

# Export to excel
OutputData.to_excel("Result.xlsx")

Hope that helps!

Alexandre B.
  • 5,387
  • 2
  • 17
  • 40