1

when i go to write my for loop generated dataframes to an excel file, only the last line is written into the excel file. i have tried concatenating the dataframes, as each iteration creates a new data frame and then write it to the excel file.

so essentially what i am looking to do is, to successfully concatenate the data frames into one dataframe before i write them to the excel file. i cant write them in individually using pd.ExcelWriter, as I may have 100's of feature names

def CCC_90(df_1,x):
      for i in x:
            print('------------------------------------------------------------------------------------------------------------------- ')
            feature_num =(df_1.iloc[[i]])
            feature_num_correct = (feature_num + 21)
            print(feature_num_correct)

            writer = pd.ExcelWriter('No3_dVSa.xlsx', engine='xlsxwriter')
            appended_data = []
            for j in feature_num:
                  feature_name = dfFeaturename.iloc[[j]]
                  appended_data.append(feature_name)

            appended_data = pd.concat(appended_data)

            appended_data.to_excel(writer, sheet_name='Sheet1',startrow=1)
            writer.save()
Yuca
  • 6,010
  • 3
  • 22
  • 42
  • 1
    Related: https://stackoverflow.com/questions/28669482/appending-pandas-dataframes-generated-in-a-for-loop – anky Jun 28 '19 at 18:32

1 Answers1

1

The Excel file is overwritten each time in the outlook loop iterations. This means that only the last iteration is saved in the spreadsheet.

To solve this issue, create the writer before the outer loop and save the concated dataframe in the spreadsheet outside of the loop.

Also, the starting row is set to 1 so writing will start from the beginning of the sheet. It'll have to be update corresponding to the number of rows in the appended dataframe.

def CCC_90(df_1,x):
    writer = pd.ExcelWriter('No3_dVSa.xlsx', engine='xlsxwriter')
    startrow = 1

    for i in x:
        # ...
        appended_data.to_excel(writer, sheet_name='Sheet1', startrow=startrow)
        startrow = len(appended_data)
    writer.save()

Also one more thing depending on whether x is a list of indexes, you can iloc with the list to get a dataframe with rows from the list and refactor out the loops.

def CCC_90(df_1,x):
    writer = pd.ExcelWriter('No3_dVSa.xlsx', engine='xlsxwriter')
    # the next line should result in a dataframe with
    # a column containing the feature numbers say fc
    feature_nums = df_1.iloc[x]

    # getting the feature names is more direct
    feature_names = dfFeaturename.iloc[feature_nums.fc.tolist()]
    feature_names.to_excel(writer, sheet_name='Sheet1', startrow=1)
    write.save()
Oluwafemi Sule
  • 36,144
  • 1
  • 56
  • 81
  • Thanks for replying. It's still only printing the last iteration of the loop into the excel file, even after defining the writer and writer.save() outside the first for loop. x is a numpy array. – Michael Moran Jun 28 '19 at 19:46
  • A numpy array can be used in place of the list. You may want to check number of rows in `appended_data` with `print(len(appended_data))`. – Oluwafemi Sule Jun 28 '19 at 20:34
  • 1
    I used a lazy workaround, copied the stdout into a text file, re read it into python and did my analysis on it. Unfortunately I've quite alot of files to be read into pandas data frames and worked on , but thank God for Windows ctrl+_. Haha. Thanks again – Michael Moran Jun 29 '19 at 01:11