Context: I'm analyzing a dataset using the split-apply-combine paradigm. Once I group the dataset, I can add some columns that wouldn't have meaning before the grouping. I then want to go through these new columns in each grouped dataframe and collect their values in a new, combined dataframe to compute some metrics and produce outputs.

This leads to my question. I can do this in one of two ways: (1) maintain a set of lists, extend them for each grouped dataframe, and create a new dataframe from the lists at the end; or (2) create a new dataframe up front and append rows to it as I go through the grouped dataframes. Would either of these approaches be considered more "pandorable" than the other?

Thanks!

nasgold
  • I'm not sure that either way really screams `pandas`. However, whenever you append to a DataFrame it copies all of your data, which leads to quadratic copying when you use `DataFrame.append` within a loop. Your code will likely slow to a crawl once things get large. Appending to a list is more efficient, so it's advised to do that and then `concat` or construct the DataFrame only once, after the loop: https://stackoverflow.com/a/37009561/4333359 – ALollz Feb 04 '20 at 22:29
  • Thanks ALollz - given that, it seems maintaining the lists and just making one dataframe at the end is the way to go. Appreciate the help! – nasgold Feb 05 '20 at 22:21
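
For reference, the list-then-construct pattern discussed in the comments might look roughly like the sketch below. The data, the column names (`group`, `value`, `deviation`) and the summary metrics are all made up for illustration and are not from the original question; only the overall shape (collect in plain Python lists inside the loop, build the DataFrame once afterwards) reflects the advice given.

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": [1.0, 3.0, 2.0, 4.0, 6.0],
})

rows = []    # one summary dict per group (cheap list appends)
pieces = []  # the per-group frames, in case the full rows are needed later

for name, g in df.groupby("group"):
    g = g.copy()  # work on a copy so the original frame is left untouched
    # A column that only has meaning within a group
    g["deviation"] = g["value"] - g["value"].mean()
    pieces.append(g)
    rows.append({
        "group": name,
        "n": len(g),
        "max_abs_deviation": g["deviation"].abs().max(),
    })

# Build each combined DataFrame exactly once, after the loop
summary = pd.DataFrame(rows)
combined = pd.concat(pieces, ignore_index=True)
```

Here `summary` has one row per group and `combined` holds every original row plus the new `deviation` column. Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so on current versions the append-in-a-loop variant is not available anyway.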

0 Answers