Which is more idiomatic to pandas: Extending a set of lists and creating a dataframe from those lists, or creating a dataframe and appending rows?

Question

Context: I'm currently analyzing a dataset using the split-apply-combine paradigm. Once I group the dataset, I can add some columns that wouldn't have had meaning before the grouping. I then want to go through these new columns in the grouped dataframes and collect their values in a new, combined dataframe to compute some metrics and produce outputs.

This leads to my question: I can do this by maintaining a set of lists, extending the lists for each grouped dataframe, and then creating a new dataframe at the end from the lists; or I can create a new dataframe to start with and append new rows to the dataframe as I go through the grouped dataframes. Would either of these approaches be considered more "pandorable" than the other?

Thanks!

I'm not sure that either way really screams `pandas`. However, every time you append to a DataFrame it copies all of your data, so using `DataFrame.append` within a loop leads to quadratic copying, and your code will likely slow to a crawl once things get large. Appending to a list is more efficient, so it's advisable to do that and then `concat` or construct the DataFrame only once, after the loop: https://stackoverflow.com/a/37009561/4333359 — ALollz, Feb 04 '20 at 22:29
Thanks ALollz - given that, seems maintaining the lists and just making one dataframe at the end is the way to go. Appreciate the help! — nasgold, Feb 05 '20 at 22:21
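
For reference, a minimal sketch of the collect-then-combine pattern described in the comment above. The data and the column names (`group`, `value`, `deviation`) are hypothetical and not taken from the question. Note that `DataFrame.append` was removed in pandas 2.0, so in current versions collecting pieces in a list and calling `pd.concat` once is the only one of the two approaches that still works as written.

```python
import pandas as pd

# Hypothetical data; the column names are illustrative only.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": [1.0, 3.0, 2.0, 4.0, 6.0],
})

pieces = []  # plain Python list; appending to it never copies the DataFrames
for name, group_df in df.groupby("group"):
    group_df = group_df.copy()
    # A column that only has meaning within a group:
    # each value's distance from its own group's mean.
    group_df["deviation"] = group_df["value"] - group_df["value"].mean()
    pieces.append(group_df[["group", "deviation"]])

# Build the combined DataFrame once, after the loop.
combined = pd.concat(pieces, ignore_index=True)

# For simple per-group computations, a vectorized alternative that
# skips the explicit loop entirely.
vectorized = df["value"] - df.groupby("group")["value"].transform("mean")
```

If the per-group step can be expressed with `transform` or `agg`, as in the last line, the loop may not be needed at all; the list-plus-`concat` pattern is mainly useful when the per-group logic is too involved for that.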

0 Answers