
I have a dataframe which I am currently splitting into groups and then looping through the resulting list to merge them back together:

# split the original dataframe into one dataframe per group
dfs = [group for _, group in df.groupby(by=group)]

# merge the groups back together one at a time on the shared key column(s)
df = dfs[0]
for i in range(1, len(dfs)):
    df = df.merge(dfs[i], how='outer', on=col_data)
    dfs[i] = None  # drop the reference to the merged frame to free memory


This works fine for a small number of dataframes in the list, but it gets quite slow and eventually crashes due to memory for larger lists. I tried deleting each item as I go, but without success.

For example, I have a list of 643 dfs.

Maybe pandas is not the best solution and I should be using NumPy instead, but I am not quite sure how to do so.
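For reference, the repeated merge can also be written more compactly with functools.reduce; this is only a sketch, assuming dfs and col_data are defined as above, and it performs the same chain of outer merges under the hood, so it will not by itself solve the memory problem:

from functools import reduce

# chain the outer merges across the whole list;
# equivalent to the loop above, just without manual indexing
merged = reduce(lambda left, right: left.merge(right, how='outer', on=col_data), dfs)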

  • It would help to see a sample input and expected output. Why are you splitting the dataframe in the first place just to put it back together? Could you use `pivot_table`, `transpose` or `wide_to_long`, for example, to achieve the desired result? – G. Anderson Sep 21 '20 at 22:26
  • @G.Anderson, I have an original df of item, price and date in each column. So one row for each item for each date. I want to have a dataframe with index being dates and one column for each item containing its price for that day. – Metro Sep 21 '20 at 22:30
  • In that case I can almost guarantee that a group-split-loop-merge process is way more overhead than you need. If you can [edit] your question to provide a sample like [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) then we can help you better – G. Anderson Sep 22 '20 at 16:06
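
Based on the reshaping goal described in the comments (dates as the index, one column per item holding that day's price), a single `pivot_table` call can replace the group-split-merge loop entirely. A minimal sketch with made-up column names and data:

import pandas as pd

# hypothetical long-format input: one row per item per date
df = pd.DataFrame({
    'date': ['2020-09-01', '2020-09-01', '2020-09-02', '2020-09-02'],
    'item': ['A', 'B', 'A', 'B'],
    'price': [10.0, 20.0, 11.0, 19.5],
})

# dates become the index, items become the columns, prices fill the cells;
# duplicate (date, item) pairs are aggregated (mean by default) instead of
# multiplying rows the way an outer merge on duplicate keys would
wide = df.pivot_table(index='date', columns='item', values='price')

If duplicate (date, item) pairs should be treated as an error rather than averaged, `df.pivot(index='date', columns='item', values='price')` raises on them instead.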

1 Answer


It turns out the issue was not the method itself, but rather that each dataframe had multiple duplicate "keys" to merge on. With duplicate keys, every outer merge produces one row per pairing of matching keys, so the result grows multiplicatively with each merge.
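
In case it helps anyone hitting the same wall, here is a quick way to spot the problem before merging; a sketch only, assuming dfs and col_data from the question:

# flag groups whose merge key(s) are not unique
for i, g in enumerate(dfs):
    if g.duplicated(subset=col_data).any():
        print(f"group {i} has duplicate keys")

# one possible fix, assuming keeping the first row per key is acceptable
dfs = [g.drop_duplicates(subset=col_data) for g in dfs]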
