I have a loop that generates random results every time thanks to my Generate_Dataframe function. So the name columns stay the same but my "Result" column is always a different float value.

def Generate_DataFrame():
    # Do some stuff
    return DataFrame

an example of what I get back would be something like...

DataFrame
Name 1       Name 2        Result
Joe          Smith           5.5
Jake         Smith           4.5
Jim          Smith           2.5
Joanne       Smith           1.5

So each time my loop runs it generates a dataframe like the one above. I want to be able to update/add to the Result column on each iteration.

for x in range(1,5):
    New_DataFrame = Generate_DataFrame()

I haven't been able to find a way to store the dataframe. Optimizing for speed would be great. Thanks in advance!

James2k

2 Answers

IIUC, you are using the name columns as an index. Put them in the index, and update/add becomes trivial.

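import numpy as np
import pandas as pd

def gen_df():
    # The two name columns become a MultiIndex, so frames align by name
    midx = pd.MultiIndex.from_tuples([
        ('Joe', 'Smith'),
        ('Jake', 'Smith'),
        ('Jim', 'Smith'),
        ('Joanne', 'Smith')
    ], names=['Name 1', 'Name 2'])
    # A fresh random Result column on every call
    return pd.DataFrame(
        dict(Result=np.random.rand(4)),
        midx
    )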
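The frames in the question carry the names as ordinary columns rather than in the index. A minimal sketch of moving them into the index with set_index before adding, assuming a hypothetical generate_flat_frame stand-in for the question's Generate_DataFrame (the function name and the seeded generator are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

def generate_flat_frame(rng):
    # Hypothetical stand-in: names are plain columns, as in the question
    return pd.DataFrame({
        "Name 1": ["Joe", "Jake", "Jim", "Joanne"],
        "Name 2": ["Smith"] * 4,
        "Result": rng.random(4),
    })

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable

# Move the name columns into the index so addition aligns rows by name
frames = [
    generate_flat_frame(rng).set_index(["Name 1", "Name 2"])
    for _ in range(4)
]

# Accumulate the Result column across all frames
total = frames[0].copy()
for frame in frames[1:]:
    total = total.add(frame)

print(total)
```

Because the addition aligns on the index, the row order of each generated frame does not matter as long as the names match.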

Option 1
You don't have to do it this way. But this is how I'd do it:

from functools import reduce

reduce(pd.DataFrame.add, (gen_df() for _ in range(1, 5)))

                 Result
Name 1 Name 2          
Joe    Smith   2.400550
Jake   Smith   2.222812
Jim    Smith   2.601639
Joanne Smith   0.503774

Option 2
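In a loop. Note that this sums five frames (the initial one plus four from the loop), whereas Option 1 sums four.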

df = gen_df()

for _ in range(1, 5):
    df += gen_df()

df

                 Result
Name 1 Name 2          
Joe    Smith   1.998055
Jake   Smith   2.268697
Jim    Smith   2.815204
Joanne Smith   2.253301
piRSquared

If you want to store the dataframe, I think the best way is to save it to a pickle or CSV file: df.to_pickle(file_name) or df.to_csv(file_name).

You can read: How to store a dataframe using Pandas
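As a rough sketch of that round trip, assuming a small frame shaped like the one in the question and hypothetical file names (results.pkl and results.csv) written to a temporary directory:

```python
import os
import tempfile

import pandas as pd

# Example frame shaped like the one in the question
df = pd.DataFrame({
    "Name 1": ["Joe", "Jake", "Jim", "Joanne"],
    "Name 2": ["Smith"] * 4,
    "Result": [5.5, 4.5, 2.5, 1.5],
})

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, "results.pkl")  # hypothetical file name
    csv_path = os.path.join(tmp, "results.csv")  # hypothetical file name

    # Pickle preserves dtypes and the index exactly
    df.to_pickle(pkl_path)
    from_pickle = pd.read_pickle(pkl_path)

    # CSV is plain text; index=False avoids writing the row numbers
    df.to_csv(csv_path, index=False)
    from_csv = pd.read_csv(csv_path)

print(from_pickle)
print(from_csv)
```

Pickle is usually the more convenient choice for reloading into pandas, while CSV is readable by other tools.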

Drza loren