I have a big pandas DataFrame that I want to expand using the output of a for loop. My for loop creates auxiliary datasets that share a column with the bigger dataframe. Here is an example of the data I am using:

import pandas as pd

big_df = pd.DataFrame({'id':range(0,4), 'A':list('abcd')})
aux_df = pd.DataFrame({'id':[1,3], 'B':list('xy')})

My desired output would be:

    id  A   B
0   0   a   NaN
1   1   b   x
2   2   c   NaN
3   3   d   y 

The procedure I have been using is to add an empty column 'B' before entering the for loop and then fill it using .loc and a boolean mask:

big_df = big_df.reindex(columns = big_df.columns.tolist() + ['B'])
mask = big_df.isin(list(aux_df.id))
big_df.loc[mask,'B'] = aux_df['B']

I read that assignment with .loc is the recommended approach; however, the loop seems to change only some of the cells.

ZMV
  • Check your mask with a print, because I don't think it does what you think it does. Otherwise, you can also simply concatenate aux_df to big_df. – kubatucka Oct 11 '21 at 12:48
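
Following up on the comment about the mask, here is a minimal sketch of what the question's code appears to do, using the example frames from the question. The names frame_mask, row_mask, aligned and fixed are illustrative and not part of the original post. It assumes the intent was a per-row mask on the 'id' column, and it shows Series.map as one possible way to fill 'B' by key rather than by index label.

```python
import pandas as pd

big_df = pd.DataFrame({'id': range(0, 4), 'A': list('abcd')})
aux_df = pd.DataFrame({'id': [1, 3], 'B': list('xy')})

# DataFrame.isin tests every cell, so this is a boolean frame
# with the same shape as big_df, not a one-value-per-row mask.
frame_mask = big_df.isin(list(aux_df.id))

# Series.isin on the key column gives one boolean per row.
row_mask = big_df['id'].isin(aux_df['id'])

# Assigning a Series through .loc aligns on index labels, not on position.
# aux_df['B'] carries labels 0 and 1, while the selected rows carry labels
# 1 and 3, so this reindex shows the values .loc would actually write.
aligned = aux_df['B'].reindex(big_df.loc[row_mask].index)

# One possible fix (an assumption about the intent): look 'B' up by 'id'.
fixed = big_df.copy()
fixed['B'] = fixed['id'].map(aux_df.set_index('id')['B'])

print(frame_mask)
print(aligned)
print(fixed)
```

Printing aligned makes the partial fill visible: only labels shared by both frames receive a value, which would explain why only some cells change.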

1 Answer

Use pd.merge:

>>> pd.merge(big_df, aux_df, on='id', how='left')

   id  A    B
0   0  a  NaN
1   1  b    x
2   2  c  NaN
3   3  d    y
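
Since the question builds the auxiliary frames inside a for loop, the same left merge can be repeated once per frame. This is only a sketch: aux_frames and the second frame with column 'C' are hypothetical stand-ins for whatever the loop produces, and it assumes each auxiliary frame has unique 'id' values and adds a column name not already present.

```python
import pandas as pd

big_df = pd.DataFrame({'id': range(0, 4), 'A': list('abcd')})

# Hypothetical stand-ins for the frames created inside the loop.
aux_frames = [
    pd.DataFrame({'id': [1, 3], 'B': list('xy')}),
    pd.DataFrame({'id': [0, 2], 'C': list('pq')}),
]

result = big_df
for aux_df in aux_frames:
    # how='left' keeps every row of the big frame;
    # ids that are missing from aux_df get NaN in the new column.
    result = result.merge(aux_df, on='id', how='left')

print(result)
```

If an auxiliary frame repeats an 'id', the merge duplicates the matching rows, and if it reuses an existing column name, pandas appends the _x and _y suffixes, so both assumptions are worth checking on real data.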
Corralien