I have a big pandas DataFrame that I want to expand using the output of a for loop. My for loop creates auxiliary datasets that share a column with the bigger dataframe. Here is an example of the data I am using:
import pandas as pd
big_df = pd.DataFrame({'id': range(0, 4), 'A': list('abcd')})
aux_df = pd.DataFrame({'id': [1, 3], 'B': list('xy')})
My desired output would be:
   id  A    B
0   0  a  NaN
1   1  b    x
2   2  c  NaN
3   3  d    y
The procedure I have been using is to add an empty column 'B' before entering the for loop and then fill it using .loc and a boolean mask:
big_df = big_df.reindex(columns = big_df.columns.tolist() + ['B'])
mask = big_df['id'].isin(aux_df['id'])
big_df.loc[mask,'B'] = aux_df['B']
I have read that assignment using .loc is the recommended procedure; however, after the loop only some of the expected cells in 'B' have been filled.
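
A minimal sketch of what may be going on, assuming the mask is meant to be built from the 'id' column: when the right-hand side of a .loc assignment is a Series, pandas aligns it on index labels rather than on position. The masked rows have labels 1 and 3, while aux_df['B'] has labels 0 and 1, so only label 1 finds a match. The names `aligned`, `merged` and `mapped` below are illustrative only, and the two alternatives shown (a left merge, or a lookup on 'id' with map) are possible approaches rather than the only ones.

```python
import pandas as pd

big_df = pd.DataFrame({'id': range(0, 4), 'A': list('abcd')})
aux_df = pd.DataFrame({'id': [1, 3], 'B': list('xy')})

# Boolean mask over the rows of big_df whose id appears in aux_df
mask = big_df['id'].isin(aux_df['id'])

# What a .loc assignment effectively sees: aux_df['B'] re-aligned to the
# index labels of the masked rows (1 and 3), not matched by position
aligned = aux_df['B'].reindex(big_df.index[mask])

# Alternative 1: left merge on the shared column keeps every row of big_df
merged = big_df.merge(aux_df, on='id', how='left')

# Alternative 2: look up B by id, leaving unmatched ids as NaN
mapped = big_df.copy()
mapped['B'] = mapped['id'].map(aux_df.set_index('id')['B'])

print(merged)
```

Both alternatives match on the values of 'id' instead of on the row index, so they should not depend on how aux_df happens to be indexed inside the loop.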