0

I have two dataframes df1 and df2. df1 as two columns with distinct values across multiple rows. It looks similar to the following:

col_a col_b
aa 50
bb 132
dd 543

df2 has similar structure to the following:

col_a col__b col_x col_y col_z
aa xy xy xy 2
aa xy xy xy 3
bb xy xy xy 14
bb xy xy xy 9
bb xy xy xy 6
cc xy xy xy 0
cc xy xy xy 2
dd xy xy xy 0
dd xy xy xy 2

I want to filter df2 through a loop using values from df1 in a way that after first iteration of loop I end up with the follow:

col_a col_b col_x col_y col_z
aa xy xy xy 2
aa xy xy xy 3

And after second iteration I should have the following:

col_a col_b col_x col_y col_z
bb xy xy xy 14
bb xy xy xy 9
bb xy xy xy 6

I do not want the filtering to modify the df2 permanently. Subsequent iterations should filter df2 based on values from col_a of df1.

I have no idea how to achieve this so I would appreciate any help.

not_speshal
  • 22,093
  • 2
  • 15
  • 30
Excel-lit
  • 71
  • 9
  • Does this answer your question? [Splitting dataframe into multiple dataframes](https://stackoverflow.com/questions/19790790/splitting-dataframe-into-multiple-dataframes) – Anurag Dabas Jul 21 '21 at 16:48
  • see this answer https://stackoverflow.com/a/51405971/14289892 – Anurag Dabas Jul 21 '21 at 16:48
  • @AnuragDabas not quite because although I want to filter dataframe. I don't split the dataframe. Instead I just want to filter the dataframe temporarily so I can do some arithmetic on its columns then move to second set to values from dataframe2 and do the same again. – Excel-lit Jul 21 '21 at 16:53
  • so you want to filter df2 on the basis of col_a of df1? – Anurag Dabas Jul 21 '21 at 17:10

1 Answers1

1

You can iterate through your unique column values, then filter your df2 based on each unique value. You don't need drop_duplicates if df1 values are unique, but I include it just in case in this toy example:

df1 = pd.DataFrame({'col_a':['aa', 'bb']})
df2 = pd.DataFrame({'col_a':['aa', 'aa', 'bb', 'bb'], 'other column':[1, 2, 3, 4]})

for col_value in df1['col_a'].drop_duplicates():
    df_temp_filtered = df2[df2['col_a'] == col_value]

to get this df_temp_filtered in round 2:

  col_a  other column
2    bb             3
3    bb             4
Hammurabi
  • 1,141
  • 1
  • 4
  • 7