Filter one dataframe using another through a loop pandas

Question

I have two dataframes df1 and df2. df1 as two columns with distinct values across multiple rows. It looks similar to the following:

col_a	col_b
aa	50
bb	132
dd	543

df2 has similar structure to the following:

col_a	col__b	col_x	col_y	col_z
aa	xy	xy	xy	2
aa	xy	xy	xy	3
bb	xy	xy	xy	14
bb	xy	xy	xy	9
bb	xy	xy	xy	6
cc	xy	xy	xy	0
cc	xy	xy	xy	2
dd	xy	xy	xy	0
dd	xy	xy	xy	2

I want to filter df2 through a loop using values from df1 in a way that after first iteration of loop I end up with the follow:

col_a	col_b	col_x	col_y	col_z
aa	xy	xy	xy	2
aa	xy	xy	xy	3

And after second iteration I should have the following:

col_a	col_b	col_x	col_y	col_z
bb	xy	xy	xy	14
bb	xy	xy	xy	9
bb	xy	xy	xy	6

I do not want the filtering to modify the df2 permanently. Subsequent iterations should filter df2 based on values from col_a of df1.

I have no idea how to achieve this so I would appreciate any help.

Does this answer your question? [Splitting dataframe into multiple dataframes](https://stackoverflow.com/questions/19790790/splitting-dataframe-into-multiple-dataframes) — Anurag Dabas, Jul 21 '21 at 16:48
see this answer https://stackoverflow.com/a/51405971/14289892 — Anurag Dabas, Jul 21 '21 at 16:48
@AnuragDabas not quite because although I want to filter dataframe. I don't split the dataframe. Instead I just want to filter the dataframe temporarily so I can do some arithmetic on its columns then move to second set to values from dataframe2 and do the same again. — Excel-lit, Jul 21 '21 at 16:53

score 1 · Accepted Answer · answered Jul 21 '21 at 16:52

You can iterate through your unique column values, then filter your df2 based on each unique value. You don't need drop_duplicates if df1 values are unique, but I include it just in case in this toy example:

df1 = pd.DataFrame({'col_a':['aa', 'bb']})
df2 = pd.DataFrame({'col_a':['aa', 'aa', 'bb', 'bb'], 'other column':[1, 2, 3, 4]})

for col_value in df1['col_a'].drop_duplicates():
    df_temp_filtered = df2[df2['col_a'] == col_value]

to get this df_temp_filtered in round 2:

  col_a  other column
2    bb             3
3    bb             4

Thank you for your prompt reply and helping me solve this problem. Much appreciated!! — Excel-lit, Jul 22 '21 at 08:21

col_a	col__b	col_x	col_y	col_z
aa	xy	xy	xy	2
aa	xy	xy	xy	3
bb	xy	xy	xy	14
bb	xy	xy	xy	9
bb	xy	xy	xy	6
cc	xy	xy	xy	0
cc	xy	xy	xy	2
dd	xy	xy	xy	0
dd	xy	xy	xy	2

col_a	col__b	col_x	col_y	col_z
aa	xy	xy	xy	2
aa	xy	xy	xy	3
bb	xy	xy	xy	14
bb	xy	xy	xy	9
bb	xy	xy	xy	6
cc	xy	xy	xy	0
cc	xy	xy	xy	2
dd	xy	xy	xy	0
dd	xy	xy	xy	2

Filter one dataframe using another through a loop pandas

1 Answers1

col_a	col__b	col_x	col_y	col_z
aa	xy	xy	xy	2
aa	xy	xy	xy	3
bb	xy	xy	xy	14
bb	xy	xy	xy	9
bb	xy	xy	xy	6
cc	xy	xy	xy	0
cc	xy	xy	xy	2
dd	xy	xy	xy	0
dd	xy	xy	xy	2