
Dataframe 1

cust_id1   cust_id2  
23           50     
23           51     
23           52      
24           32     
25           40  
25           41

Dataframe 2

cust_id    
50
51
52
32
40

For each cust_id1, check whether the cust_id2 of Dataframe 1 is present in the cust_id of Dataframe 2. If yes, create a new data frame. I would also like to add a new count column for cust_id2.

Expected output in a new data frame 3:

cust_id1   cust_id2  count_cust_id2
23           50           1
23           51           1
23           52           1
24           32           1
25           40           1

This is what I have tried:

uniq = df1['cust_id1'].dropna().unique()

cust_id_grouped = df1.groupby(['cust_id1'], as_index=False)

for i in uniq:
    all_groups = cust_id_grouped.get_group(i)

    count_cust_id2 = 0
    for index, row in all_groups.iterrows():
        if row.cust_id2 in df2.cust_id.values:
            count_cust_id2 = count_cust_id2 + 1

Part 2: In addition to the data frames above, I have similar data frames.

Dataframe 4

cust_id1   cust_id2  
23           50     
23           51     
23           52      
24           32     
25           40  
25           41
26           50     

Dataframe 5

cust_id    
50
51
52
32
40
41
50

The expected output should update data frame 3:

cust_id1   cust_id2  count_cust_id2
23           50           2
23           51           2
23           52           2
24           32           2
25           40           2
25           41           1
26           50           1

I have multiple of these data frames, and each group (df1-df2, df4-df5) represents a single month. The count is actually per month.

Amogh Katwe
1 Answer


Use Series.isin to test all the values; testing per group is not necessary. Then get the counts with GroupBy.transform and size:

df3 = df1[df1.cust_id2.isin(df2.cust_id)].copy()
df3['count_cust_id2'] = df3.groupby('cust_id2')['cust_id2'].transform('size')
print(df3)
   cust_id1  cust_id2  count_cust_id2
0        23        50               1
1        23        51               1
2        23        52               1
3        24        32               1
4        25        40               1
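
For reference, a minimal runnable sketch of the above, rebuilding Dataframe 1 and Dataframe 2 from the data in the question (the variable names df1, df2 and df3 match the ones used here):

import pandas as pd

# Dataframe 1 and Dataframe 2 from the question
df1 = pd.DataFrame({'cust_id1': [23, 23, 23, 24, 25, 25],
                    'cust_id2': [50, 51, 52, 32, 40, 41]})
df2 = pd.DataFrame({'cust_id': [50, 51, 52, 32, 40]})

# keep only rows whose cust_id2 appears in df2.cust_id, then count occurrences of each cust_id2
df3 = df1[df1.cust_id2.isin(df2.cust_id)].copy()
df3['count_cust_id2'] = df3.groupby('cust_id2')['cust_id2'].transform('size')
print(df3)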
jezrael
  • Your answer worked, but I realized I have multiple data frames like these, and they need to be appended to df3. Please look at the updated question above. Thank you. – Amogh Katwe Aug 05 '20 at 18:45
  • @AmoghKatwe - You can process `df4, df5` the same way as the solution above to get `df6`, and then `dffin = pd.concat([df3, df6]).groupby(['cust_id1','cust_id2'], sort=False, as_index=False).sum()` (see the sketch after these comments). – jezrael Aug 06 '20 at 05:54
  • Can I do this without having to concat every two data frames? I'd like to do it in a way that, irrespective of the number of data frames, I can have one concat. – Amogh Katwe Aug 06 '20 at 06:13
  • @AmoghKatwe - Sure, you can pass a list of DataFrames to concat, like `L = [df3, df6, df9, dfN]` and then `dffin = pd.concat(L)`. – jezrael Aug 06 '20 at 06:15
  • It worked. Thank you so much. This is a great help! – Amogh Katwe Aug 06 '20 at 06:21
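
For the multi-month case discussed in these comments, here is a sketch of the same idea, assuming the monthly (detail, lookup) frame pairs are collected in a list; the names pairs, detail, lookup and monthly are only illustrative, and df1/df2 are built as in the earlier snippet:

import pandas as pd

# Dataframe 4 and Dataframe 5 (the second month) from the question
df4 = pd.DataFrame({'cust_id1': [23, 23, 23, 24, 25, 25, 26],
                    'cust_id2': [50, 51, 52, 32, 40, 41, 50]})
df5 = pd.DataFrame({'cust_id': [50, 51, 52, 32, 40, 41, 50]})

# illustrative list of monthly (detail, lookup) pairs; extend with as many months as needed
pairs = [(df1, df2), (df4, df5)]

# apply the answer's filter-and-count step to each month
monthly = []
for detail, lookup in pairs:
    m = detail[detail.cust_id2.isin(lookup.cust_id)].copy()
    m['count_cust_id2'] = m.groupby('cust_id2')['cust_id2'].transform('size')
    monthly.append(m)

# a single concat for any number of months, then sum the monthly counts per (cust_id1, cust_id2)
dffin = (pd.concat(monthly)
           .groupby(['cust_id1', 'cust_id2'], sort=False, as_index=False)
           .sum())
print(dffin)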