0

Would you, please help me, to group pandas dataframe by multiple conditions.

Here is how I do it in SQL:

with a as (
  select high 
,sum( case when qr = 1 and now = 1 then 1 else 0 end ) q1_bad
,sum( case when qr = 2 and now = 1 then 1 else 0 end ) q2_bad
  from #tmp2
  group by high
)
select a.high from a
where q1_bad >= 2 and q2_bad >= 2 and a.high is not null

Here is the part of the dataset:

import pandas as pd
a = pd.DataFrame()

a['client'] = range(35)
a['high'] = ['02','47','47','47','79','01','43','56','46','47','17','58','42','90','47','86','41','56',
'55','49','47','49','95','23','46','47','80','80','41','49','46','49','56','46','31']
a['qr'] = ['1','1','1','1','2','1','1','2','2','1','1','2','2',
'2','1','1','1','2','1','2','1','2','2','1','1','1','2','2','1','1',
'1','1','1','1','2']
a['now'] = ['0','0','0','0','0','0','0','0','0','0','0','0','1','0','0','0','0',
'0','0','0','0','0','0','0','0','0','0','0','0','0','0','1','0','0','0']

Thank you very much!

Jarvis
  • 8,494
  • 3
  • 27
  • 58
Alex Ivanov
  • 657
  • 1
  • 8
  • 17

1 Answers1

2

it's very similar, you need to define your columns ahead of the groupby then apply your operation.

assuming you have actual integers and not strings.

import numpy as np
import pandas as pd
a.assign(q1_bad = np.where((a['qr'].eq(1) & a['now'].eq(1)),1,0),
         q2_bad = np.where((a['qr'].eq(2) & a['now'].eq(1)),1,0)

).groupby('high')[['q1_bad','q2_bad']].sum()

     q1_bad  q2_bad
high                
01         0       0
02         0       0
17         0       0
23         0       0
31         0       0
41         0       0
42         0       1
43         0       0
46         0       0
47         0       0
49         1       0
55         0       0
56         0       0
58         0       0
79         0       0
80         0       0
86         0       0
90         0       0
95         0       0

for you extra where clause you can filter it one of many ways, but for ease we can add query at the end.

a.dropna(subset='high').assign(q1_bad = np.where((a['qr'].eq(1) & a['now'].eq(1)),1,0),
         q2_bad = np.where((a['qr'].eq(2) & a['now'].eq(1)),1,0)

).groupby('high')[['q1_bad','q2_bad']].sum().query('q2_bad >= 2 and q1_bad >= 2')
Umar.H
  • 22,559
  • 7
  • 39
  • 74
  • 1
    Manakin, thanks a lot. Your code works perfectly. The only thing I corrected is q2_bad on q1_bad in the final query. – Alex Ivanov Dec 30 '20 at 13:30