How to return rows for a specific condition that involves two columns in Pandas?

Question

I have this dataframe:

user_id     status_id       date_created
1           1               2018-02-14 11:49:07.429000-02:00
1           4               2018-02-19 12:51:43.622000-03:00
1           3               2018-02-15 09:21:23.116000-02:00
2           3               2018-02-19 12:52:08.646000-03:00
3           3               2016-08-29 11:02:39.449000-03:00
4           4               2016-08-29 11:18:31.742000-03:00
4           2               2018-02-21 10:43:45.747000-03:00
5           3               2018-02-15 09:34:57.478000-02:00
5           2               2018-02-19 11:52:16.629000-03:00

I want to return only users that has a specific status_id and only this specific status, so for example, for status_id=3, it should return this:

user_id     status_id       date_created
2           3               2018-02-19 12:52:08.646000-03:00
3           3               2016-08-29 11:02:39.449000-03:00

I tried filtering all users that have the status_id that I need, but it also returns users with more than one status_id:

> df.loc[df.user_id.isin(df.user_id.loc[df.status_id == 3])]
user_id     status_id       date_created
1           1               2018-02-14 11:49:07.429000-02:00
1           4               2018-02-19 12:51:43.622000-03:00
1           3               2018-02-15 09:21:23.116000-02:00
2           3               2018-02-19 12:52:08.646000-03:00
3           3               2016-08-29 11:02:39.449000-03:00
5           3               2018-02-15 09:34:57.478000-02:00
5           2               2018-02-19 11:52:16.629000-03:00

BENY · Accepted Answer · 2018-03-29T18:18:35.880

3

By using transform +nunique

df[df.groupby('user_id').status_id.transform('nunique').eq(1)].loc[lambda x :x['status_id']==3,:]

More Info

df.groupby('user_id').status_id.transform('nunique') # get the number of unique value within each group, after this we just need to select the group only contain one value , which is index 3,4
Out[426]: 
0    3
1    3
2    3
3    1
4    1
5    2
6    2
7    2
8    2
Name: status_id, dtype: int64

edited Mar 29 '18 at 18:18

answered Mar 29 '18 at 17:39

BENY

317,841
20
164
234

@AlexandreLara adding the explanation – BENY Mar 29 '18 at 18:14
Oh, now I get it. So after that I can just filter by `status_id=3` like this: `df[df.groupby('user_id').status_id.transform('nunique').eq(1)][df.status_id == 3]`? – Alexandre Lara Mar 29 '18 at 18:18
@AlexandreLara `df[df.groupby('user_id').status_id.transform('nunique').eq(1)].loc[lambda x :x['status_id']==3,:] ` – BENY Mar 29 '18 at 18:19

vkainth · Answer 2 · 2018-03-29T19:30:30.630

0

You can use df.loc[df['status_id'] == 3] as described here

Python File with relevant input

Example

edited Mar 29 '18 at 19:30

answered Mar 29 '18 at 17:44

vkainth

13
4

1

So you just change the op's original df ? To match what his out put ? – BENY Mar 29 '18 at 18:06
No. I created the dataframe he wrote in his original description. Then, I applied the rule I stated. Please do so yourself before you comment negatively. – vkainth Mar 29 '18 at 19:12
Ok, then why there is not `5 3 2018-02-15 09:34:57.478000-02:00` show in your result – BENY Mar 29 '18 at 19:19
Because I didn't enter ALL the values. That would be pointless and would take extra time. I can do that, if you want me to and repost. – vkainth Mar 29 '18 at 19:21
if only using .loc like this , the out put will not match what op's expected output – BENY Mar 29 '18 at 19:24
Please check the edited answer. I have uploaded the gist and an image showing the original dataframe and the output. – vkainth Mar 29 '18 at 19:31
Can you recheck what op want ? – BENY Mar 29 '18 at 19:33
Op wanted to return all users with a specific status_id right? The code I made does this. – vkainth Mar 29 '18 at 19:41
Ok this will be last comment ... *return only users that has a specific status_id and only this specific status* – BENY Mar 29 '18 at 19:44
Oh. I get it. Thanks @Wen for clarifying. The OP wants users who only have a particular status id and nothing else. – vkainth Mar 29 '18 at 19:44

How to return rows for a specific condition that involves two columns in Pandas?

2 Answers2