1

I have a data frame and I need to group by 2 columns and create a new column based on condition. My data looks like this:

ID week day_num
1 1 2
1 1 3
1 2 4
1 2 1
2 1 1
2 2 2
3 1 4

I need to group by the columns ID & week so there's a row for each ID for each week. The groupby is based on condition- if for a certain week an ID has the value 1 in column day_num, the value will be 1 under groupby, otherwise 0. For example, ID 1 has 2 & 3 under both rows so it equals 0 under groupby, for week 2 ID 1 it has a row with value 1, so 1.

The output I need looks like this:

ID week day1
1 1 0
1 2 1
2 1 1
2 2 0
3 1 0

I searched and found this code, but it uses count, where I just need to write the value 1 or 0.

df1=df1.groupby('ID','week')['day_num'].apply(lambda x: (x=='1').count())

Is there a way to do this?

Thanks!

kri
  • 95
  • 6
  • `df.groupby(['ID', 'week'])['day_num'].transform(lambda g: (g==1).any()*1)` This should solve your question – PTQuoc Jan 24 '23 at 21:46

2 Answers2

3

You can approach from the other way: check equality against 1 in "day_num" and group that by ID & week. Then aggregate with any to see if there was any 1 in the groups. Lastly convert True/Falses to 1/0 and move groupers to columns.

df["day_num"].eq(1).groupby([df["ID"], df["week"]]).any().astype(int).reset_index()

   ID  week  day_num
0   1     1        0
1   1     2        1
2   2     1        1
3   2     2        0
4   3     1        0
Mustafa Aydın
  • 17,645
  • 4
  • 15
  • 38
  • 3
    Or preprocess `day_num` so that `1` gets mapped to `1` and everything else gets mapped to `0`, then do the group-by and aggregate based on `max`. – cadolphs Jan 24 '23 at 21:47
  • 1
    Frankly, I prefer your solution...for the quick and beautiful way and I like the mastering of the objects properties. – Laurent B. Jan 25 '23 at 07:06
1
import pandas as pd

src = pd.DataFrame({'ID': [1, 1, 1, 1, 2, 2, 3],    
                    'week': [1, 1, 2, 2, 1, 2, 1],    
                    'day_num': [2, 3, 4, 1, 1, 2, 4],
                    })


src['day_num'] = (~(src['day_num']-1).astype(bool)).astype(int)
r = src.sort_values(by=['day_num']).drop_duplicates(['ID', 'week'], keep='last').sort_index().reset_index(drop=True)
print(r)

Result

   ID  week  day_num
0   1     1        0
1   1     2        1
2   2     1        1
3   2     2        0
4   3     1        0
Laurent B.
  • 1,653
  • 1
  • 7
  • 16