2

So I have this data with ID, date, and if they did activities on that day

ID             DATE              DONE  
1           2009-1-3             Yes
1           2009-1-15            Yes
2           2009-9-3             Yes
3           2009-10-7            No
1           2009-12-1            Yes
3           2009-10-8            Yes
2           2009-10-21           Yes

So I want to groupby and create columns with months and 1 if they did the activity in that month 0 o.w

ID 2009-1 2009-2 ... 2009-9 2009-10 2009-11 2009-12
1     1     0    ...   0         0       0     1
2     0     0    ...   1         1       0     0
3     0     0    ...   0         0       0     0

Thank you

jarge
  • 185
  • 9

3 Answers3

1

Using Series groupby+ all then with unstack

s=df['DONE'].eq('Yes').\
      groupby([df['ID'],df.DATE.dt.strftime('%Y-%m')]).\
          all().astype(int).unstack(fill_value=0)
s
Out[1026]: 
DATE  2009-01  2009-09  2009-10  2009-12
ID                                      
1           1        0        0        1
2           0        1        1        0
3           0        0        0        0
BENY
  • 317,841
  • 20
  • 164
  • 234
  • Thanks so much. On unrelated note, what resources are the best for mastering pandas? I know the basics of it, but when it comes to complex stuff, I am having hard time. – jarge Apr 23 '19 at 03:01
  • 1
    @jarge just my personal experience , you can check the book Pandas cook book https://pandas.pydata.org/pandas-docs/stable/user_guide/cookbook.html or the book https://books.google.com/books/about/Pandas_Cookbook.html?id=yRhKDwAAQBAJ&printsec=frontcover&source=kp_read_button#v=onepage&q&f=false – BENY Apr 23 '19 at 03:05
1

Yet another way

Setup

df.DATE = pd.to_datetime(df.DATE)
df.DONE = df.DONE.map({'Yes':1, 'No':0})

Solution

Then create an auxiliary int column in the style 200901

df['DATE2']=df.DATE.dt.year.astype(str).add(df.DATE.dt.month.astype(str).str.zfill(2)).astype(int)

and pivot_table+reindex

df.pivot_table(columns='DATE2', index='ID', values='DONE').reindex(np.arange(200901, 200913), axis='columns').fillna(0)


DATE2   200901  200902  200903  200904  200905  200906  200907  200908  200909  200910  200911  200912
ID                                              
1       1.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     1.0
2       0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     1.0     1.0     0.0     0.0
3       0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.5     0.0     0.0
Community
  • 1
  • 1
rafaelc
  • 57,686
  • 15
  • 58
  • 82
1

I know I'm late but I think that this solution is quite nice, too:

df['DATE'] = pd.to_datetime(df['DATE']).dt.strftime('%Y-%m')
(df.pivot(columns='DATE', values='DONE') == 'Yes') + 0
JoergVanAken
  • 1,286
  • 9
  • 10