I want to group by three columns but keep the original columns in the output (6 columns).
This link did not actually help me. It only had three columns and grouped on those same three columns.
This is a sample of my original data frame:
    Clinic Number  Question Text  Answer Text  Answer Date  year  month  dayofyear
 1              1        bathing           No     2006/7/1  2006      1          7
 2              1       dressing           No     2006/7/1  2006      1          7
 3              1        feeding           NO     2006/7/1  2006      1          7
 4              1   housekeeping           No     2006/7/1  2006      1          7
 5              1    medications           No     2006/7/1  2006      1          7
 6              2        bathing           No     2006/1/1  2006      1          1
 7              2       dressing          Yes     2006/1/1  2006      1          1
 8              2        feeding          Yes     2006/1/1  2006      1          1
 9              2   housekeeping          Yes     2006/1/1  2006      1          1
10              2    medications           No     2006/1/1  2006      1          1
I want to group by ['Clinic Number', 'Answer Text', 'year', 'month'], but I also need the other columns, such as Answer Date, Question Text, and dayofyear, because later I want to do some calculations on them.
What I did:
This is the groupby I am using on this dataframe to reach my goal. The problem is that Answer Date, dayofyear, and the other non-grouping columns are missing from the output.
grouped = data.groupby(['Clinic Number','year','month','Answer Text']).size().reset_index(name='counts')
The output of this groupby looks like this:
   Clinic Number  year  month  Answer Text  counts
0              1  1999      5           No       6
1              1  2000     10           No       6
2              1  2000      2           No       6
3              1  2001      9          Yes       6
4              1  2002      2          Yes       8
5              1  2003      2           No       8
6              1  2004      4           No       8
7              1  2014      6           No       2
8              1  2015     10           No       2
5              2  2003      2           No       8
6              2  2004      4           No       8
7              2  2014      6           No       2
8              2  2015     10           No       2
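To make the goal concrete, here is a minimal sketch of the kind of result I am after: the group count attached to every original row, so that no column is dropped. It rebuilds the ten sample rows shown earlier and assumes that `transform('size')` is an acceptable way to do this; I am not sure it is the best one. The `keys` variable is just a name I made up for the example.

```python
import pandas as pd

# The ten sample rows from the original data frame, typed in by hand.
data = pd.DataFrame({
    "Clinic Number": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "Question Text": ["bathing", "dressing", "feeding", "housekeeping", "medications"] * 2,
    "Answer Text": ["No", "No", "NO", "No", "No", "No", "Yes", "Yes", "Yes", "No"],
    "Answer Date": ["2006/7/1"] * 5 + ["2006/1/1"] * 5,
    "year": [2006] * 10,
    "month": [1] * 10,
    "dayofyear": [7] * 5 + [1] * 5,
})

# Hypothetical helper name: the columns that define a group.
keys = ["Clinic Number", "year", "month", "Answer Text"]

# transform("size") returns one value per original row (the size of the
# group that row belongs to), so the frame keeps all rows and all columns.
data["counts"] = data.groupby(keys)["Question Text"].transform("size")

print(data)
```

Note that "NO" and "No" are treated as different groups here because the comparison is case-sensitive. If one row per group is needed instead, the remaining columns would have to be aggregated in some way, which gives a different output shape.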
Any help is appreciated :)