2

I want groups based on three columns, but keep the original columns in the output(6 column).

this link Actually did not help me. it just had three columns and grouped based on those three columns.

this is a sample of my original data frame :

   Clinic Number  Question Text  Answer Text Answer Date     year  month dayofyear
1       1        bathing            No    2006/7/1     2006    1         7
2       1        dressing           No    2006/7/1     2006    1         7
3       1        feeding            NO    2006/7/1     2006    1         7
4       1        housekeeping       No    2006/7/1     2006    1         7
5       1        medications        No    2006/7/1     2006    1         7
6       2        bathing            No    2006/1/1     2006    1         1
7       2        dressing           Yes   2006/1/1     2006    1         1
8       2        feeding            Yes   2006/1/1     2006    1         1
9       2        housekeeping       Yes   2006/1/1     2006    1         1
10      2        medications        No    2006/1/1     2006    1         1

I want to group by [clinicNumber,Answer Text, Year,month], but I need other columns like Answer Date and question Text, yearOfday column, because later I want to do some calculation on them.

What I did: this is the group by I am using on this dataframe to reach my goal. the problem is that there is no Answer date ,yearofyear ... in the output.

grouped = data.groupby(['Clinic Number','year','month','Answer Text']).size().reset_index(name='counts')

the output of this group by is like this:

 Clinic Number    year   month   Answer Text  counts
0         1       1999     5          No       6
1         1       2000    10          No       6
2         1       2000     2          No       6
3         1       2001     9          Yes       6
4         1       2002     2          Yes       8
5         1       2003     2          No       8
6         1       2004     4          No       8
7         1       2014     6          No       2
8         1       2015    10          No       2
5         2       2003     2          No       8
6         2       2004     4          No       8
7         2       2014     6          No       2
8         2       2015    10          No       2

Any help is appreciated :)

sariii
  • 2,020
  • 6
  • 29
  • 57

1 Answers1

3

IIUC, use groupby with transform and set it to a new column in existing dataframe.

data['counts'] = (data.groupby(['Clinic Number',
                                'year',
                                'month',
                                'Answer Text'])['Clinic Number']
                      .transform('size'))

data = data.sort_values(['Client Number','Answer Date'])

Output:

    Clinic Number Question Text Answer Text Answer Date  year  month  dayofyear  counts
1               1       bathing          No    2006/7/1  2006      1          7       4
2               1      dressing          No    2006/7/1  2006      1          7       4
3               1       feeding          NO    2006/7/1  2006      1          7       1
4               1  housekeeping          No    2006/7/1  2006      1          7       4
5               1   medications          No    2006/7/1  2006      1          7       4
6               2       bathing          No    2006/1/1  2006      1          1       2
7               2      dressing         Yes    2006/1/1  2006      1          1       3
8               2       feeding         Yes    2006/1/1  2006      1          1       3
9               2  housekeeping         Yes    2006/1/1  2006      1          1       3
10              2   medications          No    2006/1/1  2006      1          1       2
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • Smart. why I did not think about this approach :||| – sariii Jun 12 '18 at 20:00
  • @sariaGoudarzi Thank you. Happy coding! – Scott Boston Jun 12 '18 at 20:01
  • Just one thing @Scott Boston, Now I can not apply sorting on this. Is there any difference between this code for the sorting and previous one?. I want to do sorting on date. thanks for your time :) – sariii Jun 12 '18 at 20:13
  • Nope, sorting should be exactly the same. this is the original dataframe with one addtional column. – Scott Boston Jun 12 '18 at 20:14
  • May I ak you to update the answer with sorting please – sariii Jun 12 '18 at 20:15
  • Actually I need the sort for each Clinic Number, I mean for each clinic Number, its date should be sort. with this approach it totally look at the data frame and sort it – sariii Jun 12 '18 at 20:24
  • FYI... you don't have Answer Text in your original dataframe, so I can't test and verify this code. – Scott Boston Jun 12 '18 at 20:26
  • You are right @Scott Boston. I just added that column :) – sariii Jun 12 '18 at 20:30
  • may I ask you please have alook on this question . Do you have any idea of that? thanks for your time :) https://stackoverflow.com/questions/50862198/date-range-for-six-monthly-in-pandas – sariii Jun 14 '18 at 17:57