Finding max from grouped data in Python

Question

I have a dataset that shows the number of meals in each day. Column Week shows which week of the month that date belongs to. Please see below an example of the datase:

Id.        date.               Meals   Week
 1        2020-02-23            1         4
 1        2020-02-24            1         5
 1        2020-02-25            2         5
 1        2020-02-27            1         5
 1        2020-01-03            2         1
...         ...                ...       ...
 2        2020-03-04            3         2
 2        2020-03-05            4         2
 2        2020-03-06            3         2
 2        2020-03-07            1         2
 2        2020-03-08            2         2

I am grouping the data based on Participant ID and Week in order to get the average number of meals for each participant in each week. Plz see below:

d = data[['Id','Week','Meals']].groupby(['Id', 'Week'],sort=False ).agg('mean')

                                 Meals
               ID          Week
                1           4   1.400000
                            5   1.333333
                            1   2.000000
                            2   1.250000
                            3   1.000000
                2           2   2.000000
                            3   2.142857
                            4   2.500000
                            5   2.500000
                3           2   2.555556
                            3   2.600000
                            4   1.833333
                            5   2.000000
                            1   2.000000

My first question:

if max number of meals for each participant was in the first or last week.

After Richie answer, the output of:

print(df.head(50).to_dict('split')

Is

{'index': [('"55eb3fea748092000daa9b25"', 4), ('"55eb3fea748092000daa9b25"', 5), ('"55eb3fea748092000daa9b25"', 1), ('"55eb3fea748092000daa9b25"', 2), ('"55eb3fea748092000daa9b25"', 3)], 'columns': ['Meals'], 'data': [[1.4], [1.3333333333333333], [2.0], [1.25], [1.0]]}

My second question (after updating my post) is:

which (week of the study) the max week was? Note that the study was performed for 1-4/5 weeks. So the output would look like this, with an extra column called (Week of the study):

Output:

                                Meals      max_week
Id  Week    Week of the study       
1    4            1               1          FALSE
     5            2               1          FALSE
     1            3               2          TRUE
     2            4               1          FALSE
     3            5               1          FALSE
2    2            1               2          FALSE
     3            2               2          FALSE
     4            3               2          TRUE
     5            4               2          TRUE
3    2            1               2          FALSE
     3            2               2          TRUE
     4            3               2          FALSE
     5            4               3          FALSE
     1            5               3          FALSE

Then I want only to save the ID, Week of the study and Snacks only for the true max weeks as follows:

                                 Meals    max_week
ProlificId  Week of the study       
    1              3                2       TRUE
    2              3                2       TRUE
    2              4                2       TRUE
    3              2                2       TRUE

Your help is highly appreciated Shosho

You do not explain what structure you want for the expected output, see if the answer helps or leave a comment if you need assistance reshaping or modifying — RichieV, Sep 09 '20 at 16:30
I read your update posted as an answer, please edit your question to include it here, and while you are at it, it would be helpful to get a sample of your dataframe with `df.head(50).to_dict('split')` — RichieV, Sep 09 '20 at 19:11
Thank you Richie, I have updated my original question. The update is in point 2. — Shosho, Sep 09 '20 at 19:40
please include the output of `print(df.head(50).to_dict('split')` or a toy dataframe containing `Week of the study` — RichieV, Sep 09 '20 at 20:00

RichieV · Accepted Answer · 2020-09-10T00:53:58.500

So it seems you only need to find the Week of the study with the max average meals per Id.

Take the following sample

import pandas as pd
import numpy as np

# sample data
# please always provide a callable line of code with your data
# you can get it with df.head(10).to_dict('split')
# read more about this in https://stackoverflow.com/q/63163251/6692898
# and https://stackoverflow.com/q/20109391/6692898
np.random.seed(123) # include when creating random sample
days, people = 18, 2
data = pd.DataFrame({
    'Id': [i for _ in range(days) for i in range(1, people + 1)],
    'Date': pd.date_range('2020-02-23', periods=days).repeat(people).values,
    'Meals': np.random.randint(1, 5, days * people),
})
# data['Week_of_month'] = (data['Date'].dt.day - 1) // 7 + 1
data['Week_of_the_study'] = data['Date'].dt.isocalendar().week
data['Week_of_the_study'] -= data['Week_of_the_study'].min() - 1
print(data)

    Id       Date  Meals  Week_of_the_study
0    1 2020-02-23      3                  1
1    2 2020-02-23      2                  1
2    1 2020-02-24      3                  2
3    2 2020-02-24      3                  2
4    1 2020-02-25      1                  2
5    2 2020-02-25      3                  2
6    1 2020-02-26      3                  2
7    2 2020-02-26      2                  2
8    1 2020-02-27      4                  2
9    2 2020-02-27      3                  2
10   1 2020-02-28      4                  2
11   2 2020-02-28      2                  2
12   1 2020-02-29      3                  2
13   2 2020-02-29      2                  2
14   1 2020-03-01      1                  2
15   2 2020-03-01      2                  2
16   1 2020-03-02      3                  3
17   2 2020-03-02      4                  3
18   1 2020-03-03      2                  3
19   2 2020-03-03      1                  3
20   1 2020-03-04      3                  3
21   2 2020-03-04      1                  3
22   1 2020-03-05      4                  3
23   2 2020-03-05      2                  3
24   1 2020-03-06      4                  3
25   2 2020-03-06      3                  3
26   1 2020-03-07      2                  3
27   2 2020-03-07      1                  3
28   1 2020-03-08      1                  3
29   2 2020-03-08      1                  3
30   1 2020-03-09      1                  4
31   2 2020-03-09      2                  4
32   1 2020-03-10      4                  4
33   2 2020-03-10      4                  4
34   1 2020-03-11      3                  4
35   2 2020-03-11      1                  4

And the code

max_weeks = ( # get average meals per week & Id
    data.groupby(['Id', 'Week_of_the_study'])
    ['Meals'].mean()
).rename('max_meals')

max_weeks = max_weeks.loc[ # filter only weeks with highest avg meals
    max_weeks == max_weeks.groupby('Id').transform(max)
].to_frame()

Output

                      max_meals
Id Week_of_the_study
1  1                   3.000000
2  2                   2.428571

Old answer

You can use groupby.transform after your current output to get the max week.

d = data.groupby(['ProlificId', 'Week'])['Snacks'].mean().to_frame()

# most use cases want to separate by year/month as well, in that case use
# data['date'] = pd.to_datetime(data['date'])
# data['Year'] = data['date'].dt.year
# data['Month'] = data['date'].dt.month
# d = data.groupby(['ProlificId', 'Year', 'Month', 'Week'])['Snacks'].mean().to_frame()

d['max_week'] = d == d.groupby('ProlificId').transform(max)

Output

                   Snacks  max_week
ProlificId Week
1          1     2.000000      True
           4     1.000000     False
           5     1.333333     False
2          2     2.600000      True

Thank you Richie, when I run this line: `data['Week_of_the_study'] = data['Date'].dt.isocalendar().week` I got this error: `AttributeError: 'DatetimeProperties' object has no attribute 'isocalendar'` Do you have any idea? — Shosho, Sep 10 '20 at 14:34

Finding max from grouped data in Python

1 Answers1