0

I have a dataset that shows the number of meals in each day. Column Week shows which week of the month that date belongs to. Please see below an example of the datase:

Id.        date.               Meals   Week
 1        2020-02-23            1         4
 1        2020-02-24            1         5
 1        2020-02-25            2         5
 1        2020-02-27            1         5
 1        2020-01-03            2         1
...         ...                ...       ...
 2        2020-03-04            3         2
 2        2020-03-05            4         2
 2        2020-03-06            3         2
 2        2020-03-07            1         2
 2        2020-03-08            2         2

I am grouping the data based on Participant ID and Week in order to get the average number of meals for each participant in each week. Plz see below:

d = data[['Id','Week','Meals']].groupby(['Id', 'Week'],sort=False ).agg('mean')

                                 Meals
               ID          Week
                1           4   1.400000
                            5   1.333333
                            1   2.000000
                            2   1.250000
                            3   1.000000
                2           2   2.000000
                            3   2.142857
                            4   2.500000
                            5   2.500000
                3           2   2.555556
                            3   2.600000
                            4   1.833333
                            5   2.000000
                            1   2.000000

My first question:

  1. if max number of meals for each participant was in the first or last week.

After Richie answer, the output of:

print(df.head(50).to_dict('split')

Is

{'index': [('"55eb3fea748092000daa9b25"', 4), ('"55eb3fea748092000daa9b25"', 5), ('"55eb3fea748092000daa9b25"', 1), ('"55eb3fea748092000daa9b25"', 2), ('"55eb3fea748092000daa9b25"', 3)], 'columns': ['Meals'], 'data': [[1.4], [1.3333333333333333], [2.0], [1.25], [1.0]]}

My second question (after updating my post) is:

  1. which (week of the study) the max week was? Note that the study was performed for 1-4/5 weeks. So the output would look like this, with an extra column called (Week of the study):

Output:

                                Meals      max_week
Id  Week    Week of the study       
1    4            1               1          FALSE
     5            2               1          FALSE
     1            3               2          TRUE
     2            4               1          FALSE
     3            5               1          FALSE
2    2            1               2          FALSE
     3            2               2          FALSE
     4            3               2          TRUE
     5            4               2          TRUE
3    2            1               2          FALSE
     3            2               2          TRUE
     4            3               2          FALSE
     5            4               3          FALSE
     1            5               3          FALSE

Then I want only to save the ID, Week of the study and Snacks only for the true max weeks as follows:

                                 Meals    max_week
ProlificId  Week of the study       
    1              3                2       TRUE
    2              3                2       TRUE
    2              4                2       TRUE
    3              2                2       TRUE

Your help is highly appreciated Shosho

Shosho
  • 69
  • 6
  • You do not explain what structure you want for the expected output, see if the answer helps or leave a comment if you need assistance reshaping or modifying – RichieV Sep 09 '20 at 16:30
  • I read your update posted as an answer, please edit your question to include it here, and while you are at it, it would be helpful to get a sample of your dataframe with `df.head(50).to_dict('split')` – RichieV Sep 09 '20 at 19:11
  • Thank you Richie, I have updated my original question. The update is in point 2. – Shosho Sep 09 '20 at 19:40
  • please include the output of `print(df.head(50).to_dict('split')` or a toy dataframe containing `Week of the study` – RichieV Sep 09 '20 at 20:00
  • Its been included in the buddy now :) – Shosho Sep 09 '20 at 23:42

1 Answers1

0

So it seems you only need to find the Week of the study with the max average meals per Id.

Take the following sample

import pandas as pd
import numpy as np

# sample data
# please always provide a callable line of code with your data
# you can get it with df.head(10).to_dict('split')
# read more about this in https://stackoverflow.com/q/63163251/6692898
# and https://stackoverflow.com/q/20109391/6692898
np.random.seed(123) # include when creating random sample
days, people = 18, 2
data = pd.DataFrame({
    'Id': [i for _ in range(days) for i in range(1, people + 1)],
    'Date': pd.date_range('2020-02-23', periods=days).repeat(people).values,
    'Meals': np.random.randint(1, 5, days * people),
})
# data['Week_of_month'] = (data['Date'].dt.day - 1) // 7 + 1
data['Week_of_the_study'] = data['Date'].dt.isocalendar().week
data['Week_of_the_study'] -= data['Week_of_the_study'].min() - 1
print(data)

    Id       Date  Meals  Week_of_the_study
0    1 2020-02-23      3                  1
1    2 2020-02-23      2                  1
2    1 2020-02-24      3                  2
3    2 2020-02-24      3                  2
4    1 2020-02-25      1                  2
5    2 2020-02-25      3                  2
6    1 2020-02-26      3                  2
7    2 2020-02-26      2                  2
8    1 2020-02-27      4                  2
9    2 2020-02-27      3                  2
10   1 2020-02-28      4                  2
11   2 2020-02-28      2                  2
12   1 2020-02-29      3                  2
13   2 2020-02-29      2                  2
14   1 2020-03-01      1                  2
15   2 2020-03-01      2                  2
16   1 2020-03-02      3                  3
17   2 2020-03-02      4                  3
18   1 2020-03-03      2                  3
19   2 2020-03-03      1                  3
20   1 2020-03-04      3                  3
21   2 2020-03-04      1                  3
22   1 2020-03-05      4                  3
23   2 2020-03-05      2                  3
24   1 2020-03-06      4                  3
25   2 2020-03-06      3                  3
26   1 2020-03-07      2                  3
27   2 2020-03-07      1                  3
28   1 2020-03-08      1                  3
29   2 2020-03-08      1                  3
30   1 2020-03-09      1                  4
31   2 2020-03-09      2                  4
32   1 2020-03-10      4                  4
33   2 2020-03-10      4                  4
34   1 2020-03-11      3                  4
35   2 2020-03-11      1                  4

And the code

max_weeks = ( # get average meals per week & Id
    data.groupby(['Id', 'Week_of_the_study'])
    ['Meals'].mean()
).rename('max_meals')

max_weeks = max_weeks.loc[ # filter only weeks with highest avg meals
    max_weeks == max_weeks.groupby('Id').transform(max)
].to_frame()

Output

                      max_meals
Id Week_of_the_study
1  1                   3.000000
2  2                   2.428571

Old answer

You can use groupby.transform after your current output to get the max week.

d = data.groupby(['ProlificId', 'Week'])['Snacks'].mean().to_frame()

# most use cases want to separate by year/month as well, in that case use
# data['date'] = pd.to_datetime(data['date'])
# data['Year'] = data['date'].dt.year
# data['Month'] = data['date'].dt.month
# d = data.groupby(['ProlificId', 'Year', 'Month', 'Week'])['Snacks'].mean().to_frame()

d['max_week'] = d == d.groupby('ProlificId').transform(max)

Output

                   Snacks  max_week
ProlificId Week
1          1     2.000000      True
           4     1.000000     False
           5     1.333333     False
2          2     2.600000      True
RichieV
  • 5,103
  • 2
  • 11
  • 24
  • Thank you Richie, when I run this line: `data['Week_of_the_study'] = data['Date'].dt.isocalendar().week` I got this error: `AttributeError: 'DatetimeProperties' object has no attribute 'isocalendar'` Do you have any idea? – Shosho Sep 10 '20 at 14:34