I have a dataset that shows the number of meals in each day. Column Week shows which week of the month that date belongs to. Please see below an example of the datase:
Id. date. Meals Week
1 2020-02-23 1 4
1 2020-02-24 1 5
1 2020-02-25 2 5
1 2020-02-27 1 5
1 2020-01-03 2 1
... ... ... ...
2 2020-03-04 3 2
2 2020-03-05 4 2
2 2020-03-06 3 2
2 2020-03-07 1 2
2 2020-03-08 2 2
I am grouping the data based on Participant ID and Week in order to get the average number of meals for each participant in each week. Plz see below:
d = data[['Id','Week','Meals']].groupby(['Id', 'Week'],sort=False ).agg('mean')
Meals
ID Week
1 4 1.400000
5 1.333333
1 2.000000
2 1.250000
3 1.000000
2 2 2.000000
3 2.142857
4 2.500000
5 2.500000
3 2 2.555556
3 2.600000
4 1.833333
5 2.000000
1 2.000000
My first question:
- if max number of meals for each participant was in the first or last week.
After Richie answer, the output of:
print(df.head(50).to_dict('split')
Is
{'index': [('"55eb3fea748092000daa9b25"', 4), ('"55eb3fea748092000daa9b25"', 5), ('"55eb3fea748092000daa9b25"', 1), ('"55eb3fea748092000daa9b25"', 2), ('"55eb3fea748092000daa9b25"', 3)], 'columns': ['Meals'], 'data': [[1.4], [1.3333333333333333], [2.0], [1.25], [1.0]]}
My second question (after updating my post) is:
- which (week of the study) the max week was? Note that the study was performed for 1-4/5 weeks. So the output would look like this, with an extra column called (Week of the study):
Output:
Meals max_week
Id Week Week of the study
1 4 1 1 FALSE
5 2 1 FALSE
1 3 2 TRUE
2 4 1 FALSE
3 5 1 FALSE
2 2 1 2 FALSE
3 2 2 FALSE
4 3 2 TRUE
5 4 2 TRUE
3 2 1 2 FALSE
3 2 2 TRUE
4 3 2 FALSE
5 4 3 FALSE
1 5 3 FALSE
Then I want only to save the ID, Week of the study and Snacks only for the true max weeks as follows:
Meals max_week
ProlificId Week of the study
1 3 2 TRUE
2 3 2 TRUE
2 4 2 TRUE
3 2 2 TRUE
Your help is highly appreciated Shosho