I have the following dataframe in Python:
ID | country_ID | visit_time |
---|---|---|
0 | ESP | 10 days 12:03:00 |
0 | ENG | 5 days 10:02:00 |
1 | ENG | 3 days 08:05:03 |
1 | ESP | 1 days 03:02:00 |
1 | ENG | 2 days 07:01:03 |
1 | ENG | 3 days 01:00:52 |
2 | ENG | 0 days 12:01:02 |
2 | ENG | 1 days 22:10:03 |
2 | ENG | 0 days 20:00:50 |
For each ID, I want to get:
avg_visit_ESP and avg_visit_ENG columns.
- Average time visit with country_ID = ESP for each ID.
- Average time visit with country_ID = ENG for each ID.
ID | avg_visit_ESP | avg_visit_ENG |
---|---|---|
0 | 10 days 12:03:00 | 5 days 10:02:00 |
1 | 1 days 03:02:00 | (8 days 16:06:58) / 3 |
2 | NaT | (3 days 06:11:55) / 3 |
I don't know how to specify in groupby a double grouping, first by ID and then by country_ID. If you can help me I would appreciate it.
P.S.: The date format of visit_time (timedelta), can perform addition and division without any apparent problem.
from datetime import datetime, timedelta
date1 = pd.to_datetime('2022-02-04 10:10:21', format='%Y-%m-%d %H:%M:%S')
date2 = pd.to_datetime('2022-02-05 20:15:41', format='%Y-%m-%d %H:%M:%S')
date3 = pd.to_datetime('2022-02-07 20:15:41', format='%Y-%m-%d %H:%M:%S')
sum1date = date2-date1
sum2date = date3-date2
sum3date = date3-date1
print((sum1date+sum2date+sum3date)/3)