I have a data set that looks like this:
Date | ID | Task | Description
2016-01-06 00:00:00 | 1 | 010 | This is text
2016-01-06 00:10:00 | 1 | 020 | This is text
2016-01-06 00:20:00 | 1 | 010 | This is text
2016-01-06 01:00:00 | 1 | 020 | This is text
2016-01-06 01:10:00 | 1 | 030 | This is text
2016-02-06 00:00:00 | 2 | 010 | This is text
2016-02-06 00:10:00 | 2 | 020 | This is text
2016-02-06 00:20:00 | 2 | 010 | This is text
2016-02-06 01:00:00 | 2 | 020 | This is text
2016-02-06 01:01:00 | 2 | 030 | This is text
Task 020 usually occurs after Task 010, which means that when Task 020 starts, Task 010 ends. The same applies to Task 020: if it comes before any other task, it has stopped when that next task starts.
I need to group by Task, calculating the average duration, total sum and count of each type of task in each ID, so I am looking for something like this:
ID | Task | Average | Sum | Count
1 | 010 | 25 | 50 | 2
1 | 020 | 10 | 20 | 2
etc | etc | etc | etc | etc
There are more IDs, but I only care about tasks 010 and 020, so whatever number is returned from them is acceptable.
Can someone help me work out how to do this in Python?
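
To make the calculation concrete, here is a minimal sketch of one possible approach. It assumes pandas is available, that durations are measured in minutes, and that a task's duration is the gap between its row and the next row of the same ID (so the last row of each ID has no duration). The column name Minutes and the variable names are made up for illustration, and the Description column is left out because it is not needed for the calculation.

```python
import pandas as pd

# Sample rows from the question. Task is kept as a string to preserve
# the leading zero.
rows = [
    ("2016-01-06 00:00:00", 1, "010"),
    ("2016-01-06 00:10:00", 1, "020"),
    ("2016-01-06 00:20:00", 1, "010"),
    ("2016-01-06 01:00:00", 1, "020"),
    ("2016-01-06 01:10:00", 1, "030"),
    ("2016-02-06 00:00:00", 2, "010"),
    ("2016-02-06 00:10:00", 2, "020"),
    ("2016-02-06 00:20:00", 2, "010"),
    ("2016-02-06 01:00:00", 2, "020"),
    ("2016-02-06 01:01:00", 2, "030"),
]
df = pd.DataFrame(rows, columns=["Date", "ID", "Task"])
df["Date"] = pd.to_datetime(df["Date"])
df = df.sort_values(["ID", "Date"])

# Assumption: a task ends when the next row of the same ID starts.
next_start = df.groupby("ID")["Date"].shift(-1)
df["Minutes"] = (next_start - df["Date"]).dt.total_seconds() / 60

# Keep only the tasks of interest, then aggregate per ID and Task.
result = (
    df[df["Task"].isin(["010", "020"])]
    .groupby(["ID", "Task"])["Minutes"]
    .agg(Average="mean", Sum="sum", Count="count")
    .reset_index()
)
print(result)
```

Under these assumptions, ID 1 comes out as 25 / 50 / 2 for Task 010 and 10 / 20 / 2 for Task 020, which matches the desired table above. Rows with no following row in the same ID get a missing duration, which mean, sum and count skip.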