I have a dataframe of the following form
import pandas as pd
Out[1]:
df = pd.DataFrame({'id':[1,2,3,4,5],
'group':['A','A','A','B','B'],
'start':['2012-08-19','2012-08-22','2013-08-19','2012-08-19','2013-08-19'],
'end':['2012-08-28','2013-09-13','2013-08-19','2012-12-19','2014-08-19']})
id group start end
0 1 A 2012-08-19 2012-08-28
1 2 A 2012-08-22 2013-09-13
2 3 A 2013-08-19 2013-08-21
3 4 B 2012-08-19 2012-12-19
4 5 B 2013-08-19 2014-08-19
For given row in my dataframe I'd like to count the number of items in the same group that have an overlapping time interval.
For example in group A id 2 ranges from 22 August 2012 to 13 Sept 2013 and hence the overlap between id 1 (19 August 2012 to 28 August 2012) and also id 3 (19 August 2013 to 21 August 2013) for a count of 2.
Conversely there is no overlap between the items in group B
So for my example dataframe above i'd like to produce something like
Out[2]:
id group start end count
0 1 A 2012-08-19 2012-08-28 1
1 2 A 2012-08-22 2013-09-13 2
2 3 A 2013-08-19 2013-08-21 1
3 4 B 2012-08-19 2012-12-19 0
4 5 B 2013-08-19 2014-08-19 0
I could "brute-force" this but I'd like to know if there is a more efficient Pandas way of getting this done.
Thanks in advance for your help