I have a dataframe that has intervals and a label associated with each. I need to group and aggregate rows separated by a given distance from others.
For example, groups rows whose start/end are within 3 units of the the start/end of other rows have their label
fields concatenated:
In [16]: df = pd.DataFrame([
...: [ 1, 3,'a'], [ 4,10,'b'],
...: [15,17,'c'], [18,20,'d'],
...: [27,30,'e'], [31,40,'f'], [41,42,'g'],
...: [50,54,'h']],
...: columns=['start', 'end', 'label'])
...:
In [17]: df
Out[17]:
start end label
0 1 3 a
1 4 10 b
2 15 17 c
3 18 20 d
4 27 30 e
5 31 40 f
6 41 42 g
7 50 54 h
Desired output:
In [18]: df_desired = group_by_interval(df)
In [19]: df_desired
Out[19]:
start end label
0 1 10 a b
1 15 20 c d
2 27 30 e f g
3 50 54 h
How can I execute this sort of grouping by interval with a dataframe?
I have found one similar SO here, but it's a little different since I don't know where to cut a priori.