I have two dataframes:
section_headers =
   start_sect_  end_sect_
0            0         50
1          121        139
2          221        270
sentences =
   start_sent_  end_sent_
0            0         50
1           56         76
2           77         85
3           88        111
4          114        120
5          121        139
6          221        270
I'm trying to merge the sentences that belong under each section_header.
...
A sentence belongs under a section_header when its start_sent_ is greater than or equal to that section_header's start_sect_ and less than the next section_header's start_sect_, and so on for each following section.
Given this, my desired output is:
merge =
   start_sent_  end_sent_  start_sect_
0            0         50            0
1           56         76            0
2           77         85            0
3           88        111            0
4          114        120            0
5          121        139          121
6          221        270          221
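For reference, here is a minimal sketch of how that rule might be expressed without a Python loop, using pandas' merge_asof. It assumes both frames are already sorted by their start columns and that the two key columns share the same integer dtype. The variable name merged is only illustrative.

```python
import pandas as pd

# Sample data copied from the frames above.
section_headers = pd.DataFrame(
    {"start_sect_": [0, 121, 221], "end_sect_": [50, 139, 270]}
)
sentences = pd.DataFrame(
    {
        "start_sent_": [0, 56, 77, 88, 114, 121, 221],
        "end_sent_": [50, 76, 85, 111, 120, 139, 270],
    }
)

# direction="backward" pairs each sentence with the last section whose
# start_sect_ is <= the sentence's start_sent_. Both frames must be
# sorted on their key columns for merge_asof to accept them.
merged = pd.merge_asof(
    sentences,
    section_headers[["start_sect_"]],
    left_on="start_sent_",
    right_on="start_sect_",
    direction="backward",
)

print(merged)
```

A sentence that starts before the first section would get a missing start_sect_ here, so that case would need separate handling.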
I initially converted the data to a dictionary and then built a new dataframe based on the conditions, but the dataset is very large and iterating through the records took far too long.
I'm trying to find a way to merge the data without iterating through these records. I tried the broadcast method from "Solution 2: Numpy Solution for large dataset", but since this method doesn't allow indexing of the arrays, it doesn't work here. It does work great for two other merge use cases I have.
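To make the kind of array-based lookup I'm after concrete, here is a rough sketch using np.searchsorted instead of broadcasting. It is not the linked solution. It assumes start_sect_ is sorted ascending and that every sentence starts at or after the first section; the names section_starts, idx and result are illustrative only.

```python
import numpy as np
import pandas as pd

section_headers = pd.DataFrame(
    {"start_sect_": [0, 121, 221], "end_sect_": [50, 139, 270]}
)
sentences = pd.DataFrame(
    {
        "start_sent_": [0, 56, 77, 88, 114, 121, 221],
        "end_sent_": [50, 76, 85, 111, 120, 139, 270],
    }
)

section_starts = section_headers["start_sect_"].to_numpy()

# side="right" returns the number of section starts <= each sentence start;
# subtracting 1 gives the position of the section the sentence falls under.
# Assumption: no sentence starts before the first section (idx would be -1).
idx = np.searchsorted(
    section_starts, sentences["start_sent_"].to_numpy(), side="right"
) - 1

result = sentences.assign(start_sect_=section_starts[idx])
print(result)
```

This avoids building the full sentences-by-sections comparison matrix that broadcasting needs, since each lookup is a binary search on the sorted section starts.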