I am iterating over two lists in nested loops. The first, segment_arr, is a list of small (e.g. 0.1m) segments of a line, sorted by distance along the line; the second, fail_sections, is a list of larger sections of that line (e.g. 5m) which "fail" in some way, also sorted by distance along the line. Ultimately I am trying to merge float ranges - there are some answers here, but they are all based on integers and come with some caveats about overlapping.
I have a very naive version which seems performant enough for my purposes, but it got me thinking: how could this be made more efficient?
from collections import namedtuple

# Segment is a simple container for the segment data:
Segment = namedtuple('Segment', ['start_dist', 'end_dist', 'does_fail'])

new_seg_array = []
for seg in segment_arr:
    segfail = False
    for fail_sec in fail_sections:
        # A segment fails if it lies entirely within a failing section.
        if seg.start_dist >= fail_sec.start_dist and seg.end_dist <= fail_sec.end_dist:
            segfail = True
            break  # no need to check the remaining fail sections
    seg_data = Segment(start_dist=seg.start_dist, end_dist=seg.end_dist, does_fail=segfail)
    new_seg_array.append(seg_data)
The main problem is that the inner loop wastes iterations: for most pairs there is no chance the condition can be true, because the fail section is well outside where we currently are in the outer loop. I considered using a generator expression, e.g.
filtered_fail_sections = (x for x in fail_sections
                          if x.start_dist <= seg.start_dist and x.end_dist >= seg.end_dist)
for fail_sec in filtered_fail_sections:
to filter down to the relevant fail sections, but it struck me that this was just moving the same filtering work into the generator. Is there any way in Python to gradually shrink the scope of the second loop, so that fail sections which are no longer relevant, because they lie entirely below the current position in the first loop, are not iterated over at all? Over time the second loop would get smaller until it was empty, which would presumably help performance on much larger datasets. Or are any other major efficiency improvements possible?
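For concreteness, this is roughly the shape I have in mind - only a sketch, and it assumes (as in my data) that both lists are sorted by start_dist, that segments have positive length, and additionally that fail sections do not overlap one another. It keeps a single index into fail_sections that only ever moves forward, so each fail section is discarded exactly once:

new_seg_array = []
i = 0
for seg in segment_arr:
    # Skip fail sections that end at or before this segment starts; since
    # segments are processed in order, they can never match a later segment.
    while i < len(fail_sections) and fail_sections[i].end_dist <= seg.start_dist:
        i += 1
    # With non-overlapping fail sections, only fail_sections[i] can
    # possibly contain this segment.
    segfail = (i < len(fail_sections)
               and fail_sections[i].start_dist <= seg.start_dist
               and seg.end_dist <= fail_sections[i].end_dist)
    new_seg_array.append(Segment(start_dist=seg.start_dist,
                                 end_dist=seg.end_dist,
                                 does_fail=segfail))

That should make the whole pass O(len(segment_arr) + len(fail_sections)) rather than the product of the two. Under the same assumptions I also wondered about using bisect to jump straight to the first candidate fail section for each segment:

import bisect

# End distances are sorted too, given non-overlapping fail sections.
end_dists = [fs.end_dist for fs in fail_sections]

new_seg_array = []
for seg in segment_arr:
    # Index of the first fail section ending strictly after seg.start_dist
    # (one ending exactly at seg.start_dist cannot contain the segment).
    i = bisect.bisect_right(end_dists, seg.start_dist)
    segfail = (i < len(fail_sections)
               and fail_sections[i].start_dist <= seg.start_dist
               and seg.end_dist <= fail_sections[i].end_dist)
    new_seg_array.append(Segment(start_dist=seg.start_dist,
                                 end_dist=seg.end_dist,
                                 does_fail=segfail))

but I'm not sure whether per-segment binary searches would actually beat the forward-only index in practice.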