You could look at answers to this question on Python profiling to help you identify bottlenecks. There's a bit of sorting going on in your code, which may be sub-optimal.
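For example, cProfile from the standard library gives a quick per-function breakdown; in this sketch, main() is just a stand-in for whatever entry point you want to time:
import cProfile
import pstats

cProfile.run("main()", "profile.out")  # main() is a placeholder for your entry point
stats = pstats.Stats("profile.out")
stats.sort_stats("cumulative").print_stats(10)  # ten most expensive calls by cumulative time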
If your event data is sorted by start time, you should be able to iterate through the events in order and look ahead and behind each one to count its overlapping events.
Here's an example of what I mean: I generate 150,000 sequentially starting events with random durations, then iterate through the events in the manner described above and print the ten events with the highest concurrency counts.
>python py_counter.py
(7548: start 1546268748.0, duration: 89.0, 47)
(7547: start 1546268747.0, duration: 8.0, 46)
(7546: start 1546268746.0, duration: 86.0, 45)
(7545: start 1546268745.0, duration: 36.0, 44)
(7544: start 1546268744.0, duration: 21.0, 43)
(59089: start 1546320289.0, duration: 25.0, 43)
(7543: start 1546268743.0, duration: 75.0, 42)
(59088: start 1546320288.0, duration: 96.0, 42)
(104116: start 1546365316.0, duration: 70.0, 42)
(7542: start 1546268742.0, duration: 94.0, 41)
Finished 150,000 events in 5.1 seconds
Here's the code:
from datetime import datetime, timedelta
from dataclasses import dataclass
from collections import Counter
from random import randint
import time

@dataclass(eq=True, frozen=True)  # frozen + eq makes instances hashable, so they can be Counter keys
class Event:
    """Track event durations"""
    tag: int
    start: datetime
    end: datetime

    def __repr__(self):
        return f"{self.tag}: start {self.start.timestamp()}, duration: {self.end.timestamp() - self.start.timestamp()}"

start_time = time.time()

# create some sequentially starting events with random durations
events = []
for i in range(150000):
    start = datetime(2019, 1, 1) + timedelta(seconds=i)
    end = start + timedelta(seconds=randint(1, 100))
    events.append(Event(i, start, end))

concurrencies = Counter()

# events is a list sorted by start time
for i, event in enumerate(events):
    # look back: an earlier event overlaps if it ends after this event starts
    j = i - 1
    while j >= 0 and event.start < events[j].end:
        j -= 1
        concurrencies[event] += 1
    # look forward: a later event overlaps if it starts before this event ends
    j = i + 1
    while j < len(events) and events[j].start < event.end:
        j += 1
        concurrencies[event] += 1

# print the events with the highest number of concurrencies
for event in concurrencies.most_common(10):
    print(event)

print(f"Finished {len(events):,d} events in {time.time() - start_time:.1f} seconds")
Let me know if any portions need clarifying.