I've got a fairly large data set of about 2 million records, each of which has a start time and an end time. I'd like to insert a field into each record that counts how many records there are in the table where:
- Start time is less than or equal to "this row"'s start time
- AND end time is greater than "this row"'s start time
So basically each record ends up with a count of how many events, including itself, are "active" concurrently with it.
I've been trying to teach myself pandas to do this, but I'm not even sure where to start looking. I can find lots of examples of summing rows that meet a fixed condition like "> 2", but I can't work out how to iterate over rows and conditionally count records based on values in the current row.
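To make the two conditions above concrete, here is a minimal sketch of one way the count could be computed without looping over rows. The column names `start` and `end`, the helper name `count_active`, and the sample data are all made up for illustration. It also assumes every record has start <= end, in which case the count for a row equals (number of starts <= this start) minus (number of ends <= this start), and both numbers can be found with a binary search on sorted copies of the columns.

```python
import numpy as np
import pandas as pd


def count_active(df, start_col="start", end_col="end"):
    """Count records with start <= this row's start and end > this row's start.

    Assumes start <= end for every record (hypothetical column names).
    """
    starts = df[start_col].to_numpy()
    ends = df[end_col].to_numpy()

    # Sorted copies allow a binary search per row instead of a full scan.
    sorted_starts = np.sort(starts)
    sorted_ends = np.sort(ends)

    # Records that have started at or before each row's start time.
    started = np.searchsorted(sorted_starts, starts, side="right")
    # Records that have already ended at or before each row's start time.
    ended = np.searchsorted(sorted_ends, starts, side="right")

    return pd.Series(started - ended, index=df.index, name="active")


# Made-up sample data, only to show the shape of the input and output.
df = pd.DataFrame({
    "start": pd.to_datetime([
        "2024-01-01 00:00", "2024-01-01 00:05",
        "2024-01-01 00:10", "2024-01-01 00:20",
    ]),
    "end": pd.to_datetime([
        "2024-01-01 00:15", "2024-01-01 00:10",
        "2024-01-01 00:30", "2024-01-01 00:25",
    ]),
})
df["active"] = count_active(df)
print(df)
```

Because the work is two sorts plus two binary searches per row, this should scale to a couple of million records far better than comparing every row against every other row. A zero-length record (start equal to end) would not count itself under the "end time is greater than" condition.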