1

I have an array of objects in Javascript, which contain an event start and end moment as milliseconds. Our codebase currently uses a naive search algorithm that iterates through the array until we find an event that contains a moment in time:

// time contains the moment of time we're looking to find an event for

profile.tempbasaltreatments.forEach( function eachTreatment (t) {
    if (time <= t.endmills && time >= t.mills) {
      return t;
    }
});

which doesn't cut it with the performance with larger datasets. What'd be a good algorithm / data model to efficiently go through the object array to find an event that encapsulates a moment in time? You can assume that in case the events overlap, first match is always sufficient.

Sulka Haro
  • 13
  • 4
  • 1
    Maybe better for code review? – Tuvia Apr 18 '16 at 20:12
  • This inherently has nothing to do with JavaScript. You're looking for an optimal algoirthm for your usecase, which could be implemented in any language, JS is just your tool. What you want to read about are search algorithms. https://en.wikipedia.org/wiki/Selection_algorithm – Sebastien Daniel Apr 18 '16 at 20:15
  • 1
    `forEach` always iterates over all items - a simple and probably huge improvement would be to move to `find` which will stop executing once a match is identified by returning `true`: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/find – pherris Apr 18 '16 at 20:15
  • in case, you didn't get Tuvia's reference, it's here: http://codereview.stackexchange.com – Jeff Puckett Apr 18 '16 at 20:16
  • 2
    Returning a value in a `forEach` loop makes no sense. You'll not be any wiser after the `forEach`. – trincot Apr 18 '16 at 20:24

2 Answers2

2

I would suggest these pre-processing steps (before searching):

  1. Optionally take a copy of the original array, if needed for other purposes;
  2. Sort the array by event start times. If possible, this should really be done by the database, which can maintain an index for that.
  3. Remove events from the array that have an end time that comes before the previous event's end time. When a time would match with this removed event, it can also match with the previous event. Since matching any event is good enough, we can remove this event.

Then the search would be binary, as follows:

  1. Set range of search to the whole array. A range is expressed as a start and end index in the array (not as two times)
  2. Take the middle element of that range
  3. If this event matches the given time, exit with success
  4. If this event has a start time greater than the given time, repeat from step 2 with the half of the range that comes after the selected element
  5. Otherwise take the other half of the range (before the selected element) and repeat from 2
  6. Stop repeating if the range has no more events and exit with failure.

The pre-processing should be done only once, which has time complexity O(n log n) if you still need to sort, otherwise it is O(n). Once that is done you can repeatedly find events in O(log n) time.

Here is some JavaScript code for the above:

// Create a copy of the original array and sort it by event start date
var events = profile.tempbasaltreatments.slice(0).sort(function (a, b) { 
    return a.mills - b.mills;
});

// Remove events that fall completely within the limits of another event.
// They are not needed to find an event for a given time.
for (var i = 0; i < events.length-1;) {
    if (i && events[i].endmills < events[i-1].endmills) {
         events.splice(i, 1);
    } else {
         i++;
    };
}
// Now also the remaining events' end dates are sorted

// function for fast search in events:    
function findEvent(events, time) {
    // Binary search for event
    var first = 0, last = events.length - 1;
    while (first <= last) { 
        var i = first + Math.floor((last - first) / 2);
        var t = events[i];
        if (time >= t.mills && time <= t.endmills) return t;
        if (time < t.mills) {
            last = i - 1;
        } else { // time > t.endmills
            first = i + 1;
        }
    }
    // returns undefined
}

// Example call: find a currently running event:
var foundEvent = findEvent(events, new Date().getTime());

Addendum

Here is how the filtering happens in the last pre-processing step. First a timeline of how events are ordered after sorting on start time:

a: ---------------
b:     -------
c:      ------------------
d:         --
e:            --  
f:                -----

The events that can be eliminated are b:

a: ---------------
c:      ------------------
d:         --
e:            --  
f:                -----

....then d:

a: ---------------
c:      ------------------
e:            --  
f:                -----

...then e:

a: ---------------
c:      ------------------
f:                -----

... and f:

a: ---------------
c:      ------------------

It is clear that the total covered period is the same as in the original, before the filtering.

trincot
  • 317,000
  • 35
  • 244
  • 286
  • 1
    I think you are assuming that there are no partially overlapping events (e.g. one event from 10 to 15 and another from 12 to 20), which you can't omit – Stefan Haustein Apr 18 '16 at 21:09
  • @StefanHaustein, I am aware of that. Do you see any problem with it? – trincot Apr 18 '16 at 21:22
  • Sorry, stupid mistake on my part, I'll edit that comment. Separately, though, I think it would make things clearer to mention specifically what happens when events overlap, and why it doesn't cause any problems. – j_random_hacker Apr 18 '16 at 21:30
  • @StefanHaustein: [replacing too-late-to-edit comment] In practice this isn't a problem, since we can imagine that each event ends *either at `endmills` or when the next event starts, whichever is sooner*. Doing this can never cause a "false negative", and that's essentially what the code does. – j_random_hacker Apr 18 '16 at 21:31
  • @trincot Can't omit -> can't ignore. Your binary search effectively searches for the closest start time, but there could be an event with a "far away" smaller start time and a huge range that you are skipping in the search. I don't understand why people add incorrect answers without reading previous ones.... :( – Stefan Haustein Apr 18 '16 at 21:35
  • @StefanHaustein: The event with the "far away" smaller start time and huge range will still exist, and will be found. The only way it will not be found is if some other range starts after it starts, but before the query point, and ends after it ends -- in which case *that* range will be found instead. – j_random_hacker Apr 18 '16 at 21:41
  • Concerning the filtering action and the claim that the end dates are also sorted after that operation, please note an update I made to that part of the code, and an added illustration. – trincot Apr 18 '16 at 21:42
  • @StefanHaustein: however you turn it, this is a binary search and will complete after a maximum of *²log n* comparisons (*n* = number of events after filter). True, that can be the worst case, but not more. If you disagree, please provide a counter example. – trincot Apr 18 '16 at 21:48
  • @trincot Ok, sorry, it seems to work because of the linear preprocessing step -- it prevents you from getting trapped between d) and e) (which was my concern), as they are filtered out. – Stefan Haustein Apr 18 '16 at 22:30
  • Thanks! I had to change this to `if (time < t.mills) { last = i - 1; } else { first = i + 1; }`. Guess we're sorting the other way round? – Sulka Haro Apr 20 '16 at 15:44
  • Right! I fixed that error in my answer now. Thanks. – trincot Apr 20 '16 at 17:31
0

If the events are sorted by start time, "mid time" or end time, you could use a binary search to find one close by, then do a local linear search to find one that includes the time stamp (the local search direction depends on the sort order: If the events are ordered by start time, you need to look in the direction of decreasing start time, starting at the closest start time).

The main problem with this approach is that it's slow when there is no maximum event duration, as this means that there is no stop criterion for the local search other than reaching the end of the list.

A better approach might be to store the events in a data structure suitable for efficient access, e.g. an interval tree.

Stefan Haustein
  • 18,427
  • 3
  • 36
  • 51