Array loop performance when looking for duplicate/collisions

Question

This code loops though an array of events, each with a start and end time... and simply sets a 'match' key to say that it has matched another event (i.e. a collision to resolve):

<?php

$results = array(
        array('from' => 1, 'to' => 3),
        array('from' => 3, 'to' => 6),
        array('from' => 7, 'to' => 9),
        array('from' => 4, 'to' => 8),
    );

$k = 0;

foreach ($results as $p_id => $p_result) {

    foreach ($results as $c_id => $c_result) {
        if ($p_id == $c_id) {
            continue;
        }
        $k++;
        if ($p_result['from'] < $c_result['to'] && $p_result['to'] > $c_result['from']) {
            $results[$p_id]['match'] = $c_id;
            break;
        }
    }

}

print_r($results);
echo $k;

?>

By itself this isn't necessarily slow, but might be when thousands of events are being checked... i.e. using the DateTime object or UNIX timestamp for events though-out the year.

Trying with 3000 randomised events on my computer takes about 1 second with the "break" and 3 seconds without.

The to/from collision check itself is inspired by: Comparing date ranges

The main issue I see is that once event 1 has been checked against events 2-4, then events 2-4 don't really need to check themselves against 1:

$results[$p_id]['match'] = $c_id;
$results[$c_id]['match'] = $p_id;

But I can't think of an elegant way to work though this without making the code much harder to read.

One possibility is to create a new array of those events that have "passed" the checks, but I suspect the memory management of adding/removing elements on an array is much more than doing simple integer checks.

dognose · Accepted Answer · 2014-04-08T21:15:04.450

1

Instead of using foreach, you could use two numeric loops there. Then your 'inner loop' can start at the current outer value+1, and you wont process any double combinations.

$cachedCount = count($array);
for ($outer = 0; $outer < $cachedCount; $outer++){
  for ($inner = $outer+1; $inner< $cachedCount; $inner++){
    // process entries. no duplicates here. Only the ones that are dupplicates, 
    // but shouldn't be.
  }
}

i.e. assuming the array

0 -> "a"
1 -> "b"
2 -> "c"
3 -> "d"

it will have the following steps:

outer = 0; inner = 1; //a-b
outer = 0; inner = 2; //a-c
outer = 0; inner = 3; //a-d

outer = 1; inner = 2; //b-c
outer = 1; inner = 3; //b-d

outer = 2; inner = 3; //c-d

so all combinations are checked but no duplicate (a-c and c-a) or trivial combination (a-a, b-b) processed. Obviously for 2 Elements, it would start and end with comparing 0-1. (Thats ONE comparission compared to 4 comparissions doing a 2by2 iteration (1-1, 1-2, 2-1, 2-2)

edited Apr 08 '14 at 21:15

answered Apr 08 '14 at 15:43

dognose

20,360
9
61
107

Noticing a few updates there... I think your onto something... and one thing to change is the repeated calls to count(), that can be cached. – Craig Francis Apr 08 '14 at 15:59
@CraigFrancis that can be cached yes. As the count function does not work without parameters, it is just a "verbal" example. ;) – dognose Apr 08 '14 at 16:00
just tested and getting 0.1 seconds now... thanks for updating to show the cachedCount (I know its irrational, but I hate seeing function calls in that bit of a for statement)... will do some more testing, but I think we have a winner. – Craig Francis Apr 08 '14 at 16:07
@CraigFrancis Ignore the comment notification - it was correct :P – dognose Apr 08 '14 at 21:10
@CraigFrancis I like the way how you elaborated all given answers. This adds "quality" for all further visitors stumbling over the same question! – dognose Apr 08 '14 at 21:23
Hopefully it also shows I appreciate everyones time, and it makes me think though each answer to make sure I've not missed anything important :-) – Craig Francis Apr 09 '14 at 13:23

lemiant · Answer 2 · 2014-04-08T16:01:21.827

0

You can use indices to adjust the size of the loops:

<?php
$arr = array( '1', '2', '3', '4' );
for($a=0; $a<count($arr); $a++){
    for($b=0; $b<$a; $b++){
        echo "compare: ".$arr[$a]." ".$arr[$b].",\n";
    }
}

//Ouput
//compare: 2 1,
//compare: 3 1,
//compare: 3 2,
//compare: 4 1,
//compare: 4 2,
//compare: 4 3,

http://codepad.org/BdxzvTvj

Mind you, while this is a good solution it will only provide 50% speed up. There are algorithmic changes which could dwarf the size of those gains.

edited Apr 08 '14 at 16:01

answered Apr 08 '14 at 15:44

lemiant

4,205
4
31
38

Thanks @lemiant, unfortunately dognose managed to beat you by a few seconds... but thanks for helping, I don't think I should be allowed behind a keyboard this afternoon... also, minor tweak, but if you cache the count() function call, you can also save a few ms. – Craig Francis Apr 08 '14 at 16:15

score 0 · Answer 3 · answered Apr 08 '14 at 15:45

0

Perhaps something like:

foreach ($results as $p_id => $p_result) {
    foreach (array_slice($results, $p_id+1, NULL, true) as $c_id => $c_result) {
        if ($p_result['from'] < $c_result['to'] && $p_result['to'] > $c_result['from']) {
            $results[$p_id]['match'] = $c_id;
            break;
        }
    }

}

answered Apr 08 '14 at 15:45

Mark Baker

209,507
32
346
385

Looking good... while not particularly scientific thats 0.6 seconds (will also check the others)... now just wondering if the array_slice() could be replaced with an inner `for`, so we don't create a new set of arrays in memory? – Craig Francis Apr 08 '14 at 15:57
Thanks Mark, while its basically the same as @dognose's solution, the use of the array_slice() slowed this solution down a bit... but thanks for the suggestion, certainly good having a second pair of eyes. – Craig Francis Apr 08 '14 at 16:10

Array loop performance when looking for duplicate/collisions

3 Answers3