Best practice looping through huge arrays

Question

I have two huge arrays:

array A has 4900 items (each item is a small array)
array B has 700 items (also each item is a small array)

So basically these are my arrays:

A = array (
    [0] => array(
        "name" => "KE-KE IMPEX ",
        "email" => "someemai@gmail.com",
        "kezbCompany" => "Fragrance",
        "startDate" => "2013-03-25 00:00:00",
        "endDate" => "2014-03-25 00:00:00",
        "companyBase" => "06 20 232 2534"
    )
    ...
    [4900] => array(
        "name" => "Jane Doe",
        "email" => "zzer@sad.com",
        "kezbCompany" => "sadsad",
        "startDate" => "2013-03-25 00:00:00",
        "endDate" => "2014-03-25 00:00:00",
        "companyBase" => "06 20 232 2534"
    )
)

B = array (
    [0] => array(
        "name" => "KE-KE IMPEX 46554 sda",
        "email" => "xxx@gmail.com",
        "kezbCompany" => "546wer",
        "startDate" => "2013-03-25 00:00:00",
        "endDate" => "2014-03-25 00:00:00",
        "companyBase" => "06 20 232 2534"
    )
    ...
    [700] => array(
        "name" => "45 Jane Doe",
        "email" => "kekeimpex@gmail.com",
        "kezbCompany" => "asd",
        "startDate" => "2013-03-25 00:00:00"        
    )
)

The small items look like this, for example (both in A and in B):

array(
  'name' => 'John Doe',
  'email' => 'john@doe.com'
)

So what I need to do is find which small arrays in A and B have the same name.

But please keep in mind that most of the time the two small arrays won't have the same structure.

For example, their emails may differ. Right now I loop through A and, inside that loop, through B, and it takes a very long time.

This is my current code:

$szData = file_get_contents('szData.txt');
$kData = file_get_contents('kData.txt');

$A = json_decode($szData);
$B = json_decode($kData);

$foundNr = 0;

foreach ($A as $key => $sz)
{
    $cName = $sz->companyName;

    foreach ($B as $index => $k)
    {
        $pattern = '/^(.*)+('.$cName.')/i';

        echo "SzSor: " . $key . " --- Ksor: " . $index . "</br>";

        if (preg_match($pattern, $k->companyName))
        {
            $founData[] = $k->companyName;

            ++$foundNr;
        }
    }
}

Any ideas?

Have a look here: http://www.phpdreams.com/blog-posts/best-practice-array-loops.html. This guy did benchmarks on the different ways of looping through arrays and the best practices (at the time of writing) for handling large arrays. Hope this helps! — Derik Nel, May 29 '14 at 12:49
This is a great example of just one of the reasons why we use database management systems. — symcbean, May 29 '14 at 12:55
@DerikNel That link is awful advice: they've triggered PHP to make a million copies of the array and then complain that it's slow. (You'll also notice it consumes a load of memory, since it holds onto all those arrays at the same time until it can dereference them at the end.) PHP has a by-reference feature for updating the array as you iterate over it to avoid this: `foreach($array AS &$key){ $key++; }`. In my tests this comes out almost 20% faster than any of the alternative methods he suggested. See [this answer](http://stackoverflow.com/a/14854568/97513) for more info. — scragar, May 29 '14 at 14:04
@scragar Agreed, the link was mostly to demonstrate the speed difference between the different methods of iterating through arrays. — Derik Nel, May 29 '14 at 17:55
@Derik And it's flat out misrepresenting the speed of foreach. A ton of optimisation has gone into foreach to make it faster than a traditional loop; for his example to be slower he had to deliberately cripple the execution of the foreach. It is not a good comparison of the speed differences because it's wrong. — scragar, May 29 '14 at 20:17

score 0 · Answer 1 · answered May 29 '14 at 13:09

One solution is to first merge all the data in each sub-array into an imploded string (John Doe-john@doe.com) and then use a fast built-in function to find the matching strings:

$szData = file_get_contents('szData.txt');
$kData = file_get_contents('kData.txt');

$A = json_decode($szData);
$B = json_decode($kData);

$foundNr = 0;

$B_sample = array();
foreach ($B as $index => $data) {
    // json_decode() returns objects here, so cast to array before imploding
    $B_sample[$index] = implode('-', (array) $data);
    // or use some other custom procedure to create a unique string from the data
}

$A_sample = array();
foreach ($A as $index => $data) {
    $A_sample[$index] = implode('-', (array) $data);
}

$A_B_intersect = array_intersect($A_sample, $B_sample); // the KEYS in the result array are from $A

$foundNr = count($A_B_intersect);

$founData = array();
foreach ($A_B_intersect as $key => $data) {
    $founData[] = $A[$key]->companyName;
}

This has an extra benefit if you can create $A_sample while building $A: you can reuse the same loop.
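Since the question says the rows usually differ in fields other than the name, imploding every field may match very little. Below is a minimal sketch of the "custom procedure" variant that builds the comparison string from the name only. The sample rows, the `nameKey` helper and the trim/lowercase normalisation are assumptions for illustration, not part of the original data or code, and this only finds names that are equal after normalisation, not partial matches.

```php
<?php
// Hypothetical sample rows standing in for the decoded JSON data.
$A = [
    ['name' => 'John Doe', 'email' => 'john@doe.com'],
    ['name' => 'Jane Doe', 'email' => 'zzer@sad.com'],
];
$B = [
    ['name' => 'jane doe ', 'email' => 'other@example.com'],
    ['name' => 'KE-KE IMPEX', 'email' => 'xxx@gmail.com'],
];

// Assumed normalisation: trim whitespace and lowercase the name.
function nameKey(array $row): string
{
    return strtolower(trim($row['name']));
}

// array_map() with a single input array keeps that array's keys.
$A_sample = array_map('nameKey', $A);
$B_sample = array_map('nameKey', $B);

// array_intersect() keeps the keys of its first argument ($A_sample).
$matches = array_intersect($A_sample, $B_sample);

$found = [];
foreach ($matches as $key => $unused) {
    $found[] = $A[$key]['name'];
}

print_r($found);
```

With the sample rows above, only "Jane Doe" appears in both lists once case and surrounding whitespace are ignored.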

score -1 · Answer 2 · answered May 29 '14 at 12:52

You would probably be best off avoiding your own loops here and using the built-in PHP functions instead, such as:

in_array:

Code taken from docs:

<?php
$os = array("Mac", "NT", "Irix", "Linux");
if (in_array("Irix", $os)) {
    echo "Got Irix";
}
if (in_array("mac", $os)) {
    echo "Got mac";
}
?>

The second condition fails because in_array() is case-sensitive, so the program above will display:

Got Irix

Example #2 in_array() with strict example
<?php
$a = array('1.10', 12.4, 1.13);

if (in_array('12.4', $a, true)) {
    echo "'12.4' found with strict check\n";
}

if (in_array(1.13, $a, true)) {
    echo "1.13 found with strict check\n";
}
?>

The above example will output:

1.13 found with strict check

So, while you might still have to loop through one array to compare all the possible values, the most efficient code will often come from using the built-in functions.

Having said all that, write a few simple loops and test them to see which performs best. If you know the structure of your data, you might be able to skip many elements entirely. For example, if you know that the data arrives in alphabetical order and you are looking for something that starts with an "S", write a loop that skips 100 records at a time until you reach an "S" value, then go back to the last entry before it. Things like that. This is the sort of thinking that makes for quick, efficient code: don't search element by element if you can skip 99 out of every 100 elements for the first 3500 out of 5000.
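The skip-ahead idea can be sketched roughly as follows. This assumes the names are already sorted case-insensitively; the `jumpToPrefix` helper, the step size of 100 and the generated sample names are all made up for illustration.

```php
<?php
// Hypothetical helper: returns the index of the first entry in a sorted list
// that is >= $prefix (case-insensitive), or count($names) if there is none.
function jumpToPrefix(array $names, string $prefix, int $step = 100): int
{
    $count = count($names);
    $pos = 0;

    // Skip ahead $step records at a time while the probed entry still sorts before the prefix.
    while ($pos + $step < $count && strcasecmp($names[$pos + $step], $prefix) < 0) {
        $pos += $step;
    }

    // Walk the remaining block one element at a time.
    while ($pos < $count && strcasecmp($names[$pos], $prefix) < 0) {
        $pos++;
    }

    return $pos;
}

// Generated sample data: 200 sorted names for each letter a..z (5200 rows).
$names = [];
foreach (range('a', 'z') as $letter) {
    for ($i = 0; $i < 200; $i++) {
        $names[] = sprintf('%s company %03d', $letter, $i);
    }
}

$index = jumpToPrefix($names, 's');
echo $index . ': ' . $names[$index] . "\n";
```

On this sample data the first loop probes a few dozen entries instead of thousands, and the second loop finishes the last block linearly. If the data is not sorted, this approach does not apply.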

@Mr.Sam Hi, this is [strtolower](http://www.php.net/manual/en/function.strtolower.php) and it wants to be friends :) — Fluffeh, May 29 '14 at 12:55
If the objective is to write efficient code, then using the builtin array functions is usually slower than a foreach loop. And the place to start optimizing the performance of the code shown is to get rid of the regex. Also your logic is different from that posted above. — symcbean, May 29 '14 at 12:59
Sorry, but I think this is not good for me. I'm going to edit my question, because I just figured out it's not so clear. — Mr. Sam, May 29 '14 at 12:59
@Fluffeh Most of the time the two small arrays won't be the same in structure, so I don't think I can use in_array(), because I will get a very small percentage of "same arrays". That's why I use regex. — Mr. Sam, May 29 '14 at 13:03
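Following up on symcbean's suggestion to get rid of the regex: the pattern in the question amounts to a case-insensitive "contains" test for single-line names, so a plain string function may be enough. Here is a minimal sketch, assuming the rows are objects with a `companyName` property as in the question's code; the sample rows are hypothetical. It also drops the `echo` inside the inner loop, which with 4900 × 700 rows would print about 3.4 million lines.

```php
<?php
// Hypothetical sample rows; the real data comes from the decoded JSON files.
$A = [
    (object) ['companyName' => 'KE-KE IMPEX '],
    (object) ['companyName' => 'Jane Doe'],
];
$B = [
    (object) ['companyName' => 'KE-KE IMPEX 46554 sda'],
    (object) ['companyName' => '45 Jane Doe'],
    (object) ['companyName' => 'Unrelated Ltd'],
];

$founData = [];
foreach ($A as $sz) {
    $cName = trim($sz->companyName);
    if ($cName === '') {
        continue; // skip empty names; on PHP 8 an empty needle matches every string
    }
    foreach ($B as $k) {
        // Case-insensitive substring check, with no pattern to build or escape.
        if (stripos($k->companyName, $cName) !== false) {
            $founData[] = $k->companyName;
        }
    }
}
$foundNr = count($founData);

print_r($founData);
```

This is still a nested loop over both arrays, but each comparison is much cheaper, and names containing regex metacharacters no longer need escaping.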
