3

I would like to join two arrays in PHP, the same as I could do with MySQL join left. I have two arrays:

  • $db_products from database (it has id and sku fields)
  • $csv_prices from a CSV file (it has sku and price fields)

$db_products: (it has 4000 items)

Array
(
    [0] => Array
        (
            [id] => 1012
            [sku] => asd123
    )
    [N] => Array
    (
            [id] => ...
            [sku] => ...
    )
)

$csv_prices: (it has 8000 items)

Array
(
    [0] => Array
        (
            [sku] => asd123
            [price] => 15.50
    )
    [N] => Array
    (
            [sku] => ...
            [price] => ...
    )
)

The join is $db_products[$key][sku] = $csv_prices[$key][sku]. To find the matching pair, I am doing loops in loops which result in 4000 * 8000 check for match. It consumes a lot of energy and time, and I want to be more efficient.

I could lower by 10% the used time, by unset()-ing the used $csv_prices[$key], but I want to be more efficient.

livibetter
  • 19,832
  • 3
  • 42
  • 42
  • **you can look at this answer** [PHP ARRAY JOIN](https://stackoverflow.com/questions/18915944/join-multidimensional-array-like-sql/46951115#46951115) – Erdal CEYLAN Oct 26 '17 at 09:51
  • In 2019 I have implemented a PHP library to do join, order, group by , etc to php arrays. Name of the library is arraybase. https://github.com/whizsid/arraybase – Ramesh Kithsiri HettiArachchi Jun 05 '19 at 18:13

2 Answers2

4

If you loop through the $csv_products array once and set the SKU as the array key then you won't have to exponentially loop through that array each time you have a new product to find its match.

Instead, you just loop through the product array and use isset() to see if it exists or not.

This way you'll only need to do 1x count($db_products) and 1x count($csv_prices) repetitions, instead of count($db_products) * count($csv_prices) reps (12 thousand vs 32 million).

The concept could be considered to be similar to indexing in databases - you use the key you want to look up as the array key/index, meaning you don't have to loop through the array every time to find the record you want.

Example:

// Reindex the CSV array to use SKU as the key
$reindexed = array();
foreach ($csv_prices as $values) {
    $reindexed[$values['sku']] = $values;
}
unset($csv_prices);

// Join the prices
foreach ($db_products as &$product) {
    $sku = $product['sku'];
    $product['price'] = isset($reindexed[$sku]) ? $reindexed[$sku]['price'] : '';
}

Of course, you won't see a remarkable difference in performance until you start using large data sets - the change would be noticeable exponentially as the data sets grow.

scrowler
  • 24,273
  • 9
  • 60
  • 92
  • 1
    Am totatally amused. It has the same result, but runs under 0.9 seconds instead of 17-18sec. Thank you, sir! :) – Márk Cserepes Apr 14 '15 at 05:23
  • 1
    I was beginning to think I was the only one doing this as standard - lack of this knowledge is precisely why frameworks like Symfony don't scale well by default - but simple programming like this is precisely how it can be optimized. – Andy Gee Nov 30 '21 at 02:52
0

it's working fine for me

function left_join_arrays($key, $array1, $array2) {
    
    $i=0;
    foreach ($array1 as $arr1) {
        foreach ($array2 as $arr2) {
            if ($arr1[$key]==$arr2[$key]) {
                foreach(array_keys($arr2) as $key2){
                    if ($key != $key2) {
                        $array1[$i][$key2]=$arr2[$key2];
                    }
                }
                continue;
            }
        }
        $i++;
    }

    return $array1;
}
FabioSSena
  • 31
  • 2