As this solution was adapted from a well-voted answer elsewhere, I was not expecting to encounter problems.
Problem: I want to LEFT JOIN
file0.csv with file1.csv.
file0.csv
+-----------------+-------------+--------------+
| Manufacturer ID | image | description |
+-----------------+-------------+--------------+
| SKU231 | image1.jpg | A box. |
| SKUAG1 | image22.jpg | Another box. |
| SKU21D | image7a.png | A third box. |
+-----------------+-------------+--------------+
file1.csv:
+--------+--------+--------+-------+-------+
| mpn | length | height | width | title |
+--------+--------+--------+-------+-------+
| SKU231 | 22 | 14 | 10 | Box 1 |
| SKUAG1 | 12 | 6 | 6 | Box 2 |
| SKU21D | 5 | 8 | 5 | Box 3 |
+--------+--------+--------+-------+-------+
Desired result (file2.csv):
+-----------------+-------------+--------------+--------+--------+--------+-------+-------+
| Manufacturer ID | image | description | mpn | length | height | width | title |
+-----------------+-------------+--------------+--------+--------+--------+-------+-------+
| SKU231 | image1.jpg | A box. | SKU231 | 22 | 14 | 10 | Box 1 |
| SKUAG1 | image22.jpg | Another box. | SKUAG1 | 12 | 6 | 6 | Box 2 |
| SKU21D | image7a.png | A third box. | SKU21D | 5 | 8 | 5 | Box 3 |
+-----------------+-------------+--------------+--------+--------+--------+-------+-------+
PHP function to LEFT JOIN
file0.csv
with file1.csv
on Manufacturer ID
and mpn
respectively:
function my_csv_join(array $csv_input_file_array, $csv_output_file, $html_preview, $left_join_on, $right_join_on = NULL) {
if (count($csv_input_file_array) > 2) {
echo 'This function probably only works for 2 csv files being joined at a time. Reapply iteratively if needed. Exiting now.';
exit;
} else {
for ($x = 0; $x <= 1; $x++) {
//get csv file to array
${'file' . $x} = fopen($csv_input_file_array[$x], "r"); //Dynamic variables using braces: https://stackoverflow.com/a/9257536/9095603
while (!feof(${'file' . $x})) {
${'csv_array' . $x}[] = fgetcsv(${'file' . $x});
}
fclose(${'file' . $x});
${'csv_array' . $x} = array_filter(${'csv_array' . $x}); // gets rid of last empty array in case present
//CREATE HEADERED ARRAY
${'header' . $x} = array_shift(${'csv_array' . $x});
foreach (${'csv_array' . $x} as ${'product_data_array' . $x} {
${'headered_array' . $x}[] = array_combine(${'header' . $x}, ${'product_data_array' . $x});
}
}
// How to simulate the SQL LEFT JOIN operation using PHP arrays? https://stackoverflow.com/a/25837426
//function to simulate the left join
$left = $headered_array0;
$right = $headered_array1;
$final = array();
if (empty($right_join_on)) // if $right_join_on omitted implies $right_join_on has the same value as $left_join_on
$right_join_on = $left_join_on;
foreach ($left AS $k => $v) {
$final[$k] = $v; //basically keeping everything; $left just becomes $final
foreach ($right AS $kk => $vv) {
if ($v[$left_join_on] == $vv[$right_join_on]) {
foreach ($vv AS $key => $val)
$final[$k][$key] = $val;
} else {
foreach ($vv AS $key => $val)
$final[$k][$key] = NULL;
}
}
}
if ($html_preview == 'y') {
echo '<pre>';
var_dump($final);
echo '</pre>';
}
// REINSTATE HEADERS
// var_dump($final[0]);
$indented_header = array(
0 => array_keys($final[0])
);
$re_headered_array = array_merge($indented_header, $final);
// write array to csv file
$file2 = fopen($csv_output_file, "w");
foreach ($re_headered_array as $line) {
fputcsv($file2, $line);
}
fclose($file2);
}
}
Call to function my_csv_join()
:
my_csv_join(array('C:\xampp\htdocs\kalugi\file0.csv','C:\xampp\htdocs\kalugi\file1.csv'), 'C:\xampp\htdocs\kalugi\file2.csv','y','Manufacturer ID','mpn');
Actual result shows not all records matching on the file0."Manufacturer ID" = file1.mpn
are being matched. Thus some expected rows that satisfied the join condition were not joined. We have NULL
in place of their values:
var_dump
of result:
array(3) {
[0]=>
array(8) {
["Manufacturer ID"]=>
string(6) "SKU231"
["image"]=>
string(10) "image1.jpg"
["description"]=>
string(6) "A box."
["mpn"]=>
NULL
["length"]=>
NULL
["height"]=>
NULL
["width"]=>
NULL
["title"]=>
NULL
}
[1]=>
array(8) {
["Manufacturer ID"]=>
string(6) "SKUAG1"
["image"]=>
string(11) "image22.jpg"
["description"]=>
string(12) "Another box."
["mpn"]=>
NULL
["length"]=>
NULL
["height"]=>
NULL
["width"]=>
NULL
["title"]=>
NULL
}
[2]=>
array(8) {
["Manufacturer ID"]=>
string(6) "SKU21D"
["image"]=>
string(11) "image7a.png"
["description"]=>
string(12) "A third box."
["mpn"]=>
string(6) "SKU21D"
["length"]=>
string(1) "5"
["height"]=>
string(1) "8"
["width"]=>
string(1) "5"
["title"]=>
string(5) "Box 3"
}
}
Result as written out to $file2 (file2.csv):
+-----------------+-------------+--------------+--------+--------+--------+-------+-------+ | Manufacturer ID | image | description | mpn | length | height | width | title | +-----------------+-------------+--------------+--------+--------+--------+-------+-------+ | SKU231 | image1.jpg | A box. | | | | | | | SKUAG1 | image22.jpg | Another box. | | | | | | | SKU21D | image7a.png | A third box. | SKU21D | 5 | 8 | 5 | Box 3 | +-----------------+-------------+--------------+--------+--------+--------+-------+-------+
Why are 2 of the rows not joined despite file0."Manufacturer ID" = file1.mpn
being satisfied?