3

As this solution was adapted from a well-voted answer elsewhere, I was not expecting to encounter problems.

Problem: I want to LEFT JOIN file0.csv with file1.csv.

file0.csv

+-----------------+-------------+--------------+
| Manufacturer ID |    image    | description  |
+-----------------+-------------+--------------+
| SKU231          | image1.jpg  | A box.       |
| SKUAG1          | image22.jpg | Another box. |
| SKU21D          | image7a.png | A third box. |
+-----------------+-------------+--------------+

file1.csv:

+--------+--------+--------+-------+-------+
|  mpn   | length | height | width | title |
+--------+--------+--------+-------+-------+
| SKU231 |     22 |     14 |    10 | Box 1 |
| SKUAG1 |     12 |      6 |     6 | Box 2 |
| SKU21D |      5 |      8 |     5 | Box 3 |
+--------+--------+--------+-------+-------+

Desired result (file2.csv):

+-----------------+-------------+--------------+--------+--------+--------+-------+-------+
| Manufacturer ID |    image    | description  |  mpn   | length | height | width | title |
+-----------------+-------------+--------------+--------+--------+--------+-------+-------+
| SKU231          | image1.jpg  | A box.       | SKU231 |     22 |     14 |    10 | Box 1 |
| SKUAG1          | image22.jpg | Another box. | SKUAG1 |     12 |      6 |     6 | Box 2 |
| SKU21D          | image7a.png | A third box. | SKU21D |      5 |      8 |     5 | Box 3 |
+-----------------+-------------+--------------+--------+--------+--------+-------+-------+

PHP function to LEFT JOIN file0.csv with file1.csv on Manufacturer ID and mpn respectively:

function my_csv_join(array $csv_input_file_array, $csv_output_file, $html_preview, $left_join_on, $right_join_on = NULL) {
    if (count($csv_input_file_array) > 2) {
        echo 'This function probably only works for 2 csv files being joined at a time.  Reapply iteratively if needed.  Exiting now.';
        exit;
    } else {
        for ($x = 0; $x <= 1; $x++) {
            //get csv file to array 
            ${'file' . $x} = fopen($csv_input_file_array[$x], "r"); //Dynamic variables using braces: https://stackoverflow.com/a/9257536/9095603
    
            while (!feof(${'file' . $x})) {
                ${'csv_array' . $x}[] = fgetcsv(${'file' . $x});
            }
            fclose(${'file' . $x});
    
            ${'csv_array' . $x} = array_filter(${'csv_array' . $x}); // gets rid of last empty array in case present
            //CREATE HEADERED ARRAY
            ${'header' . $x} = array_shift(${'csv_array' . $x});
            foreach (${'csv_array' . $x} as ${'product_data_array' . $x} {
                ${'headered_array' . $x}[] = array_combine(${'header' . $x}, ${'product_data_array' . $x});
            }        
        }
        // How to simulate the SQL LEFT JOIN operation using PHP arrays?  https://stackoverflow.com/a/25837426

        //function to simulate the left join
        $left = $headered_array0;
        $right = $headered_array1;
        $final = array();

        if (empty($right_join_on)) // if $right_join_on omitted implies $right_join_on has the same value as $left_join_on
            $right_join_on = $left_join_on;
        foreach ($left AS $k => $v) {
            $final[$k] = $v; //basically keeping everything; $left just becomes $final
            foreach ($right AS $kk => $vv) {
                if ($v[$left_join_on] == $vv[$right_join_on]) { 
                    foreach ($vv AS $key => $val)
                        $final[$k][$key] = $val; 
                } else {
                    foreach ($vv AS $key => $val)
                   $final[$k][$key] = NULL; 
                }
            }
        }
        if ($html_preview == 'y') {
            echo '<pre>';
                var_dump($final);
            echo '</pre>';
        }
    
        // REINSTATE HEADERS
        // var_dump($final[0]);
        $indented_header = array(
            0 => array_keys($final[0])
        );
        $re_headered_array = array_merge($indented_header, $final);
    
        // write array to csv file
        $file2 = fopen($csv_output_file, "w");
        foreach ($re_headered_array as $line) {
            fputcsv($file2, $line);
        }
    
        fclose($file2);
    }
}

Call to function my_csv_join():

my_csv_join(array('C:\xampp\htdocs\kalugi\file0.csv','C:\xampp\htdocs\kalugi\file1.csv'), 'C:\xampp\htdocs\kalugi\file2.csv','y','Manufacturer ID','mpn');

Actual result shows not all records matching on the file0."Manufacturer ID" = file1.mpn are being matched. Thus some expected rows that satisfied the join condition were not joined. We have NULL in place of their values:

var_dump of result:

array(3) {
  [0]=>
  array(8) {
    ["Manufacturer ID"]=>
    string(6) "SKU231"
    ["image"]=>
    string(10) "image1.jpg"
    ["description"]=>
    string(6) "A box."
    ["mpn"]=>
    NULL
    ["length"]=>
    NULL
    ["height"]=>
    NULL
    ["width"]=>
    NULL
    ["title"]=>
    NULL
  }
  [1]=>
  array(8) {
    ["Manufacturer ID"]=>
    string(6) "SKUAG1"
    ["image"]=>
    string(11) "image22.jpg"
    ["description"]=>
    string(12) "Another box."
    ["mpn"]=>
    NULL
    ["length"]=>
    NULL
    ["height"]=>
    NULL
    ["width"]=>
    NULL
    ["title"]=>
    NULL
  }
  [2]=>
  array(8) {
    ["Manufacturer ID"]=>
    string(6) "SKU21D"
    ["image"]=>
    string(11) "image7a.png"
    ["description"]=>
    string(12) "A third box."
    ["mpn"]=>
    string(6) "SKU21D"
    ["length"]=>
    string(1) "5"
    ["height"]=>
    string(1) "8"
    ["width"]=>
    string(1) "5"
    ["title"]=>
    string(5) "Box 3"
  }
}

Result as written out to $file2 (file2.csv):

    +-----------------+-------------+--------------+--------+--------+--------+-------+-------+
    | Manufacturer ID |    image    | description  |  mpn   | length | height | width | title |
    +-----------------+-------------+--------------+--------+--------+--------+-------+-------+
    | SKU231          | image1.jpg  | A box.       |        |        |        |       |       |
    | SKUAG1          | image22.jpg | Another box. |        |        |        |       |       |
    | SKU21D          | image7a.png | A third box. | SKU21D |      5 |      8 |     5 | Box 3 |
    +-----------------+-------------+--------------+--------+--------+--------+-------+-------+

Why are 2 of the rows not joined despite file0."Manufacturer ID" = file1.mpn being satisfied?

mickmackusa
  • 43,625
  • 12
  • 83
  • 136
user136649
  • 59
  • 1
  • 7

2 Answers2

0

I'm going to simplify this down to an SSCCE so it's easy to understand and reproduce.

I think you're over-complicating the solution here which is probably what's causing you so many weird bugs. The solution is actually rather simple. All you need to do is take each csv file, parse it into rows and columns, and then merge each of those rows into one row to write it back out to another csv.

This is the simplest way to do it:

$file0 = <<<'FILE0'
Manufacturer ID,image,description
SKU231,image1.jpg,A box.
SKUAG1,image22.jpg,Another box.
SKU21D,image7a.png,A third box.
FILE0;

$file1 = <<<'FILE1'
mpn,length,height,width,title
SKU231,22,14,10,Box 1
SKUAG1,12,6,6,Box 2
SKU21D,5,8,5,Box 3
FILE1;

$csv1 = array_map('str_getcsv', explode("\n", $file0));
$csv2 = array_map('str_getcsv', explode("\n", $file1));

$fd = fopen("/tmp/test.csv", 'w');


foreach ($csv1 as $row => $columns) {

    $newRow = array_merge($columns, $csv2[$row]);
    fputcsv($fd, $newRow);

}

$fd = fopen("/tmp/test.csv", "r");

while (($line = fgets($fd)) !== false) {
    echo $line;
}

The result:

"Manufacturer ID",image,description,mpn,length,height,width,title
SKU231,image1.jpg,"A box.",SKU231,22,14,10,"Box 1"
SKUAG1,image22.jpg,"Another box.",SKUAG1,12,6,6,"Box 2"
SKU21D,image7a.png,"A third box.",SKU21D,5,8,5,"Box 3"

|Manufacturer ID | image       | description  | mpn    | length | height | width | title |
|----------------|-------------|--------------|--------|--------|--------|-------|-------|
| SKU231         | image1.jpg  | A box.       | SKU231 | 22     | 14     | 10    | Box 1 |
| SKUAG1         | image22.jpg | Another box. | SKUAG1 | 12     | 6      | 6     | Box 2 |
| SKU21D         | image7a.png | A third box. | SKU21D | 5      | 8      | 5     | Box 3 |
Sherif
  • 11,786
  • 3
  • 32
  • 57
  • I like the simplicity of your solution and will try to re-write my code to incorporate it, but do you have any idea what was wrong in the original code? I spent 2 days trying to debug it and I'm itching to know what was wrong... – user136649 Jan 26 '20 at 18:53
  • 1
    A lot of things. That code is riddled with bugs. Which is why I found it counter intuitive to even attempt to fix it. – Sherif Jan 26 '20 at 18:54
  • Thanks. If there's any really key mistakes you saw in my original code, I'm happy to have them pointed out, as I'm learning PHP right now. – user136649 Jan 26 '20 at 19:04
  • I didn't see any *one* key mistake. I just saw a whole bunch of mistakes that would be rather unhelpful to the Stack Overflow community at large, to correct here. I find that what's more helpful is to address the solution to the given problem rather than spend this time as debugging exercise or code review session. I'm sorry if that isn't helpful to you, but I'm just trying to adhere to what's best for the community at large. – Sherif Jan 26 '20 at 19:06
  • 1
    To be clear this answer does NOT merge data related by `Manufacturer ID` / `mpn`. It relates the lines from the two csv files purely on their line position. This answer will NOT be reliable unless the lines are 100% guaranteed to be order with match manufacturers on every line. – mickmackusa Oct 01 '22 at 06:48
  • `what's best for the community at large.` enumerating the bugs alluded to in comments would be helpful - more helpful than a solution in terms of the learning value of the answer. – AD7six Oct 01 '22 at 14:05
0

I have not tested this script on an actual server, but I expect it to handle any number of csv files where a single unique identifying column can relate separate datasets.

Effectively, my snippet below will open and parse csv data, then assign first level keys based on the passed in column index which relates the two datasets. If a dataset has an identifying value which has been encountered before, then the rows of data can be merged.

A gentle reminder: when you think variable variables are a good idea, it is probably time for a code review and refactor.

$csvs = [
    ['file1.csv', 0],
    ['file2.csv', 0]
];

$result = [];
foreach ($csvs as [$filename, $idIndex]) {
    $file = fopen($filename, 'r');
    if ($file) {
        $headers = fgetcsv($file);
        while (($line = fgetcsv($file)) !== false) {
            $result[$line[$idIndex]] = array_merge(
                $result[$line[$idIndex]] ?? [],
                array_combine($headers, $line)
            );
        }
        fclose($file);
    }
}
var_export($result);
mickmackusa
  • 43,625
  • 12
  • 83
  • 136