1

I have csv01 like that:

A    |  B   |   C     |  D 
-----------------------------
905  |  bla |   meh?  |  na
16   |  meh |   meh?  |  ni
4    |  bla |   meh?  |  put

I have csv02 like that:

A    |  Z   |   Y     
---------------------
4    |  info |  meh   
16   |  info |  meh   
905  |  info |  meh  

How could I get the info based on column A numbers , which are in both files ?

These numbers are in both files , not in the same order. Sometimes the data is blank so it could out put blank or "null" result for that case.

A    |  B   |   C     |  D     |  Z   
-------------------------------------
905  |  bla |   meh?  |  na   |  info
16   |  meh |   meh?  |  ni   |  info
4    |  bla |   meh?  |  put  |  info

What I'm trying to do is to merge both CSV's ( values of each column when the value is the same ) based on matching number values:905 ,16 ,4...

I tried but not getting any result, it only output some incorrect data on one line:

<?php
   $fh = fopen('csv1.csv', 'r');
        $fhg = fopen('csv2.csv', 'r');
         while (($data = fgetcsv($fh, 0, ",")) !== FALSE) {
            $csv1[3]=$data;
        }
        while (($data = fgetcsv($fhg, 0, ",")) !== FALSE) {
                $csv2[0]=$data;
        }

        for($x=0;$x< count($csv2);$x++)
        {
            if($x==0){
                unset($csv1[0][3]);
                $line[$x]=array_merge($csv2[0],$csv1[3]); //header
            }
            else{
                $deadlook=0;
                for($y=0;$y <= count($csv1);$y++)
                {
                    if($csv1[$y][3] == $csv2[$x][0]){
                        unset($csv1[$y][3]);
                        $line[$x]=array_merge($csv2[$x],$csv1[$y]);
                        $deadlook=1;
                    }
                }
                if($deadlook==0)
                    $line[$x]=$csv2[$x];
            }
        }

        $fp = fopen('final.csv', 'w');//output file set here

        foreach ($line as $fields) {
            fputcsv($fp, $fields);
        }
        fclose($fp);
?>

Have been trying to use fgetcsv and used keys. My code is not working at all I give up . Any ideas on how I could achieve this ?

mlclm
  • 725
  • 6
  • 16
  • 38
  • `$csv1[3]` and `$csv2[0]` are only ever going to give you one value. You need to keep one file open and during iteration of that bring in the other file and search for your value. This will be heavy on resources. Likely this should be put into a DB, or stored in a DB as a starting point. With DB this would just be `join table using(A)` – user3783243 Aug 25 '20 at 03:24
  • Im gonna try with python and panda , as I just happen to have these installed. This should help : https://stackoverflow.com/questions/54383305/merge-two-csv-files-based-on-a-data-from-the-first-column – mlclm Aug 25 '20 at 03:54
  • Related: [Implement left join with PHP between two csv files](https://stackoverflow.com/q/59921093/2943403) – mickmackusa Oct 05 '22 at 21:40

1 Answers1

3

This can be done relatively easy in PHP, no need for Python's Pandas.

a.csv:

905,bla,meh?,na3
16,meh,meh?,ni2
4,bla,meh?,put1

b.csv:

4,info,meh1
16,info,meh2
905,info,meh3
<?php

// load both files to array
$a = array_map('str_getcsv', file('a.csv'));
$b = array_map('str_getcsv', file('b.csv'));

var_dump($a, $b);

// index both arrays by the column you want to merge by
$mergeColumn = 0;
$a = array_combine(array_column($a, $mergeColumn), $a);
$b = array_combine(array_column($b, $mergeColumn), $b);

var_dump($a, $b);

// construct combined array by combining same indexes from both arrays
// and merging the values (skip merge column from second file so it is not doubled)
$c = [];
foreach ($a as $k => $dataA) {
  unset($b[$k][$mergeColumn]);
  $c[$k] = array_merge($dataA, $b[$k]);
}

var_dump($c);

// put it to output csv file
$fp = fopen('c.csv', 'w');
foreach ($c as $row) {
    fputcsv($fp, $row);
}
fclose($fp);

Output:

// a.csv as array
array(3) {
  [0]=>
  array(4) {
    [0]=>
    string(3) "905"
    [1]=>
    string(3) "bla"
    [2]=>
    string(4) "meh?"
    [3]=>
    string(3) "na3"
  }
  [1]=>
  array(4) {
    [0]=>
    string(2) "16"
    [1]=>
    string(3) "meh"
    [2]=>
    string(4) "meh?"
    [3]=>
    string(3) "ni2"
  }
  [2]=>
  array(4) {
    [0]=>
    string(1) "4"
    [1]=>
    string(3) "bla"
    [2]=>
    string(4) "meh?"
    [3]=>
    string(4) "put1"
  }
}

// b.csv as array
array(3) {
  [0]=>
  array(3) {
    [0]=>
    string(1) "4"
    [1]=>
    string(4) "info"
    [2]=>
    string(4) "meh1"
  }
  [1]=>
  array(3) {
    [0]=>
    string(2) "16"
    [1]=>
    string(4) "info"
    [2]=>
    string(4) "meh2"
  }
  [2]=>
  array(3) {
    [0]=>
    string(3) "905"
    [1]=>
    string(4) "info"
    [2]=>
    string(4) "meh3"
  }
}

// a.csv keyed by merge column
array(3) {
  [905]=>
  array(4) {
    [0]=>
    string(3) "905"
    [1]=>
    string(3) "bla"
    [2]=>
    string(4) "meh?"
    [3]=>
    string(3) "na3"
  }
  [16]=>
  array(4) {
    [0]=>
    string(2) "16"
    [1]=>
    string(3) "meh"
    [2]=>
    string(4) "meh?"
    [3]=>
    string(3) "ni2"
  }
  [4]=>
  array(4) {
    [0]=>
    string(1) "4"
    [1]=>
    string(3) "bla"
    [2]=>
    string(4) "meh?"
    [3]=>
    string(4) "put1"
  }
}

// b.csv keyed by merge column
array(3) {
  [4]=>
  array(3) {
    [0]=>
    string(1) "4"
    [1]=>
    string(4) "info"
    [2]=>
    string(4) "meh1"
  }
  [16]=>
  array(3) {
    [0]=>
    string(2) "16"
    [1]=>
    string(4) "info"
    [2]=>
    string(4) "meh2"
  }
  [905]=>
  array(3) {
    [0]=>
    string(3) "905"
    [1]=>
    string(4) "info"
    [2]=>
    string(4) "meh3"
  }
}

// combined array
array(3) {
  [905]=>
  array(6) {
    [0]=>
    string(3) "905"
    [1]=>
    string(3) "bla"
    [2]=>
    string(4) "meh?"
    [3]=>
    string(3) "na3"
    [4]=>
    string(4) "info"
    [5]=>
    string(4) "meh3"
  }
  [16]=>
  array(6) {
    [0]=>
    string(2) "16"
    [1]=>
    string(3) "meh"
    [2]=>
    string(4) "meh?"
    [3]=>
    string(3) "ni2"
    [4]=>
    string(4) "info"
    [5]=>
    string(4) "meh2"
  }
  [4]=>
  array(6) {
    [0]=>
    string(1) "4"
    [1]=>
    string(3) "bla"
    [2]=>
    string(4) "meh?"
    [3]=>
    string(4) "put1"
    [4]=>
    string(4) "info"
    [5]=>
    string(4) "meh1"
  }
}

and c.csv (outcome):

905,bla,meh?,na3,info,meh3
16,meh,meh?,ni2,info,meh2
4,bla,meh?,put1,info,meh1

Above solution works if both files have the same identifiers that merging is performed on (first column) If there can be a row that doesnt have a match in other csv file then another approach is needed.

a.csv:

A,B,C,D
905,bla,meh?,na3
16,meh,meh?,ni2
4,bla,meh?,put1
1,a,b,c

b.csv:

A,Z,Y
4,info,meh1
16,info,meh2
905,info,meh3
2,d,e

a.csv contains a row with A: 1, there is no match for that in b.csv. b.csv contains a row with A: 2, there is no match for that in a.csv. We expect result csv to have 5 rows (4, 16, 905 - common + 1 with empty Z,Y + 2 with empty B,C,D).

code:

<?php

$a = parseCsv('a.csv');
$b = parseCsv('b.csv');

$allHeaders = array_unique(array_merge($a['header'], $b['header']));

$mergeColumn = 'A';
$a['rows'] = array_combine(array_column($a['rows'], $mergeColumn), $a['rows']);
$b['rows'] = array_combine(array_column($b['rows'], $mergeColumn), $b['rows']);

$allIndexes = array_unique(array_merge(array_column($a['rows'], $mergeColumn), array_column($b['rows'], $mergeColumn)));

$c = [];
foreach ($allIndexes as $index) {
  $row = [];
  foreach ($allHeaders as $header) {
    $row[$header] = '';
    if (isset($a['rows'][$index][$header])) {
      $row[$header] = $a['rows'][$index][$header];
    } elseif (isset($b['rows'][$index][$header])) {
      $row[$header] = $b['rows'][$index][$header];
    }
  }

  $c[$index] = $row;
}

var_dump($c);

$fp = fopen('c.csv', 'w');
fputcsv($fp, $allHeaders);
foreach ($c as $row) {
    fputcsv($fp, $row);
}
fclose($fp);


function parseCsv(string $file): array {
  $rows = array_map('str_getcsv', file($file));
  $header = array_shift($rows);
  $csv = [];
  foreach($rows as $row) {
    $csv[] = array_combine($header, $row);
  }

  return ['header' => $header, 'rows' => $csv];
}

Output:

array(5) {
  [905]=>
  array(6) {
    ["A"]=>
    string(3) "905"
    ["B"]=>
    string(3) "bla"
    ["C"]=>
    string(4) "meh?"
    ["D"]=>
    string(3) "na3"
    ["Z"]=>
    string(4) "info"
    ["Y"]=>
    string(4) "meh3"
  }
  [16]=>
  array(6) {
    ["A"]=>
    string(2) "16"
    ["B"]=>
    string(3) "meh"
    ["C"]=>
    string(4) "meh?"
    ["D"]=>
    string(3) "ni2"
    ["Z"]=>
    string(4) "info"
    ["Y"]=>
    string(4) "meh2"
  }
  [4]=>
  array(6) {
    ["A"]=>
    string(1) "4"
    ["B"]=>
    string(3) "bla"
    ["C"]=>
    string(4) "meh?"
    ["D"]=>
    string(4) "put1"
    ["Z"]=>
    string(4) "info"
    ["Y"]=>
    string(4) "meh1"
  }
  [1]=>
  array(6) {
    ["A"]=>
    string(1) "1"
    ["B"]=>
    string(1) "a"
    ["C"]=>
    string(1) "b"
    ["D"]=>
    string(1) "c"
    ["Z"]=>
    string(0) ""
    ["Y"]=>
    string(0) ""
  }
  [2]=>
  array(6) {
    ["A"]=>
    string(1) "2"
    ["B"]=>
    string(0) ""
    ["C"]=>
    string(0) ""
    ["D"]=>
    string(0) ""
    ["Z"]=>
    string(1) "d"
    ["Y"]=>
    string(1) "e"
  }
}

c.csv:

905,bla,meh?,na3,info,meh3
16,meh,meh?,ni2,info,meh2
4,bla,meh?,put1,info,meh1
1,a,b,c,,
2,,,,d,e
blahy
  • 1,294
  • 1
  • 8
  • 9
  • Something is wrong and I get 'Undefined index: ... ' because of line 18 : foreach ($a as $k => $dataA) { unset($b[$k][$mergeColumn]); $c[$k] = array_merge($dataA, $b[$k]); } – mlclm Aug 25 '20 at 19:05
  • you posted 3 lines, so which one is it? This is standalone script with input files above and it doesnt issue any warnings - so its either your csvs or some changes you made - so I cant really answer that without the code – blahy Aug 25 '20 at 19:20
  • This line causes an error "Notice: Undefined index:..." $c[$k] = array_merge($dataA, $b[$k]); . The CSV files don't have the exact same number of rows. But have same number of columns, and the 1st column in each csv is the "key" number. ( is not always in the other csv file ). Here is a copy/paste of my CSV from notepad https://pastebin.com/K82BnViQ ( I also I tried to replace $mergeColumn = 0; with $mergeColumn = 1; and then it says: Warning: array_combine(): Both parameters should have an equal number of elements. ) , I have put back $mergeColumn = 0; but still got the index problem. – mlclm Aug 25 '20 at 19:55
  • You wrote "These numbers are in both files" so i assumed both files contains the same columns that we match them by (but the order can be different). I made and edit with second solution where there can be rows that have no match in second file. – blahy Aug 25 '20 at 21:48
  • `array_combine(array_column($a, $mergeColumn), $a)` indicates that you are unaware that `array_column()` can receive up to 3 parameters. – mickmackusa Oct 05 '22 at 07:22
  • I wanted to set array indexes with one of the columns (retaining all the columns in the array itself). How do you want to do this with only array_column()? – blahy Oct 29 '22 at 22:52