Using the code below, I can create an associative array from a CSV file. It works fine; the only problem is that the first key of each row keeps its double quotes ("), and I don't understand why.

/* Map Rows and Loop Through Them */
$rows   = array_map(function($row) { return str_getcsv($row, ';', '"'); }, file('test.csv'));
$header = array_shift($rows);
$csv    = array();
foreach($rows as $row) {
    $csv[] = array_combine($header, $row);   
}

echo "<pre>";
print_r($csv);
echo "</pre>";

When I print the result, I get this:

Array
(
    [0] => Array
        (
            ["TEST"] => 
            [EMAIL] => mail@test.com
            [NAVN] => Donald Duck
            [ADDRESSE] => Paradisæblevej 111
            [POSTNRBY] => 1234  Andeby
            [TELEFON] => 12345678
            [TUR] => 49
            [TURNAVN] => Title
            [ANTAL] => 1
            [BELOEBIALT] => 695
            [BONKODE] => 99900714
        )

    [1] => Array
        (
            ["TEST"] => 
            [EMAIL] => mail@test.com
            [NAVN] => Donald Duck
            [ADDRESSE] => Paradisæblevej 111
            [POSTNRBY] => 1234  Andeby
            [TELEFON] => 12345678
            [TUR] => 49
            [TURNAVN] => Title
            [ANTAL] => 1
            [BELOEBIALT] => 695
            [BONKODE] => 99900714
        )

)

My header looks like this:

"TEST";"EMAIL";"NAVN";"ADDRESSE";"POSTNRBY";"TELEFON";"TUR";"TURNAVN";"ANTAL";"BELOEBIALT";"BONKODE"

CSV file looks like this:

"TEST";"EMAIL";"NAVN";"ADDRESSE";"POSTNRBY";"TELEFON";"TUR";"TURNAVN";"ANTAL";"BELOEBIALT";"BONKODE"
"";"mail@test.com";"Donald Duck";"Paradisæblevej 111";"1234  Andeby";"12345678";"49";"Title";"1";"695";"99900714"
"";"mail@test.com";"Donald Duck";"Paradisæblevej 111";"1234  Andeby";"12345678";"49";"Title";"1";"695";"99900714"

Note the quotes around "TEST" in the output. How can I fix this?

Thanks

Dyvel
  • Any funky invisible characters in the `"TEST"` header…? Check with a hex editor. – deceze Dec 03 '18 at 12:26
  • Can you give some data from the CSV? (Maybe there are quotes inside the header itself.) – danielpopa Dec 03 '18 at 12:32
  • If I remove the TEST header, then double quotes appear in EMAIL instead. There are no invisible characters in the file besides line return at the end of each line. – Dyvel Dec 03 '18 at 12:39
  • Here's a likely guess: Your file starts with a Unicode BOM (byte order mark), which is invisible but convinces `str_getcsv` that this cell is _not_ surrounded by quotes. – alexis Dec 03 '18 at 12:43
  • *"There are no invisible characters in the file"* – You have confirmed this how? – deceze Dec 03 '18 at 12:51
  • Possible duplicate of [Remove BOM (ï»¿) from imported .csv file](https://stackoverflow.com/questions/32184933/remove-bom-%c3%af-from-imported-csv-file) – alexis Dec 03 '18 at 14:40
  • You couldn't have known, but we now know that your question is a duplicate. There's code there for how to check for and skip the BOM, see if you can apply it to your code. – alexis Dec 03 '18 at 14:42

1 Answer

Here's a guess: Your file starts with a Unicode BOM (byte order mark), which is invisible but convinces `str_getcsv` that this cell is _not_ surrounded by quotes.

To test it: Open your CSV file in an editor and save it as Latin-1, or any other 8-bit encoding. The problem should disappear (but you might mangle any non-ASCII content).
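If re-saving the file is not convenient, the same check can be done from PHP. The following is only a minimal sketch: `has_utf8_bom` is a hypothetical helper name, not a built-in, and the sample strings stand in for the first bytes of the real file. It tests for the three bytes EF BB BF that make up the UTF-8 BOM.

```php
<?php
// Hypothetical helper (not part of PHP): true if $data begins with the
// UTF-8 byte order mark, i.e. the bytes EF BB BF.
function has_utf8_bom(string $data): bool
{
    return strncmp($data, "\xEF\xBB\xBF", 3) === 0;
}

// Demo strings so the sketch runs without a file on disk.
$withBom    = "\xEF\xBB\xBF" . '"TEST";"EMAIL"';
$withoutBom = '"TEST";"EMAIL"';

var_dump(has_utf8_bom($withBom));
var_dump(has_utf8_bom($withoutBom));

// Hex dump of the first three bytes, similar to what a hex editor shows.
echo bin2hex(substr($withBom, 0, 3)), "\n";
```

Against the real file, something like `has_utf8_bom(file_get_contents('test.csv'))` should give the same answer, assuming the file is small enough to read in one go.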

To fix it: Ideally, you would specify the correct encoding (UTF-8?) when you open the file. PHP doesn't seem to define a UTF-8+BOM encoding, though. So you can try to apply the (ugly) solution from the answer to this question, which is about the same problem. (Or you can switch to Python, whose `utf-8-sig` encoding skips the BOM for you :-P )
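One way to apply that idea to the code from the question is to strip the BOM from the first line before it reaches `str_getcsv`. This is a sketch under assumptions, not necessarily the approach in the linked answer: `strip_utf8_bom` is a hypothetical helper, and the `$lines` array stands in for `file('test.csv')` so the example runs on its own.

```php
<?php
// Hypothetical helper (not part of PHP): removes a leading UTF-8 BOM
// (bytes EF BB BF) if present, otherwise returns the line unchanged.
function strip_utf8_bom(string $line): string
{
    if (strncmp($line, "\xEF\xBB\xBF", 3) === 0) {
        return substr($line, 3);
    }
    return $line;
}

// Stand-in for file('test.csv'); the first line carries a BOM.
$lines = [
    "\xEF\xBB\xBF" . '"TEST";"EMAIL";"NAVN"',
    '"";"mail@test.com";"Donald Duck"',
];

// Only the very first line of a file can start with the BOM.
$lines[0] = strip_utf8_bom($lines[0]);

// Same parsing as in the question; the escape character is the default,
// spelled out explicitly.
$rows = array_map(function ($row) {
    return str_getcsv($row, ';', '"', '\\');
}, $lines);

$header = array_shift($rows);
$csv    = [];
foreach ($rows as $row) {
    $csv[] = array_combine($header, $row);
}

print_r(array_keys($csv[0]));
```

With the BOM gone, the first cell starts with the quote character again, so the first key should come out as `TEST` rather than `"TEST"`.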

alexis
  • I actually just tried to create a new CSV file directly on the server and copy the content over. Then the issue disappeared. So this could mean a file encoding issue, or the Unicode BOM you mentioned. – Dyvel Dec 03 '18 at 13:02
  • This is a BOM issue. If I do a `json_encode` on my `$header` I see this output: `["\ufeff\"EMAIL\"","NAVN","ADDRESSE","POSTNRBY","TELEFON","TUR","TURNAVN","ANTAL","BEL\u00d8BIALT","BONKODE"]` – Dyvel Dec 03 '18 at 13:33
  • Thought so, I've been bitten by this kind of thing before. And the BOM is not even supposed to be there in UTF-8 files, Microsoft just likes to add it everywhere. – alexis Dec 03 '18 at 14:35