I am trying to parse a CSV file using PHP.
The file uses commas as the delimiter and double quotes around fields that contain commas, for example:
foo,"bar, baz",foo2
The problem is that fields containing commas get split in two. For the address field I get:
"2
rue du ..."
instead of: 2, rue du ...
Encoding:
The file doesn't seem to be UTF-8. It has strange characters at the beginning (apparently not a BOM; converted from ASCII to UTF-8 they look like this: ÿþ) and accented characters are not displayed.
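A minimal sketch for checking what those leading characters are: read the first raw bytes and compare them with the known byte order marks. The bytes FF FE are the UTF-16LE byte order mark, and those two bytes displayed as ISO-8859-1 are exactly ÿþ, so this may well be a BOM after all. `firstBytesHex()` and `detectBom()` are hypothetical helper names, and the demo writes its own sample file instead of using the real upload.

```php
<?php
// Hypothetical helper: return the first $n raw bytes of a file as hex.
function firstBytesHex(string $path, int $n = 4): string
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $bytes = fread($fh, $n);
    fclose($fh);
    return bin2hex((string) $bytes);
}

// Hypothetical helper: name the encoding implied by a byte order mark, if any.
// Only UTF-8 and UTF-16 BOMs are checked (note that UTF-32LE also starts with FF FE).
function detectBom(string $path): ?string
{
    $hex = firstBytesHex($path, 3);
    if ($hex === 'efbbbf') {
        return 'UTF-8';
    }
    if (substr($hex, 0, 4) === 'fffe') {
        return 'UTF-16LE';
    }
    if (substr($hex, 0, 4) === 'feff') {
        return 'UTF-16BE';
    }
    return null;
}

// Demo on a generated sample file (UTF-16LE with BOM).
$sample = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($sample, "\xFF\xFE" . mb_convert_encoding("foo,\"bar, baz\"\n", 'UTF-16LE', 'UTF-8'));
echo firstBytesHex($sample), PHP_EOL; // starts with "fffe"
echo detectBom($sample), PHP_EOL;
// The same two bytes, interpreted as ISO-8859-1, are the "ÿþ" seen in the file.
echo mb_convert_encoding("\xFF\xFE", 'UTF-8', 'ISO-8859-1'), PHP_EOL;
unlink($sample);
```

Reading in binary mode (`'rb'`) matters here: the point is to see the bytes exactly as stored, before any editor or conversion interprets them.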
- My code editor (Atom) reports the encoding as UTF-16 LE
- mb_detect_encoding() on the CSV lines returns ASCII
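A short sketch of why mb_detect_encoding() can answer ASCII for UTF-16 data: in UTF-16LE each Latin character is stored as its ASCII byte followed by a 00 byte, and since every byte is below 0x80 the string is also valid ASCII. A NUL byte inside text is a cheap hint that the data is UTF-16. `looksLikeUtf16()` is a hypothetical helper and only a heuristic, not a reliable detector.

```php
<?php
// Hypothetical heuristic: mostly-Latin UTF-16 text contains 0x00 bytes,
// while ordinary ASCII/UTF-8 CSV text normally does not.
function looksLikeUtf16(string $bytes): bool
{
    return strpos($bytes, "\0") !== false;
}

$utf8Line = 'foo,"bar, baz",foo2';
$utf16Line = mb_convert_encoding($utf8Line, 'UTF-16LE', 'UTF-8');

// "f", "o", "o" each followed by a 00 byte: 66006f006f00
echo bin2hex(substr($utf16Line, 0, 6)), PHP_EOL;
// Every byte is below 0x80, so the UTF-16LE bytes also pass an ASCII check.
var_dump(mb_check_encoding($utf16Line, 'ASCII'));
var_dump(looksLikeUtf16($utf8Line), looksLikeUtf16($utf16Line));
```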
But the conversion fails:
- mb_convert_encoding() converts from ASCII, but returns Asian characters when converting from UTF-16LE
- iconv() returns Notice: iconv(): Wrong charset, conversion from UTF-16LE (or ASCII) to UTF8 is not allowed
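A minimal sketch, assuming the file really is UTF-16LE as Atom reports: convert the whole content in one call instead of line by line. A likely cause of the Asian characters is that file() splits each line right after the 0A byte, but a UTF-16LE newline is the two bytes 0A 00, so every following line starts with a stray 00 byte and all byte pairs are shifted by one. For example "2" becomes 00 32 instead of 32 00, which decodes to U+3200, a CJK character. `readUtf16leAsUtf8()` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: read a whole UTF-16LE file and return UTF-8 text.
// Converting the entire content keeps the 2-byte code units aligned.
function readUtf16leAsUtf8(string $path): string
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // Drop the UTF-16LE byte order mark (FF FE) if present.
    if (substr($raw, 0, 2) === "\xFF\xFE") {
        $raw = substr($raw, 2);
    }
    return mb_convert_encoding($raw, 'UTF-8', 'UTF-16LE');
}

// Demo on a generated UTF-16LE sample containing an accented character.
$sample = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($sample, "\xFF\xFE" . mb_convert_encoding("foo,\"2, rue du Pré\",foo2\n", 'UTF-16LE', 'UTF-8'));
echo readUtf16leAsUtf8($sample);
unlink($sample);

// For comparison: a misaligned byte pair (00 32) decodes to U+3200, a CJK character.
echo mb_convert_encoding("\x00\x32", 'UTF-8', 'UTF-16LE'), PHP_EOL;
```

Once the text is UTF-8, the usual CSV functions can be applied to it.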
Parsing:
I tried to parse it with this one-liner using str_getcsv() (see those 2 comments):
$csv = array_map('str_getcsv', file($file['tmp_name']));
I then tried fgetcsv():
$arr = [];
$f = fopen($file['tmp_name'], 'r');
while (($l = fgetcsv($f)) !== false) {
    $arr[] = $l;
}
fclose($f);
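A sketch of the same fgetcsv() loop adapted for a UTF-16LE file, assuming the iconv extension is available: PHP's convert.iconv.* stream filter decodes the bytes to UTF-8 as they are read, so fgetcsv() sees plain ASCII quotes and commas. `readCsvUtf16le()` and the sample file are hypothetical; the extra fgetcsv() arguments are just the default delimiter, enclosure and escape spelled out.

```php
<?php
// Hypothetical helper: parse a UTF-16LE CSV file into rows of UTF-8 strings.
function readCsvUtf16le(string $path): array
{
    $f = fopen($path, 'rb');
    if ($f === false) {
        throw new RuntimeException("Cannot open $path");
    }
    // Skip the UTF-16LE BOM (FF FE) if present; otherwise start from the beginning.
    if (fread($f, 2) !== "\xFF\xFE") {
        rewind($f);
    }
    // Decode UTF-16LE to UTF-8 on the fly for everything read from now on.
    stream_filter_append($f, 'convert.iconv.UTF-16LE/UTF-8', STREAM_FILTER_READ);

    $rows = [];
    while (($row = fgetcsv($f, 0, ',', '"', '\\')) !== false) {
        $rows[] = $row;
    }
    fclose($f);
    return $rows;
}

// Demo on a generated UTF-16LE sample.
$sample = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($sample, "\xFF\xFE" . mb_convert_encoding("foo,\"2, rue du ...\",foo2\n", 'UTF-16LE', 'UTF-8'));
print_r(readCsvUtf16le($sample));
unlink($sample);
```

Filtering the stream avoids loading the whole file into memory, which may matter for large uploads; converting the whole string first is simpler to debug.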
Either way, my address field comes out in 2 parts. But when I try this code sample, the fields are parsed correctly:
$str = 'foo,"bar, baz",foo2,azerty,"ban, bal",doe';
$data = str_getcsv($str);
echo '<pre>' . print_r($data, true) . '</pre>';
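A sketch suggesting why the sample works but the file does not: the literal in the sample is plain ASCII, while the file's bytes are (per Atom) UTF-16LE. In UTF-16LE every comma is followed by a 00 byte, so each field after the first starts with 00 rather than with the quote character; str_getcsv() then does not treat it as an enclosed field and splits it at the inner comma. Converting to UTF-8 first restores the expected result.

```php
<?php
$str = 'foo,"bar, baz",foo2,azerty,"ban, bal",doe';
$utf16 = mb_convert_encoding($str, 'UTF-16LE', 'UTF-8');

// Delimiter, enclosure and escape are the defaults, passed explicitly.
$fromUtf8 = str_getcsv($str, ',', '"', '\\');
$fromUtf16 = str_getcsv($utf16, ',', '"', '\\');
$converted = str_getcsv(mb_convert_encoding($utf16, 'UTF-8', 'UTF-16LE'), ',', '"', '\\');

echo count($fromUtf8), ' fields from the UTF-8 string', PHP_EOL; // 6
echo count($fromUtf16), ' fields from the raw UTF-16LE bytes', PHP_EOL;
echo count($converted), ' fields after converting back to UTF-8', PHP_EOL; // 6
```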
To sum up, my questions:
- What are the characters at the beginning of the file?
- How can I be sure about the encoding? (Atom reads the file as UTF-16 LE and doesn't display the strange characters at the beginning.)
- What makes the CSV parsing functions fail?
- If I should use something else to parse the lines of the CSV, what could I use?