
Input CSV file (note: there is a non-printable LF character, shown here as \n, after fruit):

1,"apple", "水果fruit\n",300
2,"donut", "甜點dessert",200

My PHP program:

// print an array readably in the browser
function wpa($arr) { echo nl2br(print_r($arr, true)); }
header("Content-Type: text/html; charset=utf-8");
$lines = file("test.csv", FILE_IGNORE_NEW_LINES);
wpa($lines);

Output:

Array
(
    [0] => 1,"水果apple", "fruit
    [1] => \n",300
    [2] => 2,"甜點donut", "dessert",200
)
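file() splits at every LF byte, whether or not that byte sits inside a quoted field, which is why the first record is cut in two. Before picking a workaround, it can help to confirm what the file really contains. The sketch below counts CRLF pairs versus bare LF bytes in the raw contents; `count_line_endings` is a hypothetical helper name, not part of PHP. Counting bytes this way is safe for BIG5 text because the bytes 0x0A and 0x0D never occur inside a BIG5 character.

```php
<?php
// Hypothetical diagnostic: report how many line breaks are CRLF ("\r\n")
// and how many are bare LF ("\n") in the raw file contents.
function count_line_endings($data)
{
    $crlf = substr_count($data, "\r\n");
    $lf = substr_count($data, "\n"); // includes the LF of every CRLF pair
    return array('crlf' => $crlf, 'bare_lf' => $lf - $crlf);
}
```

For example, `print_r(count_line_endings(file_get_contents("test.csv")));` would show whether the records end in CRLF while the embedded break is a bare LF.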

My question:
How can I read in the CSV file and properly split it into 2 CSV lines without using fgetcsv? (Note: the input file contains BIG5-encoded Chinese characters, and fgetcsv mangles those characters in my PHP 5.2 environment.)

Scott Chu
  • What's your problem with using `fgetcsv()`, which does work with newlines inside quoted strings? – Mark Baker Oct 07 '15 at 16:45
  • fgetcsv will mess up my Chinese characters. – Scott Chu Oct 07 '15 at 16:46
  • fgetcsv() should be charset-independent – Mark Baker Oct 07 '15 at 16:47
  • But it's not with BIG5-encoded characters! Some people have rewritten fgetcsv or str_getcsv, but I'd like to find an elegant answer. – Scott Chu Oct 07 '15 at 16:48
  • Explain in detail. fgetcsv() should work, so show exactly how it is failing. What do you get if you do a var_dump() after using fgetcsv() on the file you've posted here? – Mark Baker Oct 07 '15 at 16:52
  • And most of the people who write their own versions of fgetcsv() or str_getcsv() don't do so because they have problems with those functions, but either because they're unaware of those functions, or don't understand csv files – Mark Baker Oct 07 '15 at 16:57
  • Say the data line is "1,李小姐,female": the 2nd field returned by fgetcsv becomes �小姐, i.e. 李 becomes �. All the pages I found about this problem are in Chinese, but here is a similar fgetcsv problem: http://stackoverflow.com/questions/1472886/some-characters-in-csv-file-are-not-read-during-php-fgetcsv – Scott Chu Oct 07 '15 at 17:00
  • So have you tried anything like `setlocale()`? – Mark Baker Oct 07 '15 at 17:01
  • Yes, I did. The site now shows me this reminder: "Please avoid extended discussions in comments. Would you like to automatically move this discussion to chat?" Should I stop commenting here? – Scott Chu Oct 07 '15 at 17:05
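To illustrate Mark Baker's point, here is a minimal sketch showing that fgetcsv() keeps an LF inside a quoted field as part of that field, so the two sample records come back as two rows. It uses ASCII stand-in data in a php://memory stream (an assumption made to keep it self-contained), so it does not reproduce the BIG5 problem. The PHP manual notes that fgetcsv() takes the locale into account and that one-byte or legacy encodings may be read wrongly under a mismatched LC_CTYPE, which is why the setlocale() question above matters. `read_rows_with_fgetcsv` is a hypothetical helper name.

```php
<?php
// Sketch: parse a CSV string with fgetcsv() via an in-memory stream.
// A line break inside a quoted field is kept as data, not treated as a record end.
function read_rows_with_fgetcsv($csv)
{
    $fp = fopen('php://memory', 'r+');
    fwrite($fp, $csv);
    rewind($fp);
    $rows = array();
    // Length 0 means no line-length limit.
    while (($row = fgetcsv($fp, 0, ',', '"')) !== false) {
        $rows[] = $row;
    }
    fclose($fp);
    return $rows;
}
```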

1 Answer


This is not a permanent solution, but it takes care of my problem:

Since the input file was edited on Windows, each real line break is CRLF (\r\n), while the break embedded in the quoted field is a bare LF. So I wrote this code segment:

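$data = file_get_contents("test.csv");
// Split on CRLF only. PHP_EOL equals "\r\n" only when PHP runs on Windows;
// on Linux it is "\n", which would also split at the embedded LF.
$lines = explode("\r\n", rtrim($data, "\r\n")); // rtrim: no empty last element
echo nl2br(print_r($lines, true));
// parse with regular expression for each element in lines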
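The answer leaves the per-line parsing as a comment. A minimal regex sketch of that step follows, assuming records shaped like the sample: fields separated by commas, optionally quoted with `"`, doubled quotes `""` as an escaped quote, an optional space before a quoted field, and line breaks allowed inside quotes. `parse_csv_record` is a hypothetical name. The pattern is matched byte-wise (no `/u` modifier); since BIG5 trail bytes fall in 0x40-0x7E and 0xA1-0xFE, the bytes `"` (0x22) and `,` (0x2C) never occur inside a BIG5 character, so the Chinese text passes through untouched. It does not reject malformed input.

```php
<?php
// Hypothetical helper: split one CSV record into fields with a regular expression.
// Only '"' and ',' are special, so multibyte BIG5 text is copied byte-for-byte.
function parse_csv_record($record)
{
    // Each match: start of string or a comma, optional spaces/tabs, then either
    // a quoted field ("" escapes a quote; may contain line breaks) or a run of
    // non-comma bytes.
    $pattern = '/(?:^|,)[ \t]*("(?:[^"]|"")*"|[^,]*)/';
    preg_match_all($pattern, $record, $matches);
    $fields = array();
    foreach ($matches[1] as $raw) {
        $len = strlen($raw);
        if ($len >= 2 && $raw[0] === '"' && $raw[$len - 1] === '"') {
            // Strip the enclosing quotes and turn "" back into ".
            $raw = str_replace('""', '"', substr($raw, 1, -1));
        }
        $fields[] = $raw;
    }
    return $fields;
}
```

Applied to each element of `$lines`, this would yield an array of field values per record.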

Splitting on CRLF this way outputs the correct 2 CSV lines. But if the input file is too large to read in at once, I don't know the answer, since file(), fgets(), etc. and the other text-reading functions all treat LF as a line break, even when the PHP program runs on Windows.
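One possible approach to the large-file case, sketched under assumptions: PHP's stream_get_line() takes an explicit ending string, so passing "\r\n" reads one CRLF-terminated record at a time and leaves a bare LF inside its record. This assumes every real record ends with CRLF and that no record is longer than `$maxLen` bytes (a longer record would be returned in pieces). `each_crlf_record` is a hypothetical helper name, and its behavior on PHP 5.2 specifically has not been verified here.

```php
<?php
// Hypothetical helper: stream a large file one CRLF-terminated record at a time
// and pass each record to $callback. stream_get_line() stops only at the exact
// ending "\r\n", so a bare LF inside a quoted field stays in its record.
// Assumptions: records end with "\r\n"; no record exceeds $maxLen bytes.
function each_crlf_record($path, $callback, $maxLen = 65536)
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        return false; // could not open the file
    }
    $count = 0;
    // stream_get_line() returns false once nothing is left to read.
    while (($record = stream_get_line($fp, $maxLen, "\r\n")) !== false) {
        call_user_func($callback, $record); // e.g. split the fields here
        $count++;
    }
    fclose($fp);
    return $count;
}
```

For example, `each_crlf_record("test.csv", "handle_record")` would call a user-defined `handle_record()` once per record, keeping only one record in memory at a time; a string callback name is used so the sketch also works on PHP versions without closures.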

Scott Chu