
I have a PHP script that reads a CSV file encoded as UTF-16LE. On some lines, the array that PHP builds from the CSV row is truncated because of certain Greek characters. An example is below (each row should have 7 elements, but this one has only 2). How can I solve this problem?

Array ( [0] => 205198 [1] => Label 4.2 Βάση για Σ▒ )

My code is below:

$array = file_get_contents($this->listUrl);
$array = mb_convert_encoding($array, 'UTF8', 'UTF-16LE');   // Convert the file to UTF8
$array = preg_split("/\R/", $array);                        // Split it by line breaks
$array = array_map(function ($v) {
    return str_getcsv($v, ";");
}, $array);

[edit] I used the code below:

$array = str_getcsv($array, "\n");
foreach ($array as &$Row) {
    $Row = str_getcsv($Row, ";");
}
DimisV
  • This _should_ rather be done using `fgetcsv` with the proper locale set (see https://stackoverflow.com/a/6160934/1427878). If you _just_ split by line breaks, you risk messing up your data if any of the cell _values_ could ever contain a line break. – CBroe Jul 04 '22 at 12:08
  • @CBroe, it seems that you are right. I use the code below: `$f = file_get_contents('file'); $f = mb_convert_encoding($f, 'UTF8', 'UTF-16LE'); $f = str_getcsv($f, "\n"); foreach($f as &$Row) { $Row = str_getcsv($Row, ";"); }` – DimisV Jul 04 '22 at 13:07

1 Answer

My best guess:

You need mb_split, since you are working with multibyte strings in order to support Greek.

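Some theory:

UTF-8 is a variable-width encoding: ASCII characters take 1 byte, and all other characters take 2 to 4 bytes (Greek letters take 2).

UTF-16 is also variable-width: most characters, including Greek letters, take 2 bytes, and characters outside the Basic Multilingual Plane take 4 bytes.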
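As a rough illustration of how many bytes a character occupies in each encoding, here is a minimal sketch. It assumes the mbstring extension is loaded and PHP 7 or later (for the `\u{...}` escapes); the sample characters are chosen only for demonstration.

```php
<?php
// Illustrative only: byte lengths of a few characters in UTF-8 and UTF-16LE.
$samples = [
    'A'     => 'A',          // ASCII letter
    'Sigma' => "\u{03A3}",   // Greek capital sigma
    'Emoji' => "\u{1F600}",  // a character outside the Basic Multilingual Plane
];

$lengths = [];
foreach ($samples as $name => $utf8) {
    $utf16 = mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');
    // strlen() counts bytes, not characters.
    $lengths[$name] = [strlen($utf8), strlen($utf16)];
    printf("%-5s UTF-8: %d byte(s), UTF-16LE: %d byte(s)\n",
        $name, strlen($utf8), strlen($utf16));
}
```

Because `strlen` counts bytes, any byte-oriented function can cut a Greek letter in half if it treats one of those bytes as special.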

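Some action:

The PHP manual describes mb_split as "Split multibyte string using regular expression" (see the mb_split page in the PHP manual). Note that, unlike preg_split, its pattern is written without delimiters.

There are also similar functions, such as mb_ereg_replace.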

Example:

$array = file_get_contents($this->listUrl);
$array = mb_convert_encoding($array, 'UTF-8', 'UTF-16LE');   // Convert the file to UTF-8
$array = mb_split('\r\n|\r|\n', $array);                      // Split by line breaks (no delimiters around the pattern)
$array = array_map(function ($v) {
    return str_getcsv($v, ";");
}, $array);
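One possible explanation for the truncated row, offered as an assumption rather than a confirmed diagnosis: without the `u` modifier, PCRE matches byte by byte, and in that mode `\R` can also match the single byte 0x85, which happens to be the second byte of the Greek letter υ (CF 85 in UTF-8). If that is the cause, adding the `u` modifier to the original `preg_split` call may be enough. A minimal sketch comparing the two patterns, using made-up sample data:

```php
<?php
// Made-up sample: two CSV rows separated by CRLF; the first contains "Συν".
// In UTF-8, υ (U+03C5) is encoded as the bytes CF 85.
$text = "205198;Label 4.2 \u{0392}\u{03AC}\u{03C3}\u{03B7} \u{03A3}\u{03C5}\u{03BD};x"
      . "\r\n"
      . "2;second;row";

// Byte-oriented matching: \R may treat the lone 0x85 byte as a line break,
// which would yield more pieces than there are rows.
$withoutU = preg_split('/\R/', $text);

// UTF-8 aware matching: multibyte characters are never split.
$withU = preg_split('/\R/u', $text);

printf("without u: %d piece(s)\n", count($withoutU));
printf("with u:    %d piece(s)\n", count($withU));

// Parse the rows from the UTF-8 aware split (escape argument passed explicitly).
$rows = array_map(fn ($v) => str_getcsv($v, ';', '"', '\\'), $withU);
print_r($rows);
```

Whether the pattern without `u` actually over-splits depends on how the PCRE library was built, so compare the two counts on your own setup.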

Have fun!