
I'm writing a PHP script that needs to read data from a CSV file, some of which is written in Japanese. However, I can't get the data to read or display correctly at all.

The file I'm reading is encoded in the ISO-8859-1 charset. I also tried using iconv to convert it to a UTF-8 encoded file, but that seemed to break the data in the file entirely, and the text wouldn't display correctly in any application afterwards.
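A short sketch of why the iconv conversion seemed to destroy the file. It assumes the CSV actually holds EUC-JP bytes (the garbled output further down matches that exactly); the byte string below is 引き込む in EUC-JP. Telling iconv the source is ISO-8859-1 turns every byte into a separate Latin-1 character, so the garbage gets stored as perfectly valid UTF-8. Naming EUC-JP as the source recovers the text instead.

```php
<?php
// Assumption: the file really contains EUC-JP; these 8 bytes are 引き込む.
$raw = "\xB0\xFA\xA4\xAD\xB9\xFE\xA4\xE0";

// Equivalent to converting the file with ISO-8859-1 as the source encoding:
// each byte becomes one Latin-1 character, re-encoded as UTF-8.
$wrong = iconv('ISO-8859-1', 'UTF-8', $raw);

// Naming the (assumed) real source encoding gives the Japanese text back.
$right = iconv('EUC-JP', 'UTF-8', $raw);

echo strlen($raw), ' bytes -> ', strlen($wrong), " bytes\n"; // 8 bytes -> 16 bytes
echo $right, PHP_EOL;                                         // 引き込む
```

Converting a file that is already mojibake-as-UTF-8 back is possible, but it is simpler to rerun the conversion on the original file with the correct source encoding.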

Here's the script I'm using right now:

<?php 
    header("Content-Type: text/html; charset=ISO-8859-1"); 
    setlocale(LC_ALL, 'ja_JP.EUC-JP'); 
?>

<!DOCTYPE html>
<html lang="en">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>
</head>
<body>
    <?php

        $row = 1;

        if (($handle = fopen("/srv/http/Japanese/testFile.csv", "r")) !== FALSE) {
            while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
                $row++;
                for ($i = 0; $i < 4; ++$i) {
                    echo $data[$i] . "<br />";
                }
                echo "<br />";
                if ($row > 1000) break;
            }
            fclose($handle);
        } else echo print_r(error_get_last(),true);
    ?>
</body>
</html>

The header() and setlocale() calls at the top were added to try to fix the issue, but they haven't helped.

For a row in the file containing 引き込む, 762, 762, 7122, the output looks like this:

°ú¤­¹þ¤à
762
762
7122
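One way to confirm the real encoding is to turn the garbled text back into bytes. This sketch assumes the browser decoded the page as ISO-8859-1 (as the header says), so each displayed character stands for exactly one byte of the file. The ¤ after ú is followed by a soft hyphen (U+00AD), which browsers normally don't render, so it is easy to lose when copying the output.

```php
<?php
// First line of the output as rendered under charset=ISO-8859-1,
// with the invisible soft hyphen written out explicitly.
$seen = "°ú¤\u{AD}¹þ¤à";

// Map each displayed Latin-1 character back to the byte that produced it.
$bytes = iconv('UTF-8', 'ISO-8859-1', $seen);
echo bin2hex($bytes), PHP_EOL; // b0faa4adb9fea4e0

// Every byte is in the 0xA1-0xFE range used by EUC-JP two-byte characters;
// decoding them as EUC-JP gives back the expected word.
echo iconv('EUC-JP', 'UTF-8', $bytes), PHP_EOL; // 引き込む
```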

Also, it doesn't seem to be an issue solely with the display of the data. I also tested the data with if ($data[$i] == "引き込む") and it evaluates to false even when I know that's the string being read.
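The comparison fails because == compares bytes: the literal in a UTF-8 source file and the same word in an EUC-JP file are different byte sequences. A minimal sketch, assuming the CSV is EUC-JP and the PHP file is saved as UTF-8; fieldEquals is a hypothetical helper name.

```php
<?php
// Hypothetical helper: compare a raw CSV field (assumed EUC-JP) with a
// literal written in this UTF-8 source file.
function fieldEquals(string $rawField, string $expectedUtf8): bool
{
    // Convert first: 引 is b0 fa in EUC-JP but e5 bc 95 in UTF-8,
    // so comparing the raw bytes can never match.
    $field = iconv('EUC-JP', 'UTF-8', $rawField);
    return $field !== false && $field === $expectedUtf8;
}

$raw = "\xB0\xFA\xA4\xAD\xB9\xFE\xA4\xE0"; // 引き込む in EUC-JP
var_dump($raw === "引き込む");           // bool(false)
var_dump(fieldEquals($raw, "引き込む")); // bool(true)
```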

I've also tried other ways of reading the file, but no matter which PHP method I use, I get exactly the same issue.

Any help would be greatly appreciated.

Liyara

2 Answers


You need to either convert the CSV file with iconv to EUC-JP and set the charset in both the Content-Type header and the meta tag to EUC-JP as well (the matching locale is ja_JP.EUC-JP), or convert the CSV to UTF-8 and set an appropriate charset (UTF-8, with the locale ja_JP.UTF-8).
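A minimal sketch of the UTF-8 option, done while reading instead of rewriting the file. It assumes the CSV really is EUC-JP (as the garbled output in the question suggests); readEucJpCsv is a hypothetical helper name, and the 1000-row cap mirrors the break in the question's loop. Each line is converted with iconv before it is parsed. It does not handle quoted fields that contain line breaks.

```php
<?php
// Hypothetical helper: read an EUC-JP CSV and return rows of UTF-8 strings.
function readEucJpCsv(string $path, int $maxRows = 1000): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $rows = [];
    // fgets splits on "\n"; that is safe for EUC-JP because byte 0x0A
    // never occurs inside a multibyte character.
    while (count($rows) < $maxRows && ($line = fgets($handle)) !== false) {
        $utf8 = iconv('EUC-JP', 'UTF-8', $line);
        if ($utf8 === false) {
            continue; // iconv raises a notice and returns false on invalid input
        }
        $utf8 = rtrim($utf8, "\r\n");
        if ($utf8 === '') {
            continue; // skip blank lines
        }
        // Pass the default enclosure/escape explicitly.
        $rows[] = str_getcsv($utf8, ',', '"', '\\');
    }
    fclose($handle);
    return $rows;
}
```

In the page, the ISO-8859-1 header and meta tag would then both become charset=UTF-8, and each field can be printed with echo htmlspecialchars($field) . "<br />";.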

parapente

I wanted to comment, but I don't have enough reputation yet, so please forgive me if my answer is incorrect.

From what I can find on Google and Stack Overflow, this seems to be a solution; you just have to fit it into your code.

This code

setlocale(LC_ALL, 'ja_JP');
$data = array_map('str_getcsv', file('japanese.csv'));
var_dump($data);

works on my local machine with the following CSV file (japanese.csv, saved in UTF-8):

日本語,テスト,ファイル
2行目,CSV形式,エンコードUTF-8

The results are

array(2) {
  [0]=>
  array(3) {
    [0]=>
    string(9) "日本語"
    [1]=>
    string(9) "テスト"
    [2]=>
    string(12) "ファイル"
  }
  [1]=>
  array(3) {
    [0]=>
    string(7) "2行目"
    [1]=>
    string(9) "CSV形式"
    [2]=>
    string(20) "エンコードUTF-8"
  }
}

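This might help you understand more: Link to other post (https://stackoverflow.com/questions/54528369/php-str-getcsv-does-not-parse-csv-correctly-if-it-contains-japanese-character)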

  • A proposed edit suggests that this is a copy of https://stackoverflow.com/questions/54528369/php-str-getcsv-does-not-parse-csv-correctly-if-it-contains-japanese-character (such information should not be an edit, but rather a plagiarism flag) – tripleee Jun 27 '22 at 17:36
  • @tripleee I wanted to just comment the link to the post, but I don't have the reputation to do that yet, so I copied a small snippet of that post and still included the link to it – コーダーゴースト Jun 28 '22 at 04:54