I'm reading Chinese characters from a .csv file and echoing them into HTML with PHP. I had great difficulty reading them directly from the .csv, so I copy-pasted the data into a .txt file, which made it easier to deal with. My hanzi_characters.txt has several hundred lines like this example:

hanzi_characters.txt

产品

產品

囚徒困境

不正当竞争

What I need, and cannot figure out how to do properly, is to show one hanzi on each line, so that 产品 is printed as 产 and 品 on separate lines.

I tried using foreach loops with str_split() and explode(), since each line is treated as a string, but they only output ������.

Before running out of ideas I also tried array_chunk() and array_slice(), but as expected the result was the same as without those functions.

I also tried this solution, assigning $s = $parts[0];, but couldn't make it work either.

Right now this is my code:

Index.php

<?php

$myfile = fopen("hanzi_characters.txt", "r") or die("Unable to open file!");

while (!feof($myfile)) {
    $printed = fgets($myfile);
    $parts = preg_split('/[\\s,]/u', $printed);
    echo $parts[0];
}
fclose($myfile);

?>

Current output:

产品

產品

囚徒困境

不正当竞争

gma992

2 Answers

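I guess you can walk through each string character by character here. Run a foreach loop over all the lines, then read each line one character at a time. Because hanzi are multibyte in UTF-8, use mb_strlen() and mb_substr() rather than plain $q[$i] byte indexing, which returns single bytes.

// Read the file into an array of lines, dropping newlines and blank lines
$lines = file("hanzi_characters.txt", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

foreach ($lines as $q) // run for each line, e.g. $q = "不正当竞争"
{
    $length = mb_strlen($q, 'UTF-8'); // number of characters, not bytes
    for ($i = 0; $i < $length; $i++)
    {
        // mb_substr() returns a whole character at position $i
        echo mb_substr($q, $i, 1, 'UTF-8') . "<br>";
    }
}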
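
To illustrate why byte-wise access prints � for hanzi, here is a minimal sketch comparing byte counts with character counts. It assumes the script and the data are both UTF-8; the variable names are only illustrative.

```php
<?php
// Assumption: this source file is saved as UTF-8.
$word = "产品";

// strlen() counts bytes; each of these two hanzi takes 3 bytes in UTF-8.
$byteCount = strlen($word);

// mb_strlen() counts characters when told the encoding.
$charCount = mb_strlen($word, 'UTF-8');

// $word[0] is only the first byte of 产, which is not valid UTF-8 by itself,
// so a browser renders it as the replacement character.
$firstByteIsValid = mb_check_encoding($word[0], 'UTF-8');

// mb_substr() works in characters, so it returns the whole first hanzi.
$firstChar = mb_substr($word, 0, 1, 'UTF-8');

printf("bytes=%d chars=%d first=%s\n", $byteCount, $charCount, $firstChar);
```

Functions such as str_split() operate on bytes in the same way as `$word[0]`, which would explain the garbled output described in the question.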
Akshay
  • This solution won't work because I don't have a single string of hanzi like `$q`; it was one of the first approaches I tried. What I have is an array, and each key of the array holds a string of several Chinese characters that I cannot split properly into substrings. – gma992 Jul 25 '15 at 08:57
  • @gma992 Can you paste a `var_dump` of the array? – Akshay Jul 25 '15 at 18:43

PHP's multibyte string functions

What you're looking for are PHP's multibyte string functions, specifically mb_ereg_replace.

I think the actual statement you want is something like the following:

 mb_ereg_replace('\s+', '<br>', $string); // mb_ereg patterns take no delimiters
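
If the goal is one hanzi per output line rather than replacing whitespace, a character-aware split may be closer to what is needed. The following is only a sketch: `split_hanzi` is a hypothetical helper name, and it assumes the input lines are valid UTF-8.

```php
<?php
// Hypothetical helper (not from the question): collects the characters of
// every non-empty line, assuming each line is valid UTF-8.
function split_hanzi(array $lines): array
{
    $chars = [];
    foreach ($lines as $line) {
        $line = trim($line); // strips ASCII whitespace such as "\n" and "\r"
        if ($line === '') {
            continue; // skip blank lines
        }
        // mb_str_split() exists from PHP 7.4 on; on older versions fall back
        // to preg_split() with the /u modifier, which splits between characters.
        $pieces = function_exists('mb_str_split')
            ? mb_str_split($line, 1, 'UTF-8')
            : preg_split('//u', $line, -1, PREG_SPLIT_NO_EMPTY);
        foreach ($pieces as $piece) {
            $chars[] = $piece;
        }
    }
    return $chars;
}

// Sample data standing in for lines read from the file.
$sample = ["产品\n", "\n", "囚徒困境\n"];
echo implode("<br>\n", split_hanzi($sample)), "\n";
```

With the real file, the lines could be supplied by `file("hanzi_characters.txt")`, which returns the file as an array of lines.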

Check PHP's character set

In order to use mb_ereg_replace, the character set of the string you read from the file must be the same as the character set PHP is using. If you read your file and output it wrapped in a <pre> tag, does it display the file with the correct characters? If not, the encoding of the file is likely different from the encoding being used by PHP.

You might want to check out this guide on ensuring you are using utf8 or a similar encoding. Once that's done, and you know the encoding of the file, you can use mb_convert_encoding (make sure to use HTML-ENTITIES if this is sent to the browser) to convert the string you read in to the character set you are using in PHP.
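
As a rough sketch of that check-and-convert step: `ensure_utf8` is a hypothetical helper, and the CP936 (GBK) source encoding is purely an assumption for illustration, since the real encoding of the file has to be determined first.

```php
<?php
// Hypothetical helper: returns $raw as UTF-8. $assumedSource is a guess at
// the file's original encoding (CP936/GBK here) and is only an assumption.
function ensure_utf8(string $raw, string $assumedSource = 'CP936'): string
{
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($raw, 'UTF-8', $assumedSource);
}

// Simulate a GBK-encoded line, then normalise it back to UTF-8.
$gbk  = mb_convert_encoding("产品", 'CP936', 'UTF-8');
$utf8 = ensure_utf8($gbk);

// Escape for HTML while keeping the UTF-8 characters intact.
echo htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8'), "\n";
```

For the browser to render the result correctly, the page also needs to declare UTF-8, for example with `header('Content-Type: text/html; charset=utf-8');` before any output.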

MirroredFate
  • Thanks for the clue, I'll investigate those functions, but so far I've had no luck with `mb_ereg_replace`. The characters are correctly `utf8` encoded though; they are displayed perfectly with `<pre>` tags. – gma992 Jul 25 '15 at 08:53