
I have written the following code to read content from an Excel file that contains Japanese characters and display it on a web page:

```php
<?php
    // Send the charset header before any output (including the DOCTYPE);
    // otherwise PHP can fail with "headers already sent".
    //header("Content-Type: text/plain; charset=UTF-8");  // output as text file
    header("Content-Type: text/html; charset=UTF-8");
?>
<!DOCTYPE html>
<?php
    if(isset($_POST['upload'])){

        unset($_POST['upload']);
        $file_name = basename($_FILES['csv_file']['name']);
        $name = pathinfo($file_name, PATHINFO_FILENAME );
        $ext = pathinfo($file_name, PATHINFO_EXTENSION);

        $csvFile = fopen($_FILES['csv_file']['tmp_name'], 'r');
        //skip first line
        fgetcsv($csvFile);
        $flag = true; // flag set false when query fails for one or more records
        while($line = fgetcsv($csvFile)){
            if(count($line)>0){
                $data = utf8_decode($line[0]);
                echo "$data <br>";
            }
        }
        if($flag)
            echo "<h1 style='color:limegreen'> All records imported successfully ! </h1>";
        else
            echo " Error while fetching one or more records";

        fclose($csvFile);
    }

?>

<form method="post" action="importExcel.php" enctype="multipart/form-data">
    <input type="file" name="csv_file" id="csv_file" accept=".xlsx" >
    <input type="submit" name="upload" id="upload" >
</form>
```

This is the Excel sheet with the Japanese characters: [screenshot not available]

My questions are:

  • How do I make those Japanese characters display properly in a web browser? I tried using the utf8_decode() function, but it did not help.
  • If I want to store these Japanese characters in a MySQL database, what changes do I need to make?

Currently the browser displays the Japanese characters as question marks and some garbage values when I use utf8_decode(). When I remove it, the browser displays junk values instead.

Edit: Here is sample data from the Excel file:

アイリッシュ・セッター アイリッシュ・ウォーター・スパニエル アイリッシュ・ウルフハウンド

FreeKrishna
  • Possible duplicate of [UTF-8 all the way through](http://stackoverflow.com/questions/279170/utf-8-all-the-way-through) – Qirel Feb 02 '17 at 10:07
  • FYI: these are Japanese characters, not Chinese. I've edited your description. – epo3 Feb 02 '17 at 10:12
  • @epo3 Thanks. I mistook it for Chinese because the file title is in Chinese :) – FreeKrishna Feb 02 '17 at 10:13
  • UTF-8 should be the goal, but one important question is still open: do you know which encoding the Excel/CSV file uses? utf8_encode()/utf8_decode() only convert between Latin-1 (ISO-8859-1) and UTF-8. If other encodings are involved, you need iconv() or mb_convert_encoding(). – cypherabe Feb 02 '17 at 10:15
  • @cypherabe I'm not sure which encoding the Excel file uses. I've been trying to save it as UTF-8, but couldn't find the option in MS Excel 2007. – FreeKrishna Feb 02 '17 at 10:18
  • Maybe try converting the file in OpenOffice or LibreOffice. If I remember correctly, these programs offer more options when converting a file to CSV. – cypherabe Feb 02 '17 at 10:29
  • @cypherabe I found a method for saving an Excel file as UTF-8 here: https://help.surveygizmo.com/help/encode-an-excel-file-to-utf-8-or-utf-16 (from one of the answers below). But it still doesn't seem to work. It works for a regular UTF-8 text file, but not for the Excel file. :( – FreeKrishna Feb 02 '17 at 10:33
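Following cypherabe's point that utf8_encode()/utf8_decode() only cover Latin-1 and that other encodings need mb_convert_encoding(), here is a minimal sketch of converting the uploaded file before parsing it. It assumes the mbstring extension is installed, and it assumes that a CSV which is not already valid UTF-8 was saved by a Japanese-locale Excel in Shift_JIS (Windows code page 932, which PHP calls "SJIS-win"). The helper names toUtf8() and readCsvAsUtf8() are hypothetical.

```php
<?php
// Hypothetical helper: return $text as UTF-8. ASSUMPTION: anything that is
// not already valid UTF-8 is Shift_JIS / code page 932 ("SJIS-win").
function toUtf8(string $text, string $fallback = 'SJIS-win'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', $fallback);
}

// Hypothetical helper: read a CSV file into an array of rows. The whole file
// is converted to UTF-8 first, so fgetcsv() only ever parses UTF-8 text.
function readCsvAsUtf8(string $path): array
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $path");
    }
    $utf8 = toUtf8($raw);
    // Drop a UTF-8 byte order mark so it does not end up in the first cell.
    if (strncmp($utf8, "\xEF\xBB\xBF", 3) === 0) {
        $utf8 = substr($utf8, 3);
    }
    // Parse the converted text from an in-memory stream.
    $stream = fopen('php://temp', 'r+');
    fwrite($stream, $utf8);
    rewind($stream);
    $rows = [];
    while (($row = fgetcsv($stream, 0, ',', '"', '\\')) !== false) {
        if ($row !== [null]) { // fgetcsv() returns [null] for a blank line
            $rows[] = $row;
        }
    }
    fclose($stream);
    return $rows;
}
```

In the question's upload handler, `readCsvAsUtf8($_FILES['csv_file']['tmp_name'])` could replace the fopen()/fgetcsv() loop, and the utf8_decode() call would no longer be needed. This only helps once the sheet has been saved as CSV; it does not read .xlsx files.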

2 Answers


You don't need utf8_decode(); adding header("Content-Type: text/html; charset=UTF-8"); will work for you.

NOTE: utf8_decode() converts a string from UTF-8 to ISO-8859-1, but in your case you need UTF-8.
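A small demonstration of why utf8_decode() produced the question marks described in the question: ISO-8859-1 (Latin-1) only covers U+0000 to U+00FF, so every Japanese character is replaced by a literal "?". utf8_decode() is deprecated as of PHP 8.2; the @ in this sketch only hides that notice.

```php
<?php
// utf8_decode() converts UTF-8 to ISO-8859-1. Characters outside Latin-1
// cannot be represented and are replaced by "?".
$japanese = "アイ";
$latin1 = @utf8_decode($japanese); // @ hides the PHP 8.2+ deprecation notice
echo $latin1, "\n";                // ??
echo bin2hex($latin1), "\n";       // 3f3f (two literal question marks)

// A character that does exist in Latin-1 survives, but as the single byte
// E9, which is not valid UTF-8 for a page served with charset=UTF-8.
echo bin2hex(@utf8_decode("é")), "\n"; // e9
```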

NOTE: fgetcsv may not return the expected data when reading an Excel file. Convert the file to CSV or use an Excel reader library.

For example:

```php
<?php
header("Content-Type: text/html; charset=UTF-8");
$str = "七起千去";
echo "$str <br>";
```

NOTE: If you are not sending the header, adding <meta charset="utf-8"> to your page should also work.

Suchit kumar
  • I've already added the header to the page, and I've also tried the meta tag. It still doesn't work. :( – FreeKrishna Feb 02 '17 at 10:15
  • @FreeKrishna It is working fine for me. Can you share the data or post one or two rows in the question? First copy and paste this code into a separate file and try it there. – Suchit kumar Feb 02 '17 at 10:17
  • This code works in a separate file. My problem is that I'm fetching the Japanese content from another file, and that doesn't seem to work. – FreeKrishna Feb 02 '17 at 10:20
  • @FreeKrishna Check the file properties for an encoding setting. See how to change the encoding: https://help.surveygizmo.com/help/encode-an-excel-file-to-utf-8-or-utf-16 – Suchit kumar Feb 02 '17 at 10:21
  • I've tried saving the file as a Unicode text file, and that works. I also tried your method of saving the Excel file as UTF-8, but it still shows the same problem. – FreeKrishna Feb 02 '17 at 10:31
  • It still isn't solved. It only works for the text file, not for Excel. :( – FreeKrishna Feb 02 '17 at 10:39
  • Does it make a difference if it's .csv? I'm trying to work with Excel for now. – FreeKrishna Feb 02 '17 at 10:49
  • @FreeKrishna I don't know much, but it may work; try it once if possible. It is working for me. I saved the CSV file in UTF-8 encoding. – Suchit kumar Feb 02 '17 at 10:53
  • Oh, maybe another hurdle: fgetcsv() won't work with a file in Excel format (.xls, .xlsx). You may get some result, but not the intended text. Most PHP functions assume text strings as input, but Excel and similar programs wrap the text content in a lot of formatting, formula information, user rights, etc., and may even compress the file. You would need a helper library to read an Excel file first. – cypherabe Feb 02 '17 at 11:37
  • @FreeKrishna you can try `SpreadsheetReader`. I have used it it's good. – Suchit kumar Feb 02 '17 at 11:40
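Building on cypherabe's comment that fgetcsv() cannot parse .xls/.xlsx files, a minimal sketch of a guard that recognizes those formats by their leading "magic" bytes, so they can be rejected before fgetcsv() is called. The function name is hypothetical, and it only detects the formats; actually reading a workbook still needs a reader library or a CSV export.

```php
<?php
// Hypothetical guard: detect binary spreadsheet formats by their first bytes.
//  - .xlsx (like any ZIP archive) starts with "PK\x03\x04"
//  - legacy .xls (OLE2 compound file) starts with D0 CF 11 E0 A1 B1 1A E1
function binarySpreadsheetType(string $path): ?string
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        return null;
    }
    $magic = (string) fread($fh, 8);
    fclose($fh);

    if (strncmp($magic, "PK\x03\x04", 4) === 0) {
        return 'zip';  // e.g. .xlsx, but any ZIP-based file matches
    }
    if ($magic === "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1") {
        return 'ole2'; // e.g. .xls
    }
    return null; // not a known binary format; possibly plain text such as CSV
}
```

In the question's handler, a check such as `binarySpreadsheetType($_FILES['csv_file']['tmp_name']) !== null` could trigger a "please save the sheet as CSV" message instead of printing garbage, and the form's accept=".xlsx" would then become accept=".csv".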

Do not use any encoders/decoders, that will only make things worse.

Do have CHARACTER SET utf8 (or utf8mb4) on the tables and columns that store the text.

Do have the header or meta, as already discussed.

Do set the connection between php and mysql to be UTF-8. (Let's see that part of the code to check.)

Do specify UTF-8 in <form method="post" ... accept-charset="UTF-8">

Please show us what went wrong. There are at least 5 variants of "not working", as can be seen here: "Trouble with utf8 characters; what I see is not what I stored"

If you can get the HEX, note that utf8 encoding for アイリッ is E382A2 E382A4 E383AA E38383. Other Katakana characters are also E3xxyy. (Kanji and Hiragana are also 3 bytes: Ewxxyy.)
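One way to get the HEX from PHP is bin2hex(); in MySQL, SELECT HEX(column) shows the stored bytes the same way. Below is a hypothetical debugging helper that groups the bytes per character, in the format used above. If the output contains pairs like 8341 instead of e3xxyy groups, the text is most likely still Shift_JIS (8341 is ア in Shift_JIS).

```php
<?php
// Hypothetical debugging helper: hex bytes of each character, space-separated.
// If the string is not valid UTF-8 it cannot be split by character, so the
// raw bytes are dumped in one piece instead.
function hexPerChar(string $s): string
{
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return bin2hex($s); // invalid UTF-8, e.g. Shift_JIS bytes
    }
    return implode(' ', array_map('bin2hex', $chars));
}

echo hexPerChar("アイリッ"), "\n"; // e382a2 e382a4 e383aa e38383
echo hexPerChar("\x83\x41"), "\n"; // 8341 (ア in Shift_JIS, not UTF-8)
```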

Rick James
  • The problem I faced was that when I fetched the Japanese characters from the Excel file, the browser displayed garbage values. Then I tried an encoder, which ended up displaying question marks. So I finally had to use the CSV file format, which worked. Now, for storing the data, I've set the connection charset with mysqli_set_charset($conn, 'utf8mb4'); and everything else you mentioned is set as well, but I'm still getting an insert error. – FreeKrishna Feb 06 '17 at 05:01
  • I think I fixed it. The problem was that I had changed the table's collation to utf8mb4_unicode_520_ci, but the column's collation was unchanged. So I changed the column to utf8mb4_unicode_520_ci as well. Now I can see the Japanese characters in the database exactly as they appear in the browser or the file. – FreeKrishna Feb 06 '17 at 05:29
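The mismatch in the last comment (table default changed, existing column not) can be avoided by converting the table and all of its character columns at once with MySQL's ALTER TABLE ... CONVERT TO CHARACTER SET. A minimal PHP sketch that only builds that statement; the helper name and the table name dogs are hypothetical, and the mysqli calls in the comments need a live server, so they are not executed here. CONVERT TO rewrites the stored data, so back up the table first.

```php
<?php
// Hypothetical helper: build a statement that converts the table default AND
// every existing character column to utf8mb4. A plain
// "ALTER TABLE ... DEFAULT CHARACTER SET ..." only changes the default used
// for columns added later, leaving existing columns in their old charset.
function convertToUtf8mb4Sql(string $table, string $collation = 'utf8mb4_unicode_520_ci'): string
{
    // The collation is inlined into the SQL, so accept only utf8mb4_* names.
    if (!preg_match('/^utf8mb4_[a-z0-9_]+$/', $collation)) {
        throw new InvalidArgumentException("Unexpected collation: $collation");
    }
    // Backtick-quote the table name, doubling any embedded backtick.
    $quoted = '`' . str_replace('`', '``', $table) . '`';
    return "ALTER TABLE $quoted CONVERT TO CHARACTER SET utf8mb4 COLLATE $collation";
}

// Usage against a live server (not executed; 'dogs' is a made-up table):
// $conn = mysqli_connect($host, $user, $password, $database);
// mysqli_set_charset($conn, 'utf8mb4');            // connection charset
// mysqli_query($conn, convertToUtf8mb4Sql('dogs'));
echo convertToUtf8mb4Sql('dogs'), "\n";
```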