1

Based on the answer provided here by user user1830391: Some characters in CSV file are not read during PHP fgetcsv()

I updated my following code to use fgets() instead of fgetcsv(). It fixed my first character issue. thats no longer a prob... but...

what if the .csv file is seprated using ; instead of , Some fields will be wrapped using double quotes "", for example one of my rows is split onto 2 lines. quote opened in the last element of one line and closed at the end of the first element of the next line. There is an "enter"(/n) in that cell. how should i treat this using this code. fgetcsv catches elements within double quotes but i dont think fgets() does.

function runCSVtoArray() {
    // --> FOR IMPORT
    //Function that converts a CSV file to a PHP array.
    //echo '<span class="success">Chargement du fichier CSV pour importation MYSQL....</span><br />';
    $readCharsPerLine = (JRequest::getVar('charsPerLine') > 0) ? JRequest::getVar('charsPerLine') : 1500; /* Import as of 2012-04-16 seem to have max 800chars per line. 1500 is alot of extra. */
    ini_set("auto_detect_line_endings", true);
    iconv_set_encoding("internal_encoding", "UTF-8");
    $openfile = $this->imp['importPath'].$this->imp['csvFileName'];
    if ( file_exists($openfile) ) {
        //echo '<span class="success">Fichier CSV trouvé....</span><br />';
        //echo '<span class="success">Ouverture du fichier : '.$openfile.'</span><br />';
        if (($handle = fopen($openfile, "r")) !== FALSE) {
            //echo '<span class="success">Fichier CSV ouvert... Chargement en cours....</span><br />';
            $row_i=0;
            $this->_importData = array();
            /*while (($data = fgetcsv($handle, $readCharsPerLine, ";")) !== FALSE) {*/
            while (($the_line = fgets($handle)) !== FALSE) {
                $data = explode(';', $the_line);
                $debugoutput = implode('; ', $data).'<br />'; echo ( (JRequest::getVar('encodeutf8')) && ( mb_detect_encoding($debugoutput, "UTF-8") == "UTF-8") ) ? utf8_encode($debugoutput) : $debugoutput.'<br />'; //Debug2
                /*
                $num        = count($data);
                if ($row_i==0) {
                    // TITLE ROW
                    $keyRow = array();
                    for ($c=0; $c < $num; $c++) {
                        //Making title array with CSV first line
                        //Key for colum
                        if ( (JRequest::getVar('encodeutf8')) && ( mb_detect_encoding($data[$c], "UTF-8") == "UTF-8") ) { $data[$c] = utf8_encode($data[$c]); }
                        if ($data[$c]!="") {
                            $keyRow[$c]=trim($data[$c]);
                            $keyRow[$c]=str_replace('GDWACCENT', '', $keyRow[$c]);  //STRIP GDWACCENT, GDW uTF8 fgetcsv fix
                        }
                        else { $keyRow[$c]=''; }
                    }
                } else {
                    //VALUE ROW...
                    for ($c=0; $c < $num; $c++) {
                        $key = $keyRow[$c];
                        if ( (JRequest::getVar('encodeutf8')) && ( mb_detect_encoding($data[$c], "UTF-8") == "UTF-8") ) {
                            $data[$c] = utf8_encode($data[$c]);
                            $data[$c]=str_replace('GDWACCENT', '', $data[$c]);  //STRIP GDWACCENT, GDW uTF8 fgetcsv fix
                        }
                        if ($data[$c]!="") {
                            $this->_importData[$row_i][$key]=trim($data[$c]);
                            $this->_importData[$row_i][$key]=str_replace('GDWACCENT', '', $this->_importData[$row_i][$key]);    //STRIP GDWACCENT, GDW uTF8 fgetcsv fix
                        }
                    }
                }
                */
                $row_i++;
            } //End while()
            //echo '<span class="success">Chargement terminer.... Sauvegarde en cours...</span><br />';
            return true;
        } else {
            //Incapable d'ouvrir le fichier d'importation.
            return false;
        }
    } else {
        //FILE NOT FOUND...
        return false;
    }
} // runCSVtoArray()
Community
  • 1
  • 1
StiGMaT
  • 760
  • 9
  • 17
  • I don't know what to try... because the number of columns in the Excel file can change... its not a determined amount.. i though i could like set a number of ";" to be found per line. if the line ends before the limit of ";" is reached i could get the next line and stay on the same array key. but there could be 3 columns like there could be 10.. so that wont work... I really don't know what else... – StiGMaT Feb 12 '13 at 16:13
  • HMm I just though of this... the title row contains a simple db field name... it should alway's be on one line 1 word with no spaces... I could use it to determine how many ";" there are.. i think i don't have a choice I'm, gona try that. – StiGMaT Feb 12 '13 at 16:19

3 Answers3

4

The answer you are relying on, that says fgetcsv works only with ascii chars, is simply wrong. True is :

Note:

Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in >one-byte encoding are read wrong by this function.

So you'll have to configure your LANG variable instead of using fgets.

Here comes an example how to set the lang variable:

putenv("LANG=fr_FR.UTF-8");
hek2mgl
  • 152,036
  • 28
  • 249
  • 266
  • I tried using them: setlocale(LC_ALL, 'fr_FR.UTF-8'); setlocale(LC_ALL, 'fr_CA.UTF-8'); setlocale(LC_ALL, 'en_US.UTF-8'); putting them at the top of my function, then tryied first line of the file, first line of the site... never made any diffrence. ive tested the default setlocale(LC_ALL, 0) and that returns C. if i set it to one of them (fr_FR.UTF-8, fr_CA.UTF-8 or en_US.UTF-8) setlocale returns the value that I set. I also tested just trying to set any random string to see an invalide input result, but that just returns C. so it seems to be setting it right. but it doesnt make a diffrence. – StiGMaT Feb 12 '13 at 16:30
  • i'm looking all over the place for LANG... every where its telling me something like how would you set it?? – StiGMaT Feb 12 '13 at 16:43
  • Check my answer. I've explained it – hek2mgl Feb 12 '13 at 16:49
  • 2
    Just tried 3 different settings: putenv("LANG=fr_FR.UTF-8");, putenv("LANG=fr_CA.UTF-8"); and putenv("LANG=en_US.UTF-8"); Still no luck. first character ignored if its a special character like É é – StiGMaT Feb 12 '13 at 16:49
  • Do you mean the first charcter in file? or in word? – hek2mgl Feb 12 '13 at 16:49
  • first character of a cell. – StiGMaT Feb 12 '13 at 16:51
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/24390/discussion-between-hek2mgl-and-stigmat) – hek2mgl Feb 12 '13 at 16:52
3

I solved this by opening the file using fopen and fgets instead of fgetcsv() and writing a copy using utf8_encode for each line. Then i use the copy and put that through fgetcsv()

here is my updated code.

function runCSVtoArray() {
    // --> FOR IMPORT
    //Function that converts a CSV file to a PHP array.
    //echo '<span class="success">Chargement du fichier CSV pour importation MYSQL....</span><br />';
    $readCharsPerLine = (JRequest::getVar('charsPerLine') > 0) ? JRequest::getVar('charsPerLine') : 1500; /* Import as of 2012-04-16 seem to have max 800chars per line. 1500 is alot of extra. */
    putenv("LANG=fr_CA.UTF-8");
    setlocale(LC_ALL, 'fr_CA.UTF-8');
    //ini_set("auto_detect_line_endings", true);
    //iconv_set_encoding("internal_encoding", "UTF-8");
    $openfile = $this->imp['importPath'].$this->imp['csvFileName'];
    $utf8File = str_replace('.csv', '_utf8.csv', $openfile);

    if ( file_exists($openfile) ) {
        //echo '<span class="success">Fichier CSV trouvé....</span><br />';

        //rewrite the file in UTF8
        if (JRequest::getVar('encodeutf8')) {
            if (($handle = fopen($openfile, "r")) !== FALSE) {
                $newFileHandle = fopen($utf8File, 'w');     //NEW UTF8 FORMAT
                //fwrite($newFileHandle, "\xEF\xBB\xBF");
                while (($the_line = fgets($handle)) !== FALSE) {
                    fwrite($newFileHandle, utf8_encode($the_line));
                }   //End of while()
            }
            $openfile = $utf8File;
        }

        //echo '<span class="success">Ouverture du fichier : '.$openfile.'</span><br />';
        if (($handle = fopen($openfile, "r")) !== FALSE) {
            //echo '<span class="success">Fichier CSV ouvert... Chargement en cours....</span><br />';
            $row_i=0;
            $this->_importData = array();
            while (($data = fgetcsv($handle, $readCharsPerLine, ";")) !== FALSE) {
            /*while (($the_line = fgets($handle)) !== FALSE) {*/
                //$data = explode(';', $the_line);
                //$debugoutput = implode('; ', $data); echo ( (JRequest::getVar('encodeutf8')) && ( mb_detect_encoding($debugoutput, "UTF-8") == "UTF-8") ) ? utf8_encode($debugoutput).'<br />' : $debugoutput.'<br />';   //Debug2
                //$debugoutput = implode('; ', $data); echo $debugoutput.'<br />';  //Debug2
                $num            = count($data);
                if ($row_i==0) {
                    // TITLE ROW
                    $keyRow = array();
                    $maxItems = count($data);   //Count the number of ";"
                    for ($c=0; $c < $num; $c++) {
                        //Making title array with CSV first line
                        //Key for colum
                        if ( (JRequest::getVar('encodeutf8')) && ( mb_detect_encoding($data[$c], "UTF-8") == "UTF-8") ) {
                            //$data[$c] = utf8_encode($data[$c]);
                            $data[$c] = $data[$c];
                        }
                        if ($data[$c]!="") {
                            $keyRow[$c]=trim($data[$c]);
                            $keyRow[$c]=str_replace('GDWACCENT', '', $keyRow[$c]);  //STRIP GDWACCENT, GDW uTF8 fgetcsv fix
                        }
                        else { $keyRow[$c]=''; }
                    }
                } else {
                    //VALUE ROW...
                    for ($c=0; $c < $num; $c++) {
                        $key = $keyRow[$c];
                        if ( (JRequest::getVar('encodeutf8')) && ( mb_detect_encoding($data[$c], "UTF-8") == "UTF-8") ) {
                            //$data[$c] = utf8_encode($data[$c]);
                            $data[$c] = $data[$c];
                            $data[$c]=str_replace('GDWACCENT', '', $data[$c]);  //STRIP GDWACCENT, GDW uTF8 fgetcsv fix
                        }
                        if ($data[$c]!="") {
                            $this->_importData[$row_i][$key]=trim($data[$c]);
                            $this->_importData[$row_i][$key]=str_replace('GDWACCENT', '', $this->_importData[$row_i][$key]);    //STRIP GDWACCENT, GDW uTF8 fgetcsv fix
                        }
                    }   //End of for()
                }
                $row_i++;
            } //End while()
            //echo 'HERE<br />';
            //gdwprint($this->_importData);
            //exit();
            //echo '<span class="success">Chargement terminer.... Sauvegarde en cours...</span><br />';
            return true;
        } else {
            //Incapable d'ouvrir le fichier d'importation.
            return false;
        }
    } else {
        //FILE NOT FOUND...
        return false;
    }
} // runCSVtoArray()
Community
  • 1
  • 1
StiGMaT
  • 760
  • 9
  • 17
  • This turned out to be the solution in my case too. PHP was unable to properly detect the charset of the incoming csv file and just started mangling things without warning. Re-encoding lines worked perfectly. – Ryaner Jan 15 '14 at 15:40
1

From my experience the input data for fgetcsv() must be in UTF-8.

In your case if you have É ignored in Éric then your input is not UTF-8 but probably some single byte encoding instead (Windows-1252? echo bin2hex($str); to verify). There is a bugreport in php bug tracker (https://bugs.php.net/bug.php?id=55507). Solution is to convert text to utf8 before feeding to fgetcsv

Also it is importat for the UTF-8 not to contain BOM.

Imre L
  • 6,159
  • 24
  • 32