
I have PHP code that reads a CSV file, inserts the data into the database and moves the file to another folder once processing is finished.

The code currently works with UTF-8 files that have a BOM, because I added the line fseek($handle, 3); to skip the first 3 bytes (the BOM).

How can I make the same code handle both plain UTF-8 files (reading from the first byte) and UTF-8 BOM files (starting after the first 3 bytes)?

<?php

include("connexion.php");   // expected to define the database connection $conn

$dir      = '//server/d$/ftp/GET/';
$dest     = '//server/d$/ftp/GET/COPIES/';
$errorLog = "D:/xampp/htdocs/errors/erreur_BI1.txt";
$allFiles = scandir($dir);

// Sends one multi-row INSERT and logs the SQL if the query fails
$insertBatch = function (array $values) use ($conn, $errorLog) {
    $query = "insert into dbo.Sales (storenumber, storename, date, time, TransRef, stylecode, color, size)
              values " . implode(',', $values);
    if (!$conn->query($query)) {
        file_put_contents($errorLog, $query . PHP_EOL, FILE_APPEND | LOCK_EX);
    }
};

foreach ($allFiles as $filename) {
    // Only process files whose name starts with BI1_ (this also skips "." and "..")
    if (strpos($filename, 'BI1_') !== 0) {
        continue;
    }
    $file = $dir . $filename;

    if (($handle = fopen($file, "r")) === false) {
        continue;
    }

    // To remove the BOM from the first cell (assumes every file starts with one)
    fseek($handle, 3);

    // $date1 and $date2 are not defined in this snippet
    $bi1_values = array();
    while (($data = fgetcsv($handle, 9000000, ";")) !== false) {
        $bi1_values[] = "('$data[0]', '" . str_replace("'", "''", $data[1]) . "','$date1','$date2','$data[2]','$data[4]','"
            . str_replace("'", "''", $data[5]) . "','" . str_replace("'", "''", $data[6]) . "')";
        if (count($bi1_values) == 1000) {
            $insertBatch($bi1_values);
            $bi1_values = array();
        }
    }
    // Insert the last batch (fewer than 1000 rows), otherwise those rows are lost
    if (count($bi1_values) > 0) {
        $insertBatch($bi1_values);
    }

    fclose($handle);

    // Move the processed file to another folder
    if (!rename($file, $dest . $filename)) {
        echo "error";
    }
}

?>
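The alternative the question describes (skip the first 3 bytes only when they really are a BOM) could be sketched as a small helper that reads 3 bytes and rewinds when they do not match; it would take the place of the unconditional fseek($handle, 3); call. This is a hedged sketch: the name skipUtf8Bom is hypothetical, it assumes a seekable stream, and the in-memory stream only stands in for the real CSV file.

```php
<?php
// Hypothetical helper: consume a UTF-8 BOM if the stream starts with one,
// otherwise rewind to byte 0. Assumes $handle is seekable and at the start.
function skipUtf8Bom($handle): bool
{
    $firstBytes = fread($handle, 3);
    if ($firstBytes === "\xEF\xBB\xBF") {
        return true;        // BOM consumed; stream now at the first real byte
    }
    rewind($handle);        // no BOM: go back to the beginning
    return false;
}

// Usage with an in-memory stream standing in for a CSV file that has a BOM
$handle = fopen('php://memory', 'r+');
fwrite($handle, "\xEF\xBB\xBF" . "001;Store A\n002;Store B\n");
rewind($handle);

$hadBom = skipUtf8Bom($handle);
$rows = [];
// Enclosure and escape passed explicitly (same values as the defaults)
while (($data = fgetcsv($handle, 0, ';', '"', '\\')) !== false) {
    $rows[] = $data;
}
fclose($handle);
```

Because the BOM is consumed before fgetcsv runs, no cell ever contains it, and files without a BOM are read from their first byte.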
Eric27

1 Answer


Leave the file as is and remove the BOM characters from the first cell of the first row that fgetcsv returns in $data. That way you can process files both with and without a BOM. Roughly:

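$firstRow = true;
while (($data = fgetcsv($handle, 9000000, ";")) !== false) {
  if ($firstRow) {
    // Strip the UTF-8 BOM from the first cell of the first row only
    $data[0] = str_replace("\xef\xbb\xbf", "", $data[0]);
    $firstRow = false;   // variable names are case-sensitive: $firstrow would be a different variable
  }
  //..
}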

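"\xef\xbb\xbf" is the PHP string notation for the UTF-8 BOM (bytes EF BB BF). It must be written in double quotes; inside single quotes the escape sequences are not interpreted.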
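To check that the first-row approach handles both kinds of file, here is a self-contained sketch. The function name readCsvRows is hypothetical, and in-memory streams stand in for the real CSV files; it only collects the rows instead of building the INSERT statements.

```php
<?php
// Hypothetical helper: read all rows and strip a UTF-8 BOM from the
// first cell of the first row only, so files with and without a BOM both work.
function readCsvRows($handle, string $delimiter = ';'): array
{
    $rows = [];
    $firstRow = true;
    while (($data = fgetcsv($handle, 0, $delimiter, '"', '\\')) !== false) {
        if ($firstRow) {
            $data[0] = str_replace("\xEF\xBB\xBF", '', $data[0]);
            $firstRow = false;
        }
        $rows[] = $data;
    }
    return $rows;
}

// Usage: the same function is applied to content with a BOM and without one
$results = [];
foreach (["\xEF\xBB\xBF001;Store A\n", "001;Store A\n"] as $contents) {
    $handle = fopen('php://memory', 'r+');
    fwrite($handle, $contents);
    rewind($handle);
    $results[] = readCsvRows($handle);
    fclose($handle);
}
```

Both inputs produce the same rows, so the fseek($handle, 3); line is no longer needed.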

jspit
  • So I put the if condition and then I put the rest of my code with $bi1_values[] etc? – Eric27 Jun 29 '22 at 10:34
  • @jspit did you mean to close this page with https://stackoverflow.com/q/10290849/2943403 instead of answering? – mickmackusa Jun 29 '22 at 10:42
  • Yes, and delete the line with fseek($handle, 3). There are numerous answers on SO on the subject. For example: https://stackoverflow.com/questions/10290849/how-to-remove-multiple-utf-8-bom-sequences – jspit Jun 29 '22 at 10:45
  • @mickmackusa The existing solutions remove a BOM sequence from the entire file and not just from the first cell. I think that's not very clean. To make it 100%, a preg_replace would also have to go here. – jspit Jun 29 '22 at 10:57
  • Stack Overflow has many duplicates to choose from, I'm sure. Please take your pick. – mickmackusa Jun 29 '22 at 11:01
  • This is only for a UTF-8 BOM, but [other BOMs](https://en.wikipedia.org/wiki/Byte_order_mark#Byte_order_marks_by_encoding) exist, too. – AmigoJack Jul 10 '22 at 10:54
  • For non-UTF-8 BOMs, the character sets must also be converted to UTF-8. – jspit Jul 11 '22 at 08:56
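The two points raised in the comments could be combined in a small sketch: a preg_replace anchored with ^ so a UTF-8 BOM is removed only at the very start of a cell, and a lookup that identifies which BOM (if any) a file starts with, so non-UTF-8 files can be recognised before conversion. Both function names are hypothetical; the byte sequences are the standard BOMs from the table linked above.

```php
<?php
// Hypothetical helper: remove a UTF-8 BOM only when it is at the start of the cell
function stripUtf8Bom(string $cell): string
{
    return preg_replace('/^\xEF\xBB\xBF/', '', $cell);
}

// Hypothetical helper: identify a byte order mark at the start of $bytes.
// Returns [encoding name, BOM length in bytes], or null if there is no BOM.
// UTF-32 is checked before UTF-16 because the UTF-32LE BOM begins with the UTF-16LE one.
function detectBom(string $bytes): ?array
{
    $boms = [
        'UTF-8'    => "\xEF\xBB\xBF",
        'UTF-32BE' => "\x00\x00\xFE\xFF",
        'UTF-32LE' => "\xFF\xFE\x00\x00",
        'UTF-16BE' => "\xFE\xFF",
        'UTF-16LE' => "\xFF\xFE",
    ];
    foreach ($boms as $encoding => $bom) {
        if (substr($bytes, 0, strlen($bom)) === $bom) {
            return [$encoding, strlen($bom)];
        }
    }
    return null;
}
```

For a UTF-16 or UTF-32 file the content would still have to be converted to UTF-8 (for example with mb_convert_encoding, which needs the mbstring extension) before fgetcsv can split it correctly.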