0

I have a large csv file (over 100,000 lines) encoded in UTF-8-BOM that looks like this:

27336;00203-AND1;90-BLACK;9.5;2
27336;00203-ET1;90-BLACK;10;1
27336;00203-ET1;90-BLACK;12;1
...

And a table in my SQL Server database with these columns :

storenumber | stylecode | colour | size | units | timestamp

I use Bulk Insert Data to load the file at once, but I would like to add my $timestamp variable to each line inserted in my table but it doesn't work... How do I do it?

 <?php

include("connexion.php");

ini_set('max_execution_time', 32400);
$timestamp= date("y-m-d H:i");


$csv= "D:/xampp/htdocs/retail_BI/files/BI2_20200720_1344_00076.txt";

                     $query = "BULK INSERT dbo.Y2_Inventory 
                    FROM '$csv' 
                    WITH (
                    FIELDTERMINATOR = ';', 
                    ROWTERMINATOR = '\n',
                    ERRORFILE = 'myfileerror.log'
                     )";

                     $stmt = $conn->query( $query );     
                     if (!$stmt) { echo $conn->error;} 


$query2 = "UPDATE dbo.Y2_Inventory SET timestamp = ? WHERE timestamp IS NULL";

                     $stmt = $conn->query( $query2 );     
                    
            
        echo "good";        

?>
Zhorov
  • 28,486
  • 6
  • 27
  • 52
  • You can't parametrise a `BULK INSERT` you have to use dynamic SQL and **safely** inject the values (something you are not doing here). – Thom A Jul 24 '20 at 09:29
  • I wanted to use BULK INSERT because otherwise the processing of my file takes hours... can I improve the insertion speed? @Larnu –  Jul 24 '20 at 09:31
  • @Eric27 What do you expect from `LASTROW = '$timestamp'`? `LASTROW` simply specifies the number of the last row to load. You may try to execute and additional `UPDATE` after the `BULK INSERT`. – Zhorov Jul 24 '20 at 09:42
  • But if I do it afterwards, won't he insert my $timestamp in every line of my table? @Zhorov –  Jul 24 '20 at 09:46
  • I assume that on inserting the `timestamp` column is `NULL`, so `UPDATE .. SET timestamp = ? WHERE timestamp IS NULL` should wortk. – Zhorov Jul 24 '20 at 09:51
  • I updated my code, do you think I should do it like this? @Zhorov –  Jul 24 '20 at 10:04
  • Can you succesfully execute `BULK INSERT` command in an external tool (like SQL Server Management Studio), because your table has `6` columns, but you have only `5` columns in the data file? if not you'll need a format file with mappings. – Zhorov Jul 24 '20 at 11:41
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/218517/discussion-between-eric27-and-zhorov). –  Jul 24 '20 at 12:15

2 Answers2

1

You need to consider the following:

  • Always try to use parameters in your statement (of course if possible) or carefully sanitize the input data. In this specific case you may try to check if the input file exists and after that inject filename in your statement.
  • The input data doesn't match the table definition, so you may try to import the data in a temporary table.
  • Pass datetime values as text using an unambiguous format (e.g. 2020-07-25T12:00:00).

The following example is a possible solution to your problem:

Table:

CREATE TABLE Y2_Inventory (
    storenumber int,
    stylecode nvarchar(50),
    colour nvarchar(50),
    size numeric(10, 1), 
    units int,
    [timestamp] datetime
)

PHP script:

<?php
//
include("connexion.php");
ini_set('max_execution_time', 32400);

// CSV file
$csv = "D:/xampp/htdocs/retail_BI/files/BI2_20200720_1344_00076.txt";  
$err = "D:/xampp/htdocs/retail_BI/files/BI2_20200720_1344_00076.err";
if (!file_exists($csv)) {
    die("CSV file not exists.");    
}
    
// Time stamp
$timestamp = date("Y-m-d\TH:i:s");

// INSERT Data
try {
    $query = "
        SET NOCOUNT ON;
        
        SELECT storenumber, stylecode, colour, size, units
        INTO #t
        FROM Y2_Inventory
        WHERE 1 = 0;
        
        BULK INSERT #t 
        FROM '$csv' 
        WITH (
            ERRORFILE = '$err',
            FIELDTERMINATOR = ';', 
            ROWTERMINATOR = '\n'
        );
        
        INSERT INTO Y2_Inventory (storenumber, stylecode, colour, size, units, [timestamp])
        SELECT storenumber, stylecode, colour, size, units, ?
        FROM #t;
    
        DROP TABLE #t;
    ";
    $stmt = $conn->prepare($query);     
    $stmt->bindParam(1, $timestamp, PDO::PARAM_STR);
    $stmt->execute();
    echo "OK";        
} catch (PDOException $e) {
    die ("Error executing query. ".$e->getMessage());
}
?>
Zhorov
  • 28,486
  • 6
  • 27
  • 52
  • 1
    @Eric27 It is a working example, tested with SQL Server 2017, PHP 7.4.8 and PHP Driver for SQL Server 5.8. – Zhorov Jul 24 '20 at 12:51
  • Have you tested with a csv file encoded in UTF-8-BOM?Because my files are encoded like this and it was causing me problems I had to do a 'fseek($handle, 3);' to ignore the BOM when reading my file when I was using fgetcsv @Zhorov –  Jul 24 '20 at 12:55
  • I had used it to insert my csv data into my table but it took me several hours to record all the lines... That's why I looked for a solution to go faster and I think it's the best to insert data from big csv files? @Zhorov –  Jul 24 '20 at 21:07
  • Do you think you could show me an example with the 1000 rows? I'd like to compare the two methods to see which one takes the least time for large csv files @Zhorov –  Jul 27 '20 at 10:20
  • Thank you for your answer, I found this : https://stackoverflow.com/questions/1176352/pdo-prepared-inserts-multiple-rows-in-single-query but I'm having trouble adapting it to my situation... @Zhorov –  Jul 27 '20 at 10:38
  • 1
    @Eric27 It's an exellent starting point. If you have troubles implementing this solution, you may ask a question with the problems you face. I'm sure most of us will try to help you. – Zhorov Jul 27 '20 at 10:40
  • And if I want to put in a txt file the erroneous lines that are not inserted in the table, how can I do it? @Zhorov –  Jul 27 '20 at 17:33
  • 1
    @Eric27 `ERRORFILE` option in `BULK INSERT`. – Zhorov Jul 27 '20 at 17:46
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/218711/discussion-between-eric27-and-zhorov). –  Jul 27 '20 at 19:14
  • When I use your code the file doesn't fit into my database... It's very strange @Zhorov –  Jul 28 '20 at 08:22
  • No on the page I did display "OK" during execution @Zhorov –  Jul 28 '20 at 08:42
  • Does the table definition from the example in the answer match your actual table definition? If possible, post the table definition in the question. Also, I've made an update and the `BULK INRT` includes `ERRORFILE` option. Does this file contain any errors? – Zhorov Jul 28 '20 at 08:50
  • I used your table definition and your code to make sure I have the same thing and the error file will not be created in the folder in question @Zhorov –  Jul 28 '20 at 08:56
  • @Eric27 I've tested and double-tested the example. If the `CSV` has data with formatting errors and cannot be converted, the error file is created and the excpetion is generated, but the correct data is inserted into the table. Change the file location and use a folder with the appropriate permissions. Thanks. – Zhorov Jul 28 '20 at 09:20
  • Where did you put your file so that your SQL Server can access it? @Zhorov –  Jul 28 '20 at 09:44
  • I found my mistake, when I test directly into the database I have: Cannot bulk load because the file ... could not be opened. Operating system error code (null) –  Jul 28 '20 at 09:48
  • @Eric27, For the test in the`D:\\` folder. – Zhorov Jul 28 '20 at 10:26
  • 1
    Ok the error comes from there because I have my files which are on my web server and not on my SQL Server @Zhorov –  Jul 28 '20 at 10:28
  • Wondering what `INSERT INTO WHERE 1 = 0` means ... Didn't know it a nice way to create a new table from another table schema ;) – Hrvoje T Dec 23 '21 at 01:04
  • 1
    @HrvojeT, it's simply a `WHERE` clause, that returns `FALSE`, so the inserted rows are always zero. – Zhorov Dec 23 '21 at 06:46
0

Like I mention in the comment, you can't parametrise a BULK INSERT statement. Therefore you have to use Dynamic SQL. I don't know/write PHP, however, the above I can clearly see is wide open to injection, as you're simply injecting the file's name into the SQL statement. You need to fix that, and parametrise your statement How to: Perform Parameterized Queries.

As for the SQL, this will look something like this:

DECLARE @FilePath nvarchar(256); --This would be your parameter, so might not bneed a declaration

DECLARE @SQL nvarchar(MAX);
SET @SQL = N'BULK INSERT dbo.Y2_Inventory 
FROM N''' + REPLACE(@FilePath,'''','''''') + N'''
     WITH(FIELDTERMINATOR = '';'',
          FIELDTERMINATOR = ''\n'',
          ERRORFILE = ''myfileerror.log'');';

--PRINT @SQL; --Your Best Friend

EXEC sys.sp_executesql @SQL;
Thom A
  • 88,727
  • 11
  • 45
  • 75
  • Thanks for your answer, so now I have to adapt this sql code in php to execute it in my code? @Larnu –  Jul 24 '20 at 10:19
  • Correct, as I said, I don't write PHP, so I have omitted that, however, I have linked you on the documentation on how to do it. – Thom A Jul 24 '20 at 10:28
  • Your code corrects sql injections but not to add my last $timestamp column at the same time? @Larnu –  Jul 24 '20 at 12:17
  • That's a completely different query, @Eric27 . I was addressing the `BULK INSERT`, as that's the part you're having the problem with,. – Thom A Jul 24 '20 at 12:24