
I've been trying to import a CSV file into a MySQL database using LOAD DATA INFILE.

Everything works more or less correctly, but mysqli_info() and mysqli_affected_rows() both report that no rows were imported by the query, even though I can see the rows being imported correctly.

A simplified version of what I am trying to do (with fewer columns than I am actually importing):

$server = 'localhost';
$username = 'root';
$password = 'password123';
$database = 'database_name';

$connect = new mysqli($server, $username, $password, $database);

$files = scandir('imports/');

foreach($files as $file) {
    $import =
        "LOAD DATA INFILE 'imports/$file'
        IGNORE INTO TABLE table_name
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
        LINES TERMINATED BY '\n'
        IGNORE 1 LINES
        (@id, @name, @address, @product)
        set id=@id, name=@name, address=@address, product=@product";

    if(! $connect->query($import)) {
        echo 'Failed';
    }

    $connect->query($import);

    echo mysqli_affected_rows($connect);
}

mysqli_affected_rows() returns 0, and mysqli_info() states that all rows have been skipped. Any idea why it's not reporting correctly?

Hopefully that's enough information. Thanks!

Edit:

I've been too busy to work on this over the past few days, but although it doesn't specifically answer my question, Drew's suggestion of importing into a temporary table first seems to make the most sense, so I have decided to go with that.

Edward144

1 Answer


Further clarification of my comment: I would not rely on $connect->affected_rows as any kind of Rosetta stone for this information. It is broken half the time.

Here is one recommendation: perform your LOAD DATA INFILE into a worktable, not into your final target table. Once that is done, you have escaped the limitations of LOAD DATA and can enjoy the fruits of INSERT ... ON DUPLICATE KEY UPDATE (IODKU).

With the latter, when I want to know counts, I get an assigned batch number from a control table. If you need help with that, let me know.
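A minimal sketch of that batch-number idea, assuming a hypothetical batch_control table (none of these names come from the original post):

```sql
-- Hypothetical control table: one auto-increment row per import run.
CREATE TABLE batch_control (
    batch_id   INT AUTO_INCREMENT PRIMARY KEY,
    started_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Claim a batch number before loading.
INSERT INTO batch_control () VALUES ();
SELECT LAST_INSERT_ID();  -- this is the batch number to carry along
```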

Stepping back to the point where the data is in the worktable and I have a batch number in hand: I then perform the IODKU from the worktable into the final table, with the batch number coming along for the ride into a column in the final table (yes, I can tweak my schema to do that; perhaps you cannot). To avoid schema changes to an existing table, and to allow for a single row having been hit by multiple batch ids, a simple association table of ids can be used instead. An intersect table, if you will.
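The worktable-then-IODKU step might look like this. The table and column names are illustrative (borrowed loosely from the question's schema), and 123 stands in for the batch number obtained earlier:

```sql
-- Load the raw CSV into a scratch worktable first.
LOAD DATA INFILE 'imports/file.csv'
INTO TABLE worktable
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(id, name, address, product);

-- Then upsert into the final table, tagging each row with the batch number.
INSERT INTO final_table (id, name, address, product, batch_id)
SELECT id, name, address, product, 123  -- 123 = batch number in hand
FROM worktable
ON DUPLICATE KEY UPDATE
    name     = VALUES(name),
    address  = VALUES(address),
    product  = VALUES(product),
    batch_id = VALUES(batch_id);
```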

If concurrency is an issue (multiple users may do this at the same time), then a locking strategy (ideally InnoDB row-level locking) is employed. If used, keep it fast.
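One way such a locking strategy could be sketched, assuming InnoDB, the hypothetical batch_control table above, and a transaction around the import (purely illustrative, not the original author's exact scheme):

```sql
START TRANSACTION;

-- Row-level lock on this import's control row, so concurrent imports
-- serialize on their own batch rows rather than locking the whole table.
SELECT batch_id FROM batch_control WHERE batch_id = 123 FOR UPDATE;

-- ... perform the IODKU from the worktable here ...

COMMIT;  -- releases the row lock
```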

I then fetch my count from the final table (or intersect table) where the batch id equals the batch number in hand.
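Fetching that count could be as simple as (batch_id column and value assumed as in the sketches above):

```sql
SELECT COUNT(*) FROM final_table WHERE batch_id = 123;
```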

See also this answer from Jan.

Drew
  • Although this does not strictly answer the question (why affected_rows() does not display the number accurately), this is definitely the approach I would recommend. – Shadow Jul 07 '16 at 15:32
  • How about the part that says it is broken half the time. There are plenty of bug reports even from minor version upgrades. It is a mess. And the likes of PDO rowCount had to take other measures to keep some semblance of sanity. – Drew Jul 07 '16 at 15:33
  • I would be careful with that statement. Many bug reports regarding the mysql_affected_rows() API function are complaints about it not supporting certain statements (such as LOAD DATA), or about it returning varying results when encountering an error in earlier versions (which was documented behaviour). When raised in conjunction with a PHP API library, the issue may get even more complicated, since the version of the MySQL server may differ from the version of the C API against which the PHP library was compiled. – Shadow Jul 07 '16 at 15:56
  • I think the history and experience with [the C routine](http://dev.mysql.com/doc/refman/5.7/en/mysql-affected-rows.html) speaks for itself :p ... if things were so well documented and so less error prone we wouldn't spend a decent amount of time dealing with it all. – Drew Jul 07 '16 at 16:01