5

The following is a part of a bigger PHP script which adds or disables users from a MySQL database using a CSV file and a client id field as input.

There are two tables involved, users and users_clients. The later keeps the relationship between users and clients as an user can belong to multiple clients.


This is the structure of the tables

users structure (it has more fields)

id        | int(11) (primary key)
user      | varchar(100)
pass      | varchar(100)
category  | int(11)
date      | timestamp
name      | varchar(100)
email     | varchar(255)

users indexes

SEARCH    | user        | FULLTEXT
SEARCH    | name        | FULLTEXT
SEARCH    | email       | FULLTEXT

users_clients structure

id_user   | int(11)
id_client | int(11)
status    | enum('active','inactive')


This is the basic flow of the script for adding each user from the CSV file:

  1. Check if the user exists for that client.

    SELECT 
        LOWER(user)
    FROM
        users u
        INNER JOIN users_clients uc ON u.id = uc.id_user
    WHERE
        u.user = '$user'
        and uc.id_client = $id_client
    
  2. If it doesn't exist, add it to the database.

    INSERT INTO 
        users ($fields,id_client)
    VALUES 
        ($values,'$id_operation')
    
  3. Get the id of the inserted user. I know I could use something like mysql_insert_id here, but what about the race conditions?.

    SELECT
        u.id as id
    FROM
        users u
    WHERE
        u.user = '$user'
        and u.id_client = '$id_operation'
    
  4. Associate the user with the corresponding client.

    INSERT INTO
        users_clients (id_user, id_client) 
    VALUES
        ('$id_user','$id_client')
    

There are currently 400.000 users in the table. The script takes 10+ minutes to process a CVS with 500 users.

How would you improve this so that it is faster?

Thanks in advance.

PD: If you want to see the complete function, it's available at pastebin.

Lando
  • 715
  • 6
  • 29
  • you should create a store procedure that does all the work. – Nir Alfasi Jun 13 '12 at 18:23
  • 1
    http://codereview.stackexchange.com/ – j08691 Jun 13 '12 at 18:24
  • 2
    You should not have a race condition with mysql_insert_id. Just give it the connection you used for the insert. – Eric Hogue Jun 13 '12 at 18:25
  • by the way, you can also speed up this thing with mySQL config improvements (google for "mysql primer"). i dont know your dimensions, but in case your are doing this 10hrs per day, there might be a significant performance boost – Sliq Jun 13 '12 at 18:37
  • @ceejayoz: updated question to include table structure and indexes. Users has a FULLTEXT index that includes user, name and email fields. – Lando Jun 13 '12 at 19:05
  • @Panique: looks interesting, I will check it out. Performance is not critical for this script as it's used just a few times per day, but it can't hurt to save some time and learn to write better code :). – Lando Jun 13 '12 at 19:27

2 Answers2

6
INSERT INTO table (id,a,b,c) VALUES (5454,1,2,3)
ON DUPLICATE KEY
UPDATE table SET foo WHERE id=xyz;
  1. Set indexes in the DB
  2. use mysqli instead of mysql
  3. collect all the stuff you want to insert and do it with a prepared statement / stored procedure like here How to insert an array into a single MySQL Prepared statement w/ PHP and PDO
  4. don't do 500 SELECTs, simple get the entire database and work through it via a foreach/while loop, checking for the stuff you need
  5. use a construct like above

Important: For the above statement the column id should have an unique index !!!

Community
  • 1
  • 1
Sliq
  • 15,937
  • 27
  • 110
  • 143
  • I don't think that mysqli is any faster than mysql, except for the use of prepared statements, which is obvious from point 3. – Mike Jun 13 '12 at 18:33
  • @Mike I'm not sure about that, but the official php docs say "use mysqli because it's faster" or something like that. well, we simply don't know that ;) ... maybe somebody can bring up some benchmarks here ? – Sliq Jun 13 '12 at 18:34
  • 1
    Fortunately, somebody already has http://stackoverflow.com/questions/171400/which-is-fastest-in-php-mysql-or-mysqli – Mike Jun 13 '12 at 18:41
  • #4 - Don't use a `foreach` every time, get your list of usernames into an array then test with [`in_array`](http://php.net/manual/en/function.in-array.php) - more efficient – Basic Jun 13 '12 at 18:57
  • @Panique: #1 Just updated the question with table structure and indexes. Should I add additional indexes? – Lando Jun 13 '12 at 20:03
2

Wrap INSERTs into transaction and don't worry, mysql_insert_id() is completely safe unless you switch to another database connection.

It is also possible to wrap all your queries to a transaction in result of massive speed improvement.

jkrcma
  • 1,180
  • 9
  • 14