2

There is a POST request of a collection of user ids being passed into a PHP script. This collection is being compared with existing ids in 'user_table'.

This is achieved by joining an array of the ids into a string, and the resulting string of comma-separated ids is used directly in the query.

I fear that this will not scale well, and potentially be a disaster! Can anyone share some wisdom as to the conventional approach to this kind of problem?

$id_json_array = postVar('u');
$user_ids = json_decode($id_json_array);
$str_ids = join(',', $user_ids);

$result = mysql_query(
   "SELECT  u.user_id AS i
          , u.user_name AS u
    FROM    user_table u
    WHERE   u.user_id IN ($str_ids)"
) or die (mysql_error ());

The ids array would be perhaps a couple thousand, and the 'user_table' is large (tens or hundreds of thousands, indexed, etc.)

Thanks in advance.

pjama
  • 3,014
  • 3
  • 26
  • 27
  • 1
    By far your *biggest* problem is the risk of SQL injection! You *really* should be using [prepared statements](http://stackoverflow.com/a/60496/623041), into which you pass your variables as parameters that do not get evaluated for SQL. If you don't know what I'm talking about, or how to fix it, read the story of [Bobby Tables](http://stackoverflow.com/questions/332365/xkcd-sql-injection-please-explain). – eggyal Jun 10 '12 at 08:53
  • Also, please stop writing new code with the ancient MySQL extension: it is no longer maintained and the community has begun the [deprecation process](http://news.php.net/php.internals/53799); you can use instead either the improved [MySQLi](http://php.net/mysqli) extension or the [PDO](http://php.net/pdo) abstraction layer. – eggyal Jun 10 '12 at 08:53
  • Although the input is being sanitized elsewhere in the code, I agree it would be best practice to use prepared statements. – pjama Jun 10 '12 at 08:58
  • @pjama “sanitized elsewhere in the code” sounds like you’re applying some magical, universal function on it. – Gumbo Jun 10 '12 at 09:01
  • Apart from sanitation / injection concerns, is there anything else to improve the performance? Or would the parameterization be the optimal solution? – pjama Jun 10 '12 at 09:04

1 Answers1

1

I see no problem here, as long as user_id is indexed.

In one similar instance, I have once built a temporary MEMORY table that held what would be the contents of your $str_ids, and did a JOIN with this temp table. The reasons for this trick were:

  • (main reason) the list was fairly large, and I was afraid I could hit some limit in the query legth
  • (side reason) I had to use this list several times in the course of my script

The overhead of building (and perhaps indexing) such a temp table may or may not offset the possible benefit of using it. In my case, it did not.

Edit : Oh and this thread proposes an elegant alternative, if you know in advance the maximum number of elements in your list.

Community
  • 1
  • 1
RandomSeed
  • 29,301
  • 6
  • 52
  • 87