3

I have a fairly complicated operation that I'm trying to perform with just one SQL query but I'm not sure if this would be more or less optimal than breaking it up into n queries. Basically, I have a table called "Users" full of user ids and their associated fb_ids (id is the pk and fb_id can be null).

+-----------------+
| id | .. | fb_id |
|====|====|=======|
| 0  | .. | 12345 |
| 1  | .. | 31415 |
| .. | .. |  ..   |
+-----------------+

I also have another table called "Friends" that represents a friend relationship between two users. This uses their ids (not their fb_ids) and should be a two-way relationship.

+----------------+
| id | friend_id |
|====|===========|
| 0  |     1     |
| 1  |     0     |
| .. |    ..     |
+----------------+
// user 0 and user 1 are friends

So here's the problem: We are given a particular user's id ("my_id") and an array of that user's Facebook friends (an array of fb_ids called fb_array). We want to update the Friends table so that it honors a Facebook friendship as a valid friendship among our users. It's important to note that not all of their Facebook friends will have an account in our database, so those friends should be ignored. This query will be called every time the user logs in so it can update our data if they've added any new friends on Facebook. Here's the query I wrote:

INSERT INTO Friends (id, friend_id)
SELECT "my_id", id FROM Users WHERE id IN
  (SELECT id FROM Users WHERE fb_id IN fb_array)
AND id NOT IN
  (SELECT friend_id FROM Friends WHERE id = "my_id")

The point of the first IN clause is to get the subset of all Users who are also your Facebook friends, and this is the main part I'm worried about. Because the fb_ids are given as an array, I have to parse all of the ids into one giant string separated by commas which makes up "fb_array." I'm worried about the efficiency of having such a huge string for that IN clause (a user may have hundreds or thousands of friends on Facebook). Can you think of any better way to write a query like this?

It's also worth noting that this query doesn't maintain the dual nature of a friend relationship, but that's not what I'm worried about (extending it for this would be trivial).

patriot
  • 203
  • 2
  • 6
  • Perhaps this will help: http://stackoverflow.com/questions/782915/mysql-or-vs-in-performance. Looks like the performance with the IN clause is your best bet. I would say it's just situational. Good luck. – sgeddes Jan 04 '13 at 01:25

2 Answers2

1

If I am not mistaken, your query can be simplified, if you have a UNIQUE constraint on the combination (id, friend_id), to:

INSERT IGNORE INTO Friends 
  (id, friend_id)
SELECT "my_id", id 
FROM Users 
WHERE fb_id IN fb_array ;

You should have index on User (fb_id, id) and test for efficiency. if the number of the itmes in the array is too big (more than a few thousands), you may have to split the array and run the query more than once. Profile with your data and settings.

ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
  • Thanks, that's a good point! But do you have any thoughts on the efficiency of that query given that fb_array could be a list of thousands of ids? Maybe I'm worrying about nothing, I've just never written a query like that before. – patriot Jan 04 '13 at 01:11
1

Depends on if if the following columns are nullable (value can be NULL):

  • USERS.id
  • FRIENDS.friend_id

Nullable:

SELECT DISTINCT
       "my_id", u.id 
  FROM Users u
 WHERE u.fb_id IN fb_array
   AND u.id NOT IN (SELECT f.friend_id 
                      FROM FRIENDS f
                     WHERE f.id = "my_id")

Not Nullable:

   SELECT "my_id", u.id 
     FROM Users u
LEFT JOIN FRIENDS f ON f.friend_id = u.id
                   AND f.id = "my_id"
    WHERE u.fb_id IN fb_array
      AND f.fried_id IS NULL

For more info:

Speaking to the number of values in your array

The tests run in the two articles mentioned above contain 1 million rows, with 10,000 distinct values.

OMG Ponies
  • 325,700
  • 82
  • 523
  • 502