2

I have a table with hundreds of thousands of rows, assigned to a user. e.g.:

itemid | userid | etc
1      | 1      | etc
2      | 1      | etc
3      | 1      | etc
4      | 3      | etc
5      | 3      | etc
6      | 3      | etc
etc    | etc    | etc

A user can have any number of items assigned to him or her. I.e. Any number from 0 to infinity. My problem is I want an SQL query that will delete all items for each user, but keep 20. If the user has less than 20, for example, only 10 items assigned, it must keep all 10.

How do I do that?

UPDATE

If the user has 50 items, with ids 1 - 50, it must return items 30 - 50. In other words, the last 20 inserted items for that user.

rockstardev
  • 13,479
  • 39
  • 164
  • 296
  • 1
    Wont a simple `DELETE * FROM table WHERE itemid>20` work? – Daanvn Apr 22 '13 at 11:48
  • 3
    Is there a sort for items ? Do you want to keep 20 random items or 20 last inserted items ? – Marcassin Apr 22 '13 at 11:49
  • @Daanvn I thought the same, but I guess it won't - see the itemid looks like an AI column – MythThrazz Apr 22 '13 at 11:49
  • I want to keep the 20 most recent items for each user. Ie. Sorted by ID. So in the above case, it would return 6,5,4 for user 3 – rockstardev Apr 22 '13 at 11:55
  • `DELETE` does support a [subquery](http://stackoverflow.com/questions/6296102/mysql-delete-with-group-by) so, try something like `DELETE FROM table WHERE itemid in (SELECT ItemId FROM users HAVING count(itemid) = 20)` – hd1 Apr 22 '13 at 12:01
  • http://stackoverflow.com/a/9652359/1961634 this answer may be useful = just use the expression in it as a subquery – Dariusz Apr 22 '13 at 12:08
  • I'm afraid that this cannot be done in a single query. – thavan Apr 22 '13 at 12:20

4 Answers4

2

Let's say we want to delete all but the two most recent entries for each user...

CREATE TABLE my_table(itemid INT NOT NULL AUTO_INCREMENT PRIMARY KEY,userid INT NOT NULL);

INSERT INTO my_table VALUES
(1, 1),
(2, 1),
(3, 1),
(4, 3),
(5, 3),
(6, 3);

SELECT * FROM my_table;
+--------+--------+
| itemid | userid |
+--------+--------+
|      1 |      1 |
|      2 |      1 |
|      3 |      1 |
|      4 |      3 |
|      5 |      3 |
|      6 |      3 |
+--------+--------+

Here's a query to select the rows for deletion...

SELECT a.* 
  FROM my_table a 
  LEFT 
  JOIN 
     ( SELECT x.*, COUNT(*) FROM my_table x JOIN my_table y ON y.userid = x.userid AND y.itemid >= x.itemid GROUP BY x.itemid HAVING COUNT(*) <=2)b 
    ON b.itemid = a.itemid 
 WHERE b.itemid IS NULL;
+--------+--------+
| itemid | userid |
+--------+--------+
|      1 |      1 |
|      4 |      3 |
+--------+--------+

...and here's a query to remove them...

DELETE a 
  FROM my_table a 
  LEFT 
  JOIN 
     ( SELECT x.*, COUNT(*) FROM my_table x JOIN my_table y ON y.userid = x.userid AND y.itemid >= x.itemid GROUP BY x.itemid HAVING COUNT(*) <=2)b 
    ON b.itemid = a.itemid 
 WHERE b.itemid IS NULL;

SELECT * FROM my_table;
+--------+--------+
| itemid | userid |
+--------+--------+
|      2 |      1 |
|      3 |      1 |
|      5 |      3 |
|      6 |      3 |
+--------+--------+
Strawberry
  • 33,750
  • 13
  • 40
  • 57
  • Good answer. For more details, this URL helped me understanding that stuff http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/ when solving similar issue – Pavel Dubinin Apr 22 '13 at 12:52
0

try this

     delete from Fruits where itemid  
     not in (select itemid from (select itemid from Fruits where userid = 1
     ORDER BY itemid desc limit 20)x  )
     and userid=1

DEMO HERE

echo_Me
  • 37,078
  • 5
  • 58
  • 78
  • this will remove all records and leave just 20 for userid=1 – Pavel Dubinin Apr 22 '13 at 12:57
  • @PavelDubinin you was right , i edited my answer , thanks for the catch :) – echo_Me Apr 22 '13 at 13:11
  • I believe he wants to make this for all users at once, so with this approach the query you suggest will need to be executed for each user separately (which can result in thousands of queries). This is very unefficient – Pavel Dubinin Apr 22 '13 at 13:22
0

That code works as follows:

  • Creates a new column: count - which counts the items for each user
  • Orders the results by itemid descending - so that the first events for user get the highest ids
  • Selects/deletes all of the rows with count > 20 - leaving the last 20 (or less) events intact

SELECT:

SET @count=1;
SET @last_user_id=0;

SELECT * FROM (
SELECT *,
case 
when (@last_user_id = userid) 
then 
@count:=@count+1
else @count:=1 end as count,
case 
when (@last_user_id!=userid) 
then 
@last_user_id:=userid
end
as new
FROM <tablename> ORDER BY userid, itemid DESC) AS inner_table WHERE count > 20

DELETE:

SET @count=1;
SET @last_user_id=0;

DELETE FROM (
SELECT *,
CASE 
WHEN (@last_user_id = userid) 
THEN  
@count:=@count+1
ELSE 
@count:=1 
END 
AS count,
CASE 
WHEN (@last_user_id!=userid) 
THEN  
@last_user_id:=userid
END 
AS new
FROM <tablename> ORDER BY userid, itemid DESC) AS inner_table WHERE count > 20
MythThrazz
  • 1,629
  • 17
  • 25
-1

This worked for me:

DELETE FROM table WHERE itemid NOT IN ( 
      SELECT itemid 
        FROM ( 
          SELECT * 
          FROM table
          WHERE userid = 1
          ORDER BY itemid  DESC 
          LIMIT 20
        ) x 
    ) 
    AND userid = 1
rockstardev
  • 13,479
  • 39
  • 164
  • 296