-2

Hello – I have a DB table (MySQL ver 5.6.41-84.1-log) that has about 92,000 entries, with columns for:

  • id (incremental unique ID)
  • post_type (not important)
  • post_id (not important, but shows relation to another table)
  • user_id (not important)
  • vote (not important)
  • ip (IP Address, ie. 123.123.123.123)
  • voted (Datestamp in GMT, ie. 2018-12-03 04:50:05)

I recently ran a contest and we had a rule that no single IP could vote more than 60 times per day. So now I need to run a custom SQL formula that applies the following rule:

For each IP address, for each day, if there are > 60 rows, delete those additional rows.

Thank you for your help!

Mike B.
  • 773
  • 3
  • 12
  • 26

3 Answers3

1

This is a complicated one, and I think it is hard to provide a 100% sure answer without actual table and data to play with.

However let me try to describe the logic, and build the query step by step so you can paly around with it and possibly fix lurking erros.

1) We start with selecting all ip adresses that posted more than 60 votes on a given day. For this we use a group by on the voting day and on the ip adress, combined with a having clause

select date(voted), ip_adress
from table 
group by date(voted), ip_adress 
having count(*) > 60

2) From then, we go back to the table and select the first 60 ids corresponding to each voting day / ip adress couple. id is an autoincremented field so we just sort using this field and the use the mysql limit instruction

    select id, ip_adress, date(voted) as day_voted
    from table 
    where ip_adress, date(voted) in (
        select date(voted), ip_adress 
        from table 
        group by date(voted), ip_adress 
        having count(*) > 60
    ) 
    order by id
    limit 60

3) Finally, we go back once again to the table and search for the all ids whose ip adress and day of vote belong to the above list, but whose id is greater than the max id of the list. This is achieved with a join and requires a group by clause.

select t1.id 
from 
    table t1
    join (      
        select id, ip_adress, date(voted) as day_voted 
        from table 
        where ip_adress, date(voted) in (
            select date(voted), ip_adress 
            from table
            group by date(voted), ip_adress
            having count(*) > 60
        )
        order by id
        limit 60
    ) t2 
        on t1.ip_adress = t2.ip_adress 
        and date(t1.voted) = t2.day_voted and t1.id > max(t2.id)
group by t1.id

That should return the list of all ids that we need to delete. Test if before you go further.

4) The very last step is to delete those ids. There are limitations in mysql that make a delete with subquery condition quite uneasy to achieve. See the following SO question for more information on the technical background. You can either use a temporary table to store the selected ids, or try to outsmart mysql by wrapping the subquery and aliasing it. Let us try with the second option :

delete t.* from table t where id in ( select id from (
    select t1.id 
    from 
        table t1
        join (      
            select id, ip_adress, date(voted) as day_voted 
            from table 
            where ip_adress, date(voted) in (
                select date(voted), ip_adress
                from table 
                group by date(voted), ip_adress
                having count(*) > 60
            )
            order by id
            limit 60
        ) t2 
            on t1.ip_adress = t2.ip_adress
            and date(t1.voted) = t2.day_voted
            and t1.id > max(t2.id)
    group by t1.id
) x );

Hope this helps !

GMB
  • 216,147
  • 25
  • 84
  • 135
  • Thank you so much for this info. Dumb question, but what variable am I replacing in your code with my table name, 'wp_voting_public'. – Mike B. Dec 05 '18 at 01:25
  • That would be `table`. Replace it everywhere with your table name, `wp_voting_public` – GMB Dec 05 '18 at 01:26
  • Ok, I got the first example working fine. Error with the second: https://cl[dot]ly/109ffc2e3e4f – Mike B. Dec 05 '18 at 01:33
  • It’s in that URL screenshot I provided, replacing [dot] with . – Mike B. Dec 05 '18 at 01:51
  • OK. Problem with column alias syntax in mysql (I guess I worked with Oracle too much). Fixed the queries. – GMB Dec 05 '18 at 01:58
  • I really appreciate you working with me on this, and apologies for the delay in my response. Similar error: https://cl[dot]ly/901c9b7afec3 – Mike B. Dec 05 '18 at 02:34
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/184762/discussion-between-mike-b-and-gmb). – Mike B. Dec 05 '18 at 15:11
0

You could approach this by vastly simplifying your sample data and using row number simulation for mysql version prior to 8.0 or window function for versions 8.0 or above. I assume you are not on version 8 or above in the following example

drop table if exists t;
create table t(id int auto_increment primary key,ip varchar(2));
insert into t (ip) values
(1),(1),(3),(3),
(2),
(3),(3),(1),(2);

delete t1 from t t1 join
(
select id,rownumber from
(
select t.*,
         if(ip <> @p,@r:=1,@r:=@r+1) rownumber,
         @p:=ip p
from t
cross join (select @r:=0,@p:=0) r
order by ip,id
)s
where rownumber > 2
) a on a.id = t1.id;

Working in to out the sub query s allocates a row number per ip, sub query a then picks row numbers > 2 and the outer multi-table delete deletes from t joined to a to give

+----+------+
| id | ip   |
+----+------+
|  1 | 1    |
|  2 | 1    |
|  3 | 3    |
|  4 | 3    |
|  5 | 2    |
|  9 | 2    |
+----+------+
6 rows in set (0.00 sec)
P.Salmon
  • 17,104
  • 2
  • 12
  • 19
  • Thank you for providing this response. Where does the "per day" rule get implemented in your solution, using my column of `voted`? I'm an absolute novice when it comes to running SQL commands, so simplifying the dataset doesn't seem that advantageous to me if I can't understand the function. Also, I believe my MySQL version is 5.6.41-84.1-log, having run SELECT version() from my phpMyAdmin console. – Mike B. Dec 05 '18 at 12:23
0

I had someone help me write the following query, which addressed my question.

SET SQL_SAFE_UPDATES = 0;
create table temp( SELECT id, ip, voted
    FROM
        (SELECT id, ip, voted,
            @ip_rank := IF(@current_ip = ip, @ip_rank + 1, 1) AS ip_rank,
            @current_ip := ip
        FROM `table_name` where ip in (SELECT ip from `table_name` group by date(voted),ip having count(*) >60)
        ORDER BY ip, voted desc
        ) ranked
    WHERE ip_rank <= 2);
DELETE FROM `table_name`
WHERE id not in (select id from temp) and ip in (select ip from temp);
drop table temp;
Mike B.
  • 773
  • 3
  • 12
  • 26