1

I'm trying to detect people using proxies to abuse my website.

Often they will change proxies and so forth. But there is definitely a pattern of them using one proxy address many times. Much more than is normal for legitimate visitors.

Usually most accessing of my website is by unique ip addresses that have only visited once or a few times. Not repeatedly.

Let's say I have these ip addresses in a column:

89.46.74.56
89.46.74.56
89.46.74.56
91.14.37.249
104.233.103.6

That would mean there are 3 uniques out of 5. Giving a "uniqueness score" of 60%.

How would I calculate this efficiently using MySQL?

Amy Neville
  • 10,067
  • 13
  • 58
  • 94

1 Answers1

2

plan

  • get count grouping by ip
  • divide by ( cross-joining ) the total rowcount
  • take maximum repeat ratio from above

setup

create table example
(
  id integer primary key auto_increment not null,
  ip varchar(13) not null
);

insert into example
( ip )
values
( '89.46.74.56'   ),
( '89.46.74.56'   ),
( '89.46.74.56'   ),
( '91.14.37.249'  ),
( '104.233.103.6' )
;

query

select max(repeat_factor)
from
(
select ip, count(*) / rc.row_count as repeat_factor
from example
cross join ( select count(*) as row_count from example ) rc
group by ip
) q
;

output

+--------------------+
| max(repeat_factor) |
+--------------------+
| 0.6                |
+--------------------+

sqlfiddle

amdixon
  • 3,814
  • 8
  • 25
  • 34
  • Works perfectly - genius! Some people are just too good as SQL......I surrender to you. Will accept when it allows me to :) – Amy Neville Dec 31 '15 at 13:04
  • I should mention that it is giving the opposite of what I originally asked in the question (1-uniqueness_score)......but it's all the same really......my fault for writing a question that could be taken either way.....thanks :) – Amy Neville Dec 31 '15 at 13:08
  • 1
    no worries - glad it works for you, sometimes the semantics between these statistical `scores` are tricky to explain – amdixon Dec 31 '15 at 13:15