
Let's say I have the following table:

CREATE TABLE `occurences` (
  `object_id` int(10) NOT NULL,
  `seen_timestamp` int(10) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8

It contains the ID of an object (not unique; it repeats) and the timestamp at which that object ID was observed.

Observation runs 24/7 and inserts every occurrence of an object ID with the current timestamp.

Now I want to write a query that selects all object IDs which have been seen at least 7 times during any 10-minute period.

It should work like intrusion detection.

A similar algorithm is used in the DenyHosts script, which checks for invalid SSH logins: if it finds the configured number of occurrences within the configured time period, it blocks the IP.

Any good suggestions?

j0k
rkosegi
  • why are you storing timestamp as integer value? – hjpotter92 Apr 17 '12 at 08:13
  • Because I'm not interested in the exact time/date but in the difference between occurrences. Calculations with integers are faster, I expect. – rkosegi Apr 17 '12 at 08:14
  • @rkosegi, do you need a pure MySQL answer, or is mixing in PHP OK? – Starx Apr 17 '12 at 10:21
  • I'm not using PHP at all. I know how to do it using additional code, so pure SQL is required. – rkosegi Apr 17 '12 at 10:41
  • try searching "group by time(stamp) interval", it will yield you [m](http://stackoverflow.com/q/7992252) [a](http://stackoverflow.com/q/7571740) [n](http://stackoverflow.com/q/4342370) [y](http://stackoverflow.com/q/6884207) [y](http://stackoverflow.com/q/4342370) [y](http://stackoverflow.com/q/3086386) [results](http://stackoverflow.com/search?q=group+by+timestamp+interval) :-) – Tomas Apr 17 '12 at 19:51

3 Answers


This should work:

SET @num_occurences = 7; -- how many occurrences must fall within the interval
SET @max_period = 600; -- interval length in seconds (600 s = 10 minutes)

SELECT offset_start.object_id FROM 
(SELECT @rownum_start := @rownum_start+1 AS idx, object_id, seen_timestamp 
 FROM occurences, (SELECT @rownum_start:=0) r ORDER BY object_id ASC, seen_timestamp ASC) offset_start
JOIN
(SELECT @rownum_end := @rownum_end + 1 AS idx, object_id, seen_timestamp 
 FROM occurences, (SELECT @rownum_end:=0) r ORDER BY object_id ASC, seen_timestamp ASC) offset_end
   ON offset_start.object_id = offset_end.object_id 
  AND offset_start.idx + @num_occurences - 1 = offset_end.idx
  AND offset_end.seen_timestamp - offset_start.seen_timestamp <= @max_period
GROUP BY offset_start.object_id;

You can move @num_occurences and @max_period into your code and pass them as parameters of your statement. Depending on your client, you can also move the initialisation of @rownum_start and @rownum_end in front of the query, which might improve performance (you should test that nonetheless; it's just a gut feeling from looking at the EXPLAIN output of both versions).

Here's how it works:

It selects the entire table twice and joins each row of offset_start with the row in offset_end whose index is @num_occurences - 1 positions further on. (The @rownum_* variables create an index for each row, simulating the row_number() functionality known from other RDBMSs.)
It then checks whether the two rows refer to the same object_id and satisfy the period requirement.
Since this is done for every occurrence row, an object_id would be returned multiple times if the number of occurrences is actually larger than @num_occurences, so the result is grouped at the end to make the returned object_ids unique.
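
On MySQL 8.0 or later, the same offset comparison can probably be expressed with a window function instead of user variables. The following is only a sketch under that version assumption: the alias names windowed and sixth_previous_timestamp are made up for illustration, and the threshold (7 occurrences, so a look-back of 6 rows) and the period (600 seconds) are written as literals.

```sql
-- Sketch, assuming MySQL 8.0+ (window functions are not available in 5.x).
-- For every row, fetch the timestamp 6 rows earlier for the same object_id;
-- if that row is at most 600 s older, 7 occurrences fit into 10 minutes.
SELECT DISTINCT object_id
FROM (
    SELECT object_id,
           seen_timestamp,
           LAG(seen_timestamp, 6) OVER (
               PARTITION BY object_id
               ORDER BY seen_timestamp
           ) AS sixth_previous_timestamp   -- hypothetical alias
    FROM occurences
) AS windowed                              -- hypothetical alias
-- Rows with fewer than 6 predecessors yield NULL and are filtered out here.
WHERE seen_timestamp - sixth_previous_timestamp <= 600;
```

The DISTINCT plays the same role as the GROUP BY in the query above: an object_id that exceeds the threshold several times is reported only once.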

ddelbondio

You could try

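-- your_dt is a placeholder for the end of the window; assumes seen_timestamp is a DATETIME
SELECT object_id, COUNT(seen_timestamp) AS tot FROM occurences
WHERE seen_timestamp BETWEEN
    DATE_ADD(your_dt, INTERVAL -10 MINUTE) AND your_dt
GROUP BY object_id
HAVING tot >= 7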

I don't understand why you use int(10) for seen_timestamp: you could use a datetime...

Marco
  • I use timestamps because other parts of the program need timestamps. I don't think this is usable, because there is no "your_dt". The select should look at the entire table and find object IDs which occur 7 or more times in any 10-minute interval. Imagine it like "who visited my site 7 or more times during a 10-minute interval" (not the last 10 minutes) – rkosegi Apr 05 '12 at 12:58
  • You can't get ANY timeframe with just one SQL statement. – Philippe Girolami Apr 17 '12 at 08:24

You could use the following statement:

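-- For each occurrence oc1, count the occurrences oc2 of the same object in the
-- 600 s starting at oc1 (oc1 itself included); assumes no exact duplicate rows.
SELECT DISTINCT oc1.object_id 
    FROM occurences oc1 
        JOIN occurences oc2 ON oc1.object_id = oc2.object_id  
            AND oc1.seen_timestamp >= (oc2.seen_timestamp - 600)
            AND oc1.seen_timestamp <= oc2.seen_timestamp
    GROUP BY oc1.object_id, oc1.seen_timestamp
    HAVING COUNT(oc2.object_id) >= 7;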

It is not very fast, and not very clean, let me know if anyone finds a better solution!

Argeman