2

I have a query where I am identifying more than 1 submission by user for a particular form:

select userid, form_id, count(*)
from table_A
group by userid, form_id
having count(userid) > 1

However, I am trying to see which users are submitting more than 1 form within a 5 second timeframe (We have a field for the submission timestamp in this table). How would I narrow this query down by that criteria?

nikotromus
  • 1,015
  • 16
  • 36

4 Answers4

2

One way is to add to the group by DATEDIFF(Second, '2017-01-01', SubmittionTimeStamp) / 5.
This will group records based on the userid, form_id and a five seconds interval:

select userid, form_id, count(*)
from table_A
group by userid, form_id, datediff(Second, '2017-01-01', SubmittionTimeStamp) / 5
having count(userid) > 1

Read this SO post for a more detailed explanation.

Community
  • 1
  • 1
Zohar Peled
  • 79,642
  • 10
  • 69
  • 121
2

@nikotromus

You've not provided a lot of details about your schema and other columns available, nor about what / how and where this information will be used.

However if you want to do it "live" so compare results in your time against current timestamp it would look something like:

SELECT userid, form_id, count(*)
  FROM table_A
 WHERE DATEDIFF(SECOND,YourColumnWithSubmissionTimestamp, getdate()) <= 5
GROUP BY userid, form_id
HAVING count(userid) > 1
morb1d
  • 143
  • 5
  • Why not just use one datediff between `YourColumnWithSubmissionTimestamp` and `getdate()`? – Zohar Peled Apr 05 '17 at 19:18
  • @ZoharPeled - you mean something like this in WHERE: DATEDIFF(SECOND, '19000101', getdate() - YourColumnWithSubmissionTimestamp)? I'm not able to test the idea right now, thus posted something that I would expect will work. Thanks for comment though! – morb1d Apr 05 '17 at 19:21
  • No, I mean something like this: `WHERE DATEDIFF(SECOND,YourColumnWithSubmissionTimestamp, getdate()) <= 5` – Zohar Peled Apr 05 '17 at 19:22
  • @ZoharPeled - ahh of course, yes your are totally right. Thanks for good point :) – morb1d Apr 05 '17 at 19:23
1

You can use lag to form groups of rows that are within 5 seconds of each other and then do aggregation on them:

select distinct userid,
    form_id
from (
    select t.*,
        sum(val) over (
            order by t.submission_timestamp
            ) as grp
    from (
        select t.*,
            case 
                when datediff(ms, lag(t.submission_timestamp, 1, t.submission_timestamp) over (
                            order by t.submission_timestamp
                            ), t.submission_timestamp) > 5000
                    then 1
                else 0
                end val
        from your_table t
        ) t
    ) t
group by userid,
    form_id,
    grp
having count(*) > 1;

See this answer for more explanation:

Community
  • 1
  • 1
Gurwinder Singh
  • 38,557
  • 6
  • 51
  • 76
  • 1
    I find it nice that we both are recycling answers. Very enviromnetal :-) – Zohar Peled Apr 05 '17 at 19:15
  • @ZoharPeled - :) – Gurwinder Singh Apr 05 '17 at 19:16
  • Got this error when I tried to run it... Msg 535, Level 16, State 0, Line 1 The datediff function resulted in an overflow. The number of dateparts separating two date/time instances is too large. Try to use datediff with a less precise datepart. – nikotromus Apr 05 '17 at 19:54
  • @nikotromus - the maximum difference between startdate and enddate is 24 days, 20 hours, 31 minutes and 23.647 seconds. If you try changing it to seconds in `datediff(seconds,..). . . > 5`. For seconds the range is 68 years. - https://learn.microsoft.com/en-us/sql/t-sql/functions/datediff-transact-sql – Gurwinder Singh Apr 05 '17 at 20:00
0

I would just use exists to get the users:

select userid, form_id
from table_A a
where exists (select 1
              from table_A a2
              where a2.userid = a.userid and a2.timestamp >= a.timestamp and a2.timestamp < dateadd(second, 5, a.timestamp
            );

If you want a count, you can just add group by and count(*).

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786