0

I'm trying to hunt down a potential bug in my application, and I'd like to see if there are duplicate events recorded for a user, which would be evidenced by duplicate data in two columns, user and value whereas the other columns could have slightly different meta data. Is there a way to identify such duplicates in a single SQL query?

helloB
  • 3,472
  • 10
  • 40
  • 87
  • 2
    Possible duplicate of [Select statement to find duplicates on certain fields](http://stackoverflow.com/questions/4434118/select-statement-to-find-duplicates-on-certain-fields) – Adam Jun 23 '16 at 15:42
  • 1
    Adjacent? Then you can use LAG or LEAD to see the previous or next record. Or use an EXISTS clause to find records where exists another record in the same minute or so with the same user and value. – Thorsten Kettner Jun 23 '16 at 15:43

2 Answers2

1

Window functions can be used here. Especifically lag or lead deppending if you want the previous or the next occurrence of the dupplicate.

This query uses lag but can be safely changed by lead:

WITH event_with_lag_data AS (
SELECT user, value, value_ts,
  lag(user) over (order by value_ts) as prev_user,
  lag(value) over (order by value_ts) as prev_value 
FROM event_data
)
SELECT user, value, value_ts
FROM event_with_lag_data
WHERE user = prev_user AND value = prev_value

value_ts is the ordering column. I assume events are ordered by date/time.
If you have more columns to test for equality is just a matter of adding more lines to the lag part and to the final where part

ArtemGr
  • 11,684
  • 3
  • 52
  • 85
cachique
  • 1,150
  • 1
  • 12
  • 16
0

In case you are just looking to find any duplicates based on two columns since it isn't clear what you mean by "adjacent", here is a solution:

    WITH duplicates AS (
      select
        user,
        value,
        COUNT(*) AS COUNT
      FROM event_data
      GROUP BY 
        user, 
        value
    )

    SELECT 
      user,
      value
    FROM duplicates
    WHERE COUNT > 1
Shayna
  • 716
  • 7
  • 10