I'm trying to hunt down a potential bug in my application, and I'd like to see if there are duplicate events recorded for a user, which would be evidenced by duplicate data in two columns, user
and value
whereas the other columns could have slightly different meta data. Is there a way to identify such duplicates in a single SQL query?
Asked
Active
Viewed 821 times
0

helloB
- 3,472
- 10
- 40
- 87
-
2Possible duplicate of [Select statement to find duplicates on certain fields](http://stackoverflow.com/questions/4434118/select-statement-to-find-duplicates-on-certain-fields) – Adam Jun 23 '16 at 15:42
-
1Adjacent? Then you can use LAG or LEAD to see the previous or next record. Or use an EXISTS clause to find records where exists another record in the same minute or so with the same user and value. – Thorsten Kettner Jun 23 '16 at 15:43
2 Answers
1
Window functions can be used here. Especifically lag
or lead
deppending if you want the previous or the next occurrence of the dupplicate.
This query uses lag
but can be safely changed by lead
:
WITH event_with_lag_data AS (
SELECT user, value, value_ts,
lag(user) over (order by value_ts) as prev_user,
lag(value) over (order by value_ts) as prev_value
FROM event_data
)
SELECT user, value, value_ts
FROM event_with_lag_data
WHERE user = prev_user AND value = prev_value
value_ts
is the ordering column. I assume events are ordered by date/time.
If you have more columns to test for equality is just a matter of adding more lines to the lag
part and to the final where
part
0
In case you are just looking to find any duplicates based on two columns since it isn't clear what you mean by "adjacent", here is a solution:
WITH duplicates AS (
select
user,
value,
COUNT(*) AS COUNT
FROM event_data
GROUP BY
user,
value
)
SELECT
user,
value
FROM duplicates
WHERE COUNT > 1

Shayna
- 716
- 7
- 10