I have a database with time visit in timestamp like this
ID, time
1, 1493596800
1, 1493596900
1, 1493432800
2, 1493596800
2, 1493596850
2, 1493432800
I use spark SQL and I need to have the longest sequence of consecutives dates for each ID like
ID, longest_seq (days)
1, 2
2, 5
3, 1
I tried to adapt this answer Detect consecutive dates ranges using SQL to my case but I didn't manage to have what I expect.
SELECT ID, MIN (d), MAX(d)
FROM (
SELECT ID, cast(from_utc_timestamp(cast(time as timestamp), 'CEST') as date) AS d,
ROW_NUMBER() OVER(
PARTITION BY ID ORDER BY cast(from_utc_timestamp(cast(time as timestamp), 'CEST')
as date)) rn
FROM purchase
where ID is not null
GROUP BY ID, cast(from_utc_timestamp(cast(time as timestamp), 'CEST') as date)
)
GROUP BY ID, rn
ORDER BY ID
If someone has some clue on how to fix this request, or what's wrong in it, I would appreciate the help Thanks
[EDIT] A more explicit input /output
ID, time
1, 1
1, 2
1, 3
2, 1
2, 3
2, 4
2, 5
2, 10
2, 11
3, 1
3, 4
3, 9
3, 11
The result would be :
ID, MaxSeq (in days)
1,3
2,3
3,1
All the visits are in timestamp, but I need consecutives days, then each visit by day is counted once by day