I am currently trying to write a query from a table that list out users surfing behaviour. The table looks like the one below
**RecordID RespondentID DeviceID UTCTimestamp Domain**
1 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 06/11/2017 10:21 goodreads.com 2 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 06/11/2017 10:21 goodreads.com 3 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 06/11/2017 10:21 gr-assets.com 4 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 06/11/2017 10:21 gr-assets.com 5 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 06/11/2017 10:23 itunes.apple.com 6 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 06/11/2017 10:23 itunes.apple.com 7 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 06/11/2017 10:51 samplicio.us 8 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 06/11/2017 10:51 samplicio.us
Thanks to everyone's help I managed to get this.
RecordID RespondentID UTCTimestamp Source Domain To Domain RecordID
2 01faca75-1216-4a55-b43c-9d64ade852f7 06/11/2017 10:21 goodreads.com gr-assets.com 3 4 01faca75-1216-4a55-b43c-9d64ade852f7 06/11/2017 10:21 gr-assets.com itunes.apple.com 5 6 01faca75-1216-4a55-b43c-9d64ade852f7 06/11/2017 10:23 itunes.apple.com samplicio.us 7
The To Domain is the value from the next row where the domain name is different.
Problem While this looks correct, we actually have skipped the entire first record. This is because, given the data set, the first row "domain" is joined to the second row "domain" and we skipped it. Row 2 is combined with Row 3 hence the first resulting record shows RecordID 2. I would like to fine tune this further. My result should should start from RecordID 1 and skipping RecordID 2 since the domain is the same, hence the result should show
RecordID RespondentID UTCTimestamp Source Domain To Domain
1 01faca75-1216-4a55-b43c-9d64ade852f7 06/11/2017 10:21 goodreads.com gr-assets.com 3 01faca75-1216-4a55-b43c-9d64ade852f7 06/11/2017 10:21 gr-assets.com itunes.apple.com 5 01faca75-1216-4a55-b43c-9d64ade852f7 06/11/2017 10:23 itunes.apple.com samplicio.us
I tried skipping the RecordID 2, however, an SQL error 'prev_nane' does not exist was encountered.
SELECT t1."RecordID", t1."RespondentID", t1."UTCTimestamp", t1."Domain" as "Source Domain", t2."Domain" as "To Domain" , t2."RecordID", lag(t1."Domain",1) over (order by t1."RecordID") as prev_name
from public."Traffic - Mobile" as t1
join public."Traffic - Mobile" as t2 on t2."RespondentID" = t1."RespondentID" AND t2."DeviceID"=t1."DeviceID" AND t2."RecordID"=t1."RecordID"+1 And t1."Domain"<>T2."Domain" AND t2."UTCTimestamp">=t1."UTCTimestamp" AND t2."Sequence"-t1."Sequence"=1 and t1."RecordID"<13 AND t1."Domain"<>prev_name;
What did I do wrong?
Also my final result I want to achieve is below RecordID RespondentID UTCTimestamp Source Domain To Domain Final Destination
1 01faca75-1216-4a55-b43c-9d64ade852f7 06/11/2017 10:21 goodreads.com gr-assets.com samplicio.us 3 01faca75-1216-4a55-b43c-9d64ade852f7 06/11/2017 10:21 gr-assets.com itunes.apple.com samplicio.us 5 01faca75-1216-4a55-b43c-9d64ade852f7 06/11/2017 10:23 itunes.apple.com samplicio.us samplicio.us
An additional column called 'Final destination'. This is to allow me to group the 3 transactions together as the path taken to reach samplicio.us.
Thanks in advance.