I have a table that records the time users spend on a specific page of the app. I want to get the total time spent on this page per day.

The table:

user_id  date                      label_of_the_page
1        2019-03-03T00:21:56.384Z  page we want
1        2019-03-03T00:21:57.314Z  page we want
1        2019-03-03T00:21:58.024Z  page we want
1        2019-03-03T00:21:59.384Z  new page

The idea is to take the difference between the first timestamp at which label_of_the_page is the page I want and the timestamp at which the label changes. But we have to do this for every visit to the page. The same user_id can reach the page several times, so simply grouping by user_id is not a good idea, I think...
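To make the expected result concrete, here is a minimal sketch of that idea in plain Python rather than SQL, using the sample rows above. The names `TARGET`, `parse` and `seconds_per_day` are made up for illustration, and the sketch assumes that a visit with no later row for the same user is simply ignored and that each visit is counted on the day it started.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the table above: (user_id, date, label_of_the_page).
rows = [
    (1, "2019-03-03T00:21:56.384Z", "page we want"),
    (1, "2019-03-03T00:21:57.314Z", "page we want"),
    (1, "2019-03-03T00:21:58.024Z", "page we want"),
    (1, "2019-03-03T00:21:59.384Z", "new page"),
]

TARGET = "page we want"  # hypothetical label of the page of interest


def parse(ts):
    # Timestamps look like 2019-03-03T00:21:56.384Z (millisecond precision).
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def seconds_per_day(rows, target=TARGET):
    totals = defaultdict(float)  # day -> total seconds on the target page
    entered = {}                 # user_id -> time the current visit started
    for user_id, ts, label in sorted(rows, key=lambda r: (r[0], parse(r[1]))):
        t = parse(ts)
        if label == target:
            # Only the first row of a run of target rows starts a visit.
            entered.setdefault(user_id, t)
        elif user_id in entered:
            # The label changed: close the visit and count it on its start day.
            start = entered.pop(user_id)
            totals[start.date().isoformat()] += (t - start).total_seconds()
    return dict(totals)


print(seconds_per_day(rows))
```

For the sample rows, the single visit runs from 00:21:56.384 to 00:21:59.384, so the total for 2019-03-03 is 3 seconds.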

nolwww
  • You are right... It's Redshift!! – nolwww Jul 03 '19 at 07:24
  • Have a go at using first_value and last_value partitioned by "label_of_the_page" and user_id and ordered by date. https://docs.aws.amazon.com/redshift/latest/dg/r_WF_first_value.html When you have a query (it probably won't be quite right), come back for help. – Jon Scott Jul 03 '19 at 14:35
  • It's a variant of the "gaps and islands" problem. There are plenty of answers on how to get the duration of every island; then just sum them by page id. – AlexYes Jul 03 '19 at 16:58

1 Answer

So I guess the simplest way is like this:

Make a new column with three labels, based on the current row and the previous row of the same user:

Case 1: The user is on the page we are interested in, and in the previous row he was not.

Case 2: The user is on another page, and in the previous row he was on the page we are interested in.

Case 3: The user is on the page and was also on it in the previous row, or is on another page and was on another page in the previous row.

Then we keep only the rows of case 1 and case 2, and for each visit we take the difference time(case 2) - time(case 1).

But I'm not sure how to do all of this in SQL.
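One possible way to express these cases in SQL is with the LAG and LEAD window functions. The sketch below is not tested on Redshift: it runs the query against an in-memory SQLite database from Python (assuming a SQLite build with window functions, 3.25 or later) so that it can be tried on the sample data. The table name `page_views` is hypothetical, and the `julianday`-based conversion to milliseconds is SQLite-specific; on Redshift the time arithmetic would have to be replaced, for example with DATEDIFF on a real timestamp column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; the columns follow the question.
conn.execute(
    "CREATE TABLE page_views (user_id INTEGER, date TEXT, label_of_the_page TEXT)"
)
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    [
        (1, "2019-03-03T00:21:56.384Z", "page we want"),
        (1, "2019-03-03T00:21:57.314Z", "page we want"),
        (1, "2019-03-03T00:21:58.024Z", "page we want"),
        (1, "2019-03-03T00:21:59.384Z", "new page"),
    ],
)

QUERY = """
WITH flagged AS (
    -- For every row: is the user on the page now, and was he on it in the previous row?
    SELECT
        user_id,
        date,
        CAST(ROUND(julianday(date) * 86400000) AS INTEGER) AS ms,  -- SQLite-specific
        label_of_the_page = :target AS on_page,
        COALESCE(
            LAG(label_of_the_page) OVER (PARTITION BY user_id ORDER BY date) = :target,
            0
        ) AS was_on_page
    FROM page_views
),
edges AS (
    -- Keep case 1 (entering the page) and case 2 (leaving it); drop case 3.
    SELECT user_id, date, ms, on_page AS is_start
    FROM flagged
    WHERE on_page <> was_on_page
),
paired AS (
    -- Edges alternate start/end per user, so the row after a start is its end.
    SELECT
        date,
        is_start,
        ms AS start_ms,
        LEAD(ms) OVER (PARTITION BY user_id ORDER BY ms) AS end_ms
    FROM edges
)
SELECT substr(date, 1, 10) AS day, SUM(end_ms - start_ms) / 1000.0 AS seconds
FROM paired
WHERE is_start = 1 AND end_ms IS NOT NULL  -- visits that never end are skipped
GROUP BY day
ORDER BY day
"""

result = conn.execute(QUERY, {"target": "page we want"}).fetchall()
print(result)
```

Each visit is counted on the day it starts, and a visit with no later row for the same user is left out; both are assumptions of this sketch, not something the question specifies.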

nolwww