I have the following dataset:
User | Session_ID | Page | Path_Number |
---|---|---|---|
123A | 12345 | home | 1 |
123A | 12345 | services | 2 |
123A | 12345 | pricing | 3 |
123A | 12345 | about | 4 |
123A | 12345 | services | 5 |
123A | 12345 | home | 6 |
123B | 34567 | home | 1 |
123B | 34567 | services | 2 |
123B | 34567 | about | 3 |
123B | 34567 | multimedia | 4 |
123C | 56789 | home | 1 |
123C | 56789 | about | 2 |
123C | 56789 | pricing | 3 |
123C | 56789 | about | 4 |
123C | 56789 | services | 5 |
There are three users, each with a unique session ID. Path_Number is the order in which a user visits pages during the session, and Page is the page visited at that step.
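For reproducibility, the table can be rebuilt in R roughly as follows. The column types are my assumption (Session_ID and Path_Number as numbers, User and Page as character), and `dataset` is simply the name I use in my code.

```r
library(tibble)

# Row-wise reconstruction of the table above.
# Assumed types: Session_ID and Path_Number numeric, User and Page character.
dataset <- tribble(
  ~User,  ~Session_ID, ~Page,        ~Path_Number,
  "123A", 12345,       "home",       1,
  "123A", 12345,       "services",   2,
  "123A", 12345,       "pricing",    3,
  "123A", 12345,       "about",      4,
  "123A", 12345,       "services",   5,
  "123A", 12345,       "home",       6,
  "123B", 34567,       "home",       1,
  "123B", 34567,       "services",   2,
  "123B", 34567,       "about",      3,
  "123B", 34567,       "multimedia", 4,
  "123C", 56789,       "home",       1,
  "123C", 56789,       "about",      2,
  "123C", 56789,       "pricing",    3,
  "123C", 56789,       "about",      4,
  "123C", 56789,       "services",   5
)
```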
The question that I am trying to answer is: How many people first go to the 'services' page and then go to the 'about' page?
I am using the following code to find which users and sessions have both 'services' and 'about' somewhere in the path:

    library(dplyr)

    dataset %>%
      group_by(Session_ID, User) %>%
      summarize(services_and_about = ('services' %in% Page) & ('about' %in% Page), .groups = "drop") %>%
      filter(services_and_about)  # keep only sessions that contain both pages

This returns users 123A, 123B, and 123C, because all three sessions contain both pages.
However, I would like to also know which users visit the 'services' page BEFORE the 'about' page (only users 123A and 123B). I know I should use a lag or lead function here, but I am not sure how.
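To make the intended logic concrete, here is a rough sketch of one possible approach that does not use lag or lead at all. It assumes "before" means that some 'about' visit comes after some 'services' visit, not necessarily directly after, so it compares the earliest 'services' step with the latest 'about' step in each session. The `visits` data frame and the `services_before_about` column are names made up for this illustration, and I am not sure this is the best way to do it.

```r
library(dplyr)

# Self-contained copy of the data from the table (hypothetical name: visits)
visits <- data.frame(
  User        = rep(c("123A", "123B", "123C"), times = c(6, 4, 5)),
  Session_ID  = rep(c(12345, 34567, 56789), times = c(6, 4, 5)),
  Page        = c("home", "services", "pricing", "about", "services", "home",
                  "home", "services", "about", "multimedia",
                  "home", "about", "pricing", "about", "services"),
  Path_Number = c(1:6, 1:4, 1:5),
  stringsAsFactors = FALSE
)

result <- visits %>%
  group_by(User, Session_ID) %>%
  # Keep only sessions that contain both pages, so min()/max() never see empty input
  filter("services" %in% Page, "about" %in% Page) %>%
  summarize(
    # TRUE when the earliest 'services' step precedes the latest 'about' step
    services_before_about =
      min(Path_Number[Page == "services"]) < max(Path_Number[Page == "about"]),
    .groups = "drop"
  ) %>%
  filter(services_before_about)

result
```

With the data above this keeps the sessions of 123A and 123B. If 'about' had to come directly after 'services', a stricter variant would presumably compare `lag(Page)` with `Page` inside each group after sorting by Path_Number.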
Thanks a lot for helping!