I have df that represents users browsing behavior over time. Therefore the df contains a unique UserId and each row has a timestamp and represents a visit to a certain website. Each website has a unique website Id and a unique website category, say c("electronics", "clothes",....). Now I want to count per row how many unique websites per category the user has visited up to that row (including that row). I call this variable "breadth" since it represents how broad a user is browsing through the internet.
So far I only manage to produce dumb code that creates the total number of unique websites visited per category by filtering on each category and then take the length of the unique vector by the user and then do a left join. Therefore I do lose information about the development over time.
Thanks so much in advance!
total_breadth <- df %>% filter(category=="electronics") %>%
group_by(user_id) %>%
mutate(breadth=length(unique(website_id)))
#Structure of the df I want to achieve:
user_id time website_id category breadth
1 1 70 "electronics" 1
1 2 93 "clothing" 1
1 3 34 "electronics" 2
1 4 93 "clothing" 1
1 5 26 "electronics" 3
1 6 70 "electronics" 3
#Structure of the df I produce:
user_id time website_id category breadth
1 1 70 "electronics" 3
1 2 93 "clothing" 1
1 3 34 "electronics" 3
1 4 93 "clothing" 1
1 5 26 "electronics" 3
1 6 70 "electronics" 3