2

I'm trying to format my base_df to show users_id organized by program watched.

base_df:

enter image description here

base_df <- structure(list(category = c("News", "News", "Sports", "Sports", 
"sports", "Sports", "Sports", "sports"), programs = c("News A", 
"News B", "Sports A", "Sports B", "sports C", "Sports A", "Sports B", 
"sports C"), users_id = c(10003831, 10003823, 10003841, 10003823, 
10003851, 10003851, 10003851, 10003854)), row.names = c(NA, -8L
), class = c("tbl_df", "tbl", "data.frame"))

Desired output:

enter image description here

I think pivot_longer can help me here:

I've tried to use it but it returns a df with just the users_ids. What am I doing wrong?

b <- pivot_wider(
  base_df,
  id_cols = users_id,
  names_from = programs
)

enter image description here

Omar Gonzales
  • 3,806
  • 10
  • 56
  • 120
  • Are there cases where a user does NOT watch sports? And If so, do you still want it with count 0? – Sotos Jul 29 '20 at 13:09
  • There are, but I can manually filter that category out, so we can only work with sports. I'd apply this answer to other categories as well :). – Omar Gonzales Jul 29 '20 at 13:12

1 Answers1

2

You don't need to convert to wide format, but rather aggregate, i.e.

library(dplyr)

base_df %>% 
 filter(category %in% c('Sports', 'sports')) %>% 
 group_by(users_id) %>% 
 summarise(how_many = n(), 
           which = toString(programs))


# A tibble: 4 x 3
#  users_id how_many which                       
#     <dbl>    <int> <chr>                       
#1 10003823        1 Sports B                    
#2 10003841        1 Sports A                    
#3 10003851        3 sports C, Sports A, Sports B
#4 10003854        1 sports C         

   
Sotos
  • 51,121
  • 6
  • 32
  • 66