I have the following table in R:

 id   | time_visited      | outcome | url_link
 -----|-------------------|---------|-------------------------
   1  |2012-01-01 00:00:00|  1      |google.com
   1  |2012-01-01 00:00:00|  1      |google.com/news
   1  |2012-01-01 00:00:00|  1      |google.com/news/cnn
   2  |2012-01-01 11:11:11|  0      |youtube.com
   2  |2012-01-01 11:11:11|  0      |youtube.com/search
   2  |2012-01-01 11:11:11|  0      |youtube.com/search/catvideos

I am trying to spread the data using tidyr::spread().

The spreading will be on the url_link variable, and its values will be filled from the outcome variable. However, I would still like to retain the outcome variable to signal the overall value.

The table I am trying to get would look like this:

 id   | time_visited      | outcome | google.com | google.com/news | google.com/news/cnn | youtube.com...
 -----|-------------------|---------|------------|-----------------|---------------------|---------------
   1  |2012-01-01 00:00:00|  1      | 1          |    1            |       1             |    0
   2  |2012-01-01 11:11:11|  0      | 0          |    0            |       0             |    1

I have not written out all the columns for lack of space, but youtube.com/search and youtube.com/search/catvideos would follow as 2 additional columns.

I have tried the following code, but it does not produce the desired result:

df %>% spread(url_link, outcome, -c(time_visited, outcome), fill = outcome)

Essentially, I am trying to spread the url_link variable into new variables, fill them with the value from the outcome variable, and still retain the outcome variable in the data.

Note that I am trying to create a 0/1 flag showing whether the id is associated with that url_link value. The google.com links appear only for id == 1, so that id gets a 1 flag for them; the youtube links do not, so it gets a 0 flag there.

Beans On Toast

1 Answer

Maybe you can create a copy of the outcome variable before reshaping the data to wide format:

library(dplyr)
library(tidyr)

df %>%
  mutate(outcome1 = outcome) %>%
  pivot_wider(names_from = url_link, values_from = outcome1, values_fill = 0)
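
Note that with the copy-of-outcome approach, the youtube columns for id 2 hold its outcome value, which is 0. If what you want is the flag described in your note (1 whenever the id has that url_link, whatever the outcome), a possible variant is to pivot a constant column instead. This is only a sketch: the data frame is rebuilt by hand from the table in the question, and `visited` is a helper column name I made up for illustration.

```r
library(dplyr)
library(tidyr)

# Sample data rebuilt from the table in the question
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  time_visited = c(rep("2012-01-01 00:00:00", 3),
                   rep("2012-01-01 11:11:11", 3)),
  outcome = c(1, 1, 1, 0, 0, 0),
  url_link = c("google.com", "google.com/news", "google.com/news/cnn",
               "youtube.com", "youtube.com/search",
               "youtube.com/search/catvideos"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  # `visited` (hypothetical name) is 1 for every id/url_link pair that exists
  mutate(visited = 1) %>%
  # pairs that never occur are filled with 0; id, time_visited and outcome
  # are kept as identifying columns
  pivot_wider(names_from = url_link, values_from = visited, values_fill = 0)
```

Here `wide` has one row per id, the original outcome column is kept, and each url_link column is 1 only for the id that has that link.
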
Ronak Shah