I have the following table in R:
id | time_visited        | outcome | url_link
---|---------------------|---------|-----------------------------
1  | 2012-01-01 00:00:00 | 1       | google.com
1  | 2012-01-01 00:00:00 | 1       | google.com/news
1  | 2012-01-01 00:00:00 | 1       | google.com/news/cnn
2  | 2012-01-01 11:11:11 | 0       | youtube.com
2  | 2012-01-01 11:11:11 | 0       | youtube.com/search
2  | 2012-01-01 11:11:11 | 0       | youtube.com/search/catvideos
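For reproducibility, the table can be built roughly as follows. This is only a sketch: the name df matches the code I tried, and storing time_visited as a character string (rather than POSIXct) is an assumption.

```r
# Reproducible version of the input table.
# Assumption: time_visited is kept as character; the real column may be POSIXct.
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  time_visited = c(rep("2012-01-01 00:00:00", 3),
                   rep("2012-01-01 11:11:11", 3)),
  outcome = c(1, 1, 1, 0, 0, 0),
  url_link = c("google.com", "google.com/news", "google.com/news/cnn",
               "youtube.com", "youtube.com/search",
               "youtube.com/search/catvideos"),
  stringsAsFactors = FALSE
)
```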
I am trying to spread the data using tidyr::spread().
The spreading will be on the url_link variable, with one new column per distinct URL. Each new column should hold a 0/1 value, but I would still like to retain the outcome column to signal the overall value.
The table I am trying to get would look like this:
id | time_visited        | outcome | google.com | google.com/news | google.com/news/cnn | youtube.com...
---|---------------------|---------|------------|-----------------|---------------------|---------------
1  | 2012-01-01 00:00:00 | 1       | 1          | 1               | 1                   | 0
2  | 2012-01-01 11:11:11 | 0       | 0          | 0               | 0                   | 1
I have not shown all the columns because I ran out of space, but youtube.com/search and youtube.com/search/catvideos would follow as two additional columns.
I have tried the following code, but it does not produce the desired table:
df %>% spread(url_link, outcome, -c(time_visited, outcome), fill = outcome)
Essentially, I am trying to spread the url_link variable into new columns filled with a 0/1 value, while also retaining the outcome variable in the data.
Note that I am trying to create a 0/1 flag indicating whether the id is associated with that url_link value. The google.com URLs only appear for id == 1, so that row gets a 1 in those columns; the youtube.com URLs do not appear for id == 1, so it gets a 0 there (and the reverse holds for id == 2).
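To make the intended flag logic concrete, here is a minimal sketch of one possible direction, not a confirmed solution. It assumes the data frame is called df, and the helper column name flag is hypothetical: every existing (id, url_link) pair gets a 1, and spread() fills the missing combinations with 0, so outcome is left untouched as an identifying column.

```r
library(dplyr)
library(tidyr)

# Same data as in the table at the top of the question
# (assumption: time_visited stored as character).
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  time_visited = c(rep("2012-01-01 00:00:00", 3),
                   rep("2012-01-01 11:11:11", 3)),
  outcome = c(1, 1, 1, 0, 0, 0),
  url_link = c("google.com", "google.com/news", "google.com/news/cnn",
               "youtube.com", "youtube.com/search",
               "youtube.com/search/catvideos"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  mutate(flag = 1) %>%               # hypothetical helper: 1 = id visited this URL
  spread(url_link, flag, fill = 0)   # one column per URL; absent pairs become 0

# id, time_visited and outcome are kept because they are neither key nor value.
```

The new column names are not syntactic R names, so they need backticks or double brackets, for example wide[["google.com/news"]].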