Thanks for your help.
My question is closely related to this thread.
Consider this data frame:
df <- data.frame(id = c(1,1,2,3,4), fruit = c("apple","pear","apple","orange","apple"))
We can spread it into 'dummy variables' like so:
df %>% mutate(i = 1) %>% spread(fruit, i, fill = 0)
Now note what happens when I add a duplicate fruit (a second apple for id 4):
df2 <- data.frame(id = c(1,1,2,3,4,4), fruit = c("apple","pear","apple","orange","apple","apple"))
Spreading again:
df2 %>% mutate(i = 1) %>% spread(fruit, i, fill = 0)
This gives Error: Duplicate identifiers for rows (5, 6)
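As far as I can tell, the error comes from rows 5 and 6 sharing the same id/fruit pair, so spread cannot decide which row fills the cell. A minimal sketch to confirm that, assuming dplyr is loaded (the name dups is just for illustration):

```r
library(dplyr)

df2 <- data.frame(id = c(1,1,2,3,4,4),
                  fruit = c("apple","pear","apple","orange","apple","apple"))

# Count how often each id/fruit pair occurs; any pair with n > 1
# is a duplicate identifier from spread's point of view
dups <- df2 %>%
  count(id, fruit) %>%
  filter(n > 1)

dups
```

Only the id 4 / apple pair should show up here, with a count of 2.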
Ideally, the correct result would return two fields called apple_1 and apple_2, which should both be set to 1 for id=4.
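To make the target shape concrete, here is a rough sketch of one way it might be produced: number the repeats within each id/fruit pair and use that as the spread key. This is only an assumption about what a solution could look like, not necessarily the best approach; the names key and wide are made up for illustration, and note that every fruit gets a suffix this way (pear_1, orange_1), not just apple.

```r
library(dplyr)
library(tidyr)

df2 <- data.frame(id = c(1,1,2,3,4,4),
                  fruit = c("apple","pear","apple","orange","apple","apple"),
                  stringsAsFactors = FALSE)

wide <- df2 %>%
  group_by(id, fruit) %>%
  # Number repeats within each id/fruit pair: apple_1, apple_2, ...
  mutate(key = paste(fruit, row_number(), sep = "_")) %>%
  ungroup() %>%
  select(-fruit) %>%
  mutate(i = 1) %>%
  # Each id/key pair is now unique, so spread no longer errors
  spread(key, i, fill = 0)

wide
```

With this, id=4 has both apple_1 and apple_2 set to 1, while the other ids have apple_2 filled with 0.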