spreading duplicate rows with ids AND outcome variables

Question

Thanks for your help.

My question is closely related to this thread.

Note this df:

library(dplyr)
library(tidyr)

df <- data.frame(id = c(1,1,2,3,4), fruit = c("apple","pear","apple","orange","apple"))

And we can spread into 'dummy variables' like so:

df %>% mutate(i = 1) %>% spread(fruit, i, fill = 0)

Now note what happens when I add a duplicate fruit.

df2 <- data.frame(id = c(1,1,2,3,4,4), fruit =  c("apple","pear","apple","orange","apple","apple"))

Again spread

df2 %>% mutate(i = 1) %>% spread(fruit, i, fill = 0)

Gives Error: Duplicate identifiers for rows (5, 6)
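
The error comes from `spread` needing each id/fruit pair to pick out a single row, and id 4 now has two apple rows. A minimal sketch for listing the offending pairs, assuming dplyr is available (the name `dupes` is only illustrative):

```r
library(dplyr)

df2 <- data.frame(id = c(1,1,2,3,4,4),
                  fruit = c("apple","pear","apple","orange","apple","apple"),
                  stringsAsFactors = FALSE)

# Count rows per id/fruit pair and keep only the pairs that occur more than once
dupes <- df2 %>%
  count(id, fruit) %>%
  filter(n > 1)

dupes
```

For `df2` this should leave a single row: id 4, apple, with a count of 2.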

Ideally, the correct result would return two fields called apple_1 and apple_2, both set to 1 for id=4.

I don't understand why the result should return `apple_1` and `apple_2`. If you add `index = row_number()` in `mutate`, `spread` will work. — tyluRp, Jan 25 '18 at 22:10
I don't understand what your aim is, but maybe you need to look into `dcast(df,id~fruit)` — Onyambu, Jan 25 '18 at 22:13

MKR · Accepted Answer · 2018-01-25T22:43:01.927

Are you looking for something like:

    > library(reshape2)
    > df2 <- data.frame(id = c(1,1,2,3,4,4), fruit = c("apple","pear","apple","orange","apple","apple"), stringsAsFactors = FALSE)
    > dcast(df2, id ~ fruit, value.var = 'fruit', fun.aggregate = list)
      id        apple orange pear
    1  1        apple        pear
    2  2        apple            
    3  3              orange     
    4  4 apple, apple

Another option could be:

    > library(dplyr)
    > df2 %>%
      group_by(id) %>%
      mutate(fruit = paste(fruit, row_number(), sep = "_")) %>%
      dcast(id ~ fruit, value.var = "fruit", fun.aggregate = list)

      id apple_1 apple_2 orange_1 pear_2
    1  1 apple_1                  pear_2
    2  2 apple_1                        
    3  3                 orange_1       
    4  4 apple_1 apple_2

If 0/1 is preferred for each column then:

    > df2 %>%
      group_by(id) %>%
      mutate(fruit = paste(fruit, row_number(), sep = "_")) %>%
      dcast(id ~ fruit, fill = 0, fun.aggregate = function(x) 1)
      id apple_1 apple_2 orange_1 pear_2
    1  1       1       0        0      1
    2  2       1       0        0      0
    3  3       0       0        1      0
    4  4       1       1        0      0
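
The same 0/1 layout can probably also be reached without reshape2, staying with the `spread` call from the question. This is only a sketch under the assumption that dplyr and tidyr are loaded; the name `wide` is illustrative. Numbering the fruit within each id makes every id/fruit pair unique, which is what `spread` needs:

```r
library(dplyr)
library(tidyr)

df2 <- data.frame(id = c(1,1,2,3,4,4),
                  fruit = c("apple","pear","apple","orange","apple","apple"),
                  stringsAsFactors = FALSE)

wide <- df2 %>%
  group_by(id) %>%
  # Suffix each fruit with its position within the id, e.g. apple_1, apple_2
  mutate(fruit = paste(fruit, row_number(), sep = "_"),
         i = 1) %>%
  ungroup() %>%
  # Each id/fruit pair is now unique, so spread no longer reports duplicates
  spread(fruit, i, fill = 0)

wide
```

The columns should come out as id, apple_1, apple_2, orange_1 and pear_2, with the same 0/1 values as the table above.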

Thanks - this is helpful. I'm interested in the last option, though I'm still getting an error: `Error in unique.default(x) : unique() applies only to vectors.` — Nate Klass, Jan 26 '18 at 04:38
@NateKlass Are you getting the error when using the exact code from the answer? If not, please share the exact code that gives the error. — MKR, Jan 26 '18 at 06:05
I was able to adjust the code for my needs, and it worked! Thank you so much for your help. — Nate Klass, Jan 28 '18 at 21:14
