
Thanks for your help.

My question is closely related to this thread.

Consider this data frame:

df <- data.frame(id = c(1,1,2,3,4), fruit =  c("apple","pear","apple","orange","apple"))

We can spread fruit into dummy variables like so:

library(dplyr); library(tidyr)
df %>% mutate(i = 1) %>% spread(fruit, i, fill = 0)

Now note what happens when I add a duplicate fruit.

df2 <- data.frame(id = c(1,1,2,3,4,4), fruit =  c("apple","pear","apple","orange","apple","apple"))

Spreading again:

df2 %>% mutate(i = 1) %>% spread(fruit, i, fill = 0)

This gives Error: Duplicate identifiers for rows (5, 6)

Ideally, the result would contain two columns called apple_1 and apple_2, both set to 1 for id = 4.
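The error arises because spread() needs each id/fruit pair to identify at most one row, and rows 5 and 6 of df2 share id = 4 and fruit = "apple". A minimal base R sketch for locating such pairs and numbering the repeats follows; the names dup, n and key are only illustrative and do not come from the thread.

```r
df2 <- data.frame(
  id    = c(1, 1, 2, 3, 4, 4),
  fruit = c("apple", "pear", "apple", "orange", "apple", "apple"),
  stringsAsFactors = FALSE
)

# Flag rows whose (id, fruit) pair has already appeared earlier in the data.
dup <- duplicated(df2[c("id", "fruit")])
which(dup)

# Number each occurrence of a fruit within its id, so repeats get distinct keys.
n   <- ave(seq_along(df2$fruit), df2$id, df2$fruit, FUN = seq_along)
key <- paste(df2$fruit, n, sep = "_")
key
```

With a key built this way, every id/key pair is unique, which is the property spread() checks for.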

www
Nate Klass
  • I don't understand why the result should return `apple_1` and `apple2`. If you add `index = row_number()` in `mutate`, `spread` will work. – tyluRp Jan 25 '18 at 22:10
  • I don't understand what your aim is, but maybe you need to look into `dcast(df,id~fruit)` – Onyambu Jan 25 '18 at 22:13

1 Answer


Are you looking for something like:

> library(reshape2)
> df2 <- data.frame(id = c(1,1,2,3,4,4), fruit = c("apple","pear","apple","orange","apple","apple"), stringsAsFactors = FALSE)
> dcast(df2, id ~ fruit, value.var = "fruit", fun.aggregate = list)
  id        apple orange pear
1  1        apple        pear
2  2        apple            
3  3              orange     
4  4 apple, apple            

Another option could be:

> library(dplyr)
> df2 %>%
  group_by(id) %>%
  mutate(fruit = paste(fruit, row_number(), sep = "_")) %>%
  dcast( id ~ fruit, value.var = "fruit", fun.aggregate = list )

  id apple_1 apple_2 orange_1 pear_2
1  1 apple_1                  pear_2
2  2 apple_1                        
3  3                 orange_1       
4  4 apple_1 apple_2 

If 0/1 is preferred for each column then:

> df2 %>%
  group_by(id) %>%
  mutate(fruit = paste(fruit, row_number(), sep = "_")) %>%
  dcast( id ~ fruit, fill = 0 , fun.aggregate = function(x) 1 )
  id apple_1 apple_2 orange_1 pear_2
1  1       1       0        0      1
2  2       1       0        0      0
3  3       0       0        1      0
4  4       1       1        0      0
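For comparison, a similar 0/1 layout can be sketched in base R without reshape2 or dplyr. This is only an illustrative alternative, not the code above: it numbers repeats per id and fruit rather than per id, so the pear column comes out as pear_1, and the names key and wide are assumptions made for the example.

```r
df2 <- data.frame(
  id    = c(1, 1, 2, 3, 4, 4),
  fruit = c("apple", "pear", "apple", "orange", "apple", "apple"),
  stringsAsFactors = FALSE
)

# Build a unique column key per id: fruit name plus its occurrence number.
n   <- ave(seq_along(df2$fruit), df2$id, df2$fruit, FUN = seq_along)
key <- paste(df2$fruit, n, sep = "_")

# Cross-tabulate id against key; each cell is 0 or 1 because keys are unique per id.
wide <- as.data.frame.matrix(table(df2$id, key))

# Move the ids out of the row names into a regular column.
wide <- cbind(id = as.numeric(rownames(wide)), wide)
wide
```

Because table() counts occurrences, the result stays 0/1 only as long as each id/key pair is unique, which the occurrence number guarantees here.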
MKR
  • Thanks - this is helpful. I'm interested in the last option, though I'm still getting an error: `Error in unique.default(x) : unique() applies only to vectors.` – Nate Klass Jan 26 '18 at 04:38
  • @NateKlass Are you getting the error when using the exact code from the solution? If not, please share the exact code that gives the error. – MKR Jan 26 '18 at 06:05
  • I was able to adjust the code for my needs, and it worked! Thank you so much for your help. – Nate Klass Jan 28 '18 at 21:14