Convert a long list to a binary dataframe having duplicates

Question

According to this question and answer, it is possible to convert a long list to a binary dataframe.

However, how can this be applied to a dataframe in which the same value appears more than once for the same user?

Example of dataframe:

d_long <- data.frame( nameid = c("sally","sally","sally", "sally","Robert","annie","annie","annie"), value = c("product1","ra","ent","ra","ra","ra","product1","product1"))

  nameid    value
1  sally product1
2  sally       ra
3  sally      ent
4  sally       ra
5 Robert       ra
6  annie       ra
7  annie product1
8  annie product1

The expected output is this:

d_exist <- data.frame(nameid = c("sally","Robert","annie"), product1 = c(1,0,1), ra = c(1,1,1), ent = c(1,0,0))

  nameid product1 ra ent
1  sally        1  1   1
2 Robert        0  1   0
3  annie        1  1   0

But when I try this:

d_long %>% group_by(nameid, value) %>%
     mutate(count = n()) %>%
     ungroup() %>%
     spread(value, count, fill = 0) %>%
     as.data.frame()

I receive the error:

Error: Duplicate identifiers for rows (7, 8), (2, 4)

Is it correct to simply remove the duplicated rows first, like this?

d_long[!duplicated(d_long), ]

Something like this may help, though I am unsure: `(table(d_long$nameid, d_long$value) > 0) + 0` — PKumar, May 21 '18 at 06:12
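
The `table()` idea in the comment above can be sketched in base R as follows. This is only an illustration, not something from the original thread: the names `tab`, `binary` and `d_wide` are made up here, and the row and column order of the result follows factor level order, so it may differ from the expected output shown in the question.

```r
d_long <- data.frame(
  nameid = c("sally", "sally", "sally", "sally", "Robert", "annie", "annie", "annie"),
  value  = c("product1", "ra", "ent", "ra", "ra", "ra", "product1", "product1")
)

# Count how often each (nameid, value) pair occurs; duplicates simply give counts > 1.
tab <- table(d_long$nameid, d_long$value)

# Collapse the counts to 0/1: TRUE/FALSE from the comparison, then + 0 to get numbers.
binary <- (tab > 0) + 0

# Turn the 2-D table into a wide data frame and move the row names into a column.
d_wide <- as.data.frame.matrix(binary)
d_wide <- cbind(nameid = rownames(d_wide), d_wide)
rownames(d_wide) <- NULL

d_wide
```

Because `table()` counts the pairs instead of requiring them to be unique, no deduplication step is needed beforehand.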

score 1 · Answer 1 · answered May 21 '18 at 06:15

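We can take the distinct rows first and then do the `spread`. Once the repeated (nameid, value) pairs are removed, each pair identifies exactly one row, so the "Duplicate identifiers" error no longer occurs.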

library(tidyverse)
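d_long %>%
  distinct() %>%               # drop repeated (nameid, value) rows
  mutate(n = 1) %>%            # mark each remaining pair as present
  spread(value, n, fill = 0)   # one column per value, 0 where absent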
#    nameid ent product1 ra
#1  annie   0        1  1
#2 Robert   0        0  1
#3  sally   1        1  1

answered May 21 '18 at 06:15 by akrun
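
As a possible variant that is not part of the answer above: in newer versions of tidyr (1.0.0 and later), `spread()` is superseded by `pivot_wider()`, so the same distinct-then-widen idea could be sketched as below. The result name `d_wide` is hypothetical, and the final `as.data.frame()` is only there to return a plain data frame like the one in the question.

```r
library(dplyr)
library(tidyr)

d_long <- data.frame(
  nameid = c("sally", "sally", "sally", "sally", "Robert", "annie", "annie", "annie"),
  value  = c("product1", "ra", "ent", "ra", "ra", "ra", "product1", "product1")
)

d_wide <- d_long %>%
  distinct() %>%                        # drop repeated (nameid, value) rows
  mutate(n = 1) %>%                     # mark each remaining pair as present
  pivot_wider(
    names_from  = value,                # one new column per distinct value
    values_from = n,
    values_fill = list(n = 0)           # 0 where a user never had that value
  ) %>%
  as.data.frame()

d_wide
```

The `distinct()` step still matters here: without it, `pivot_wider()` would find several rows for the same (nameid, value) pair and would not produce a plain 0/1 column.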
