According to this question and answer, it is possible to convert a long list to a binary data frame.

However, how can this be applied to a data frame in which the same value appears more than once for the same user?

Example of dataframe:

d_long <- data.frame(
  nameid = c("sally","sally","sally","sally","Robert","annie","annie","annie"),
  value = c("product1","ra","ent","ra","ra","ra","product1","product1"))
  nameid    value
1  sally product1
2  sally       ra
3  sally      ent
4  sally       ra
5 Robert       ra
6  annie       ra
7  annie product1
8  annie product1

The expected output is this:

d_exist <- data.frame(
  nameid = c("sally","Robert","annie"),
  product1 = c(1,0,1), ra = c(1,1,1), ent = c(1,0,0))
  nameid product1 ra ent
1  sally        1  1   1
2 Robert        0  1   0
3  annie        1  1   0

But when I try this:

d_long %>% group_by(nameid, value) %>%
     mutate(count = n()) %>%
     ungroup() %>%
     spread(value, count, fill = 0) %>%
     as.data.frame()

I receive the error:

Error: Duplicate identifiers for rows (7, 8), (2, 4)

Is it right to simply drop the duplicated rows first, like this?

d_long[!duplicated(d_long), ]
zx8754
user8831872

1 Answer

We can keep only the distinct rows first and then spread them:

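library(tidyverse)
d_long %>%
  distinct() %>%              # drop repeated nameid/value pairs
  mutate(n = 1) %>%           # indicator: this pair exists
  spread(value, n, fill = 0)  # one column per value, 0 where absent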
#    nameid ent product1 ra
#1  annie   0        1  1
#2 Robert   0        0  1
#3  sally   1        1  1
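
As a rough sketch of the same idea with newer tidyr (an assumption: tidyr 1.0.0 or later, where `pivot_wider()` is available as the successor to `spread()`), the deduplicate-then-reshape steps could look like the following. The names `d_wide` and `dedup_base` are only illustrative. The last lines compare the `d_long[!duplicated(d_long), ]` idea from the question with `distinct()`.

```r
library(dplyr)
library(tidyr)

# Data from the question
d_long <- data.frame(
  nameid = c("sally", "sally", "sally", "sally", "Robert", "annie", "annie", "annie"),
  value  = c("product1", "ra", "ent", "ra", "ra", "ra", "product1", "product1")
)

# Deduplicate first, then reshape to one indicator column per value
d_wide <- d_long %>%
  distinct() %>%                       # drop repeated nameid/value pairs
  mutate(n = 1) %>%                    # indicator: this pair exists
  pivot_wider(names_from = value,
              values_from = n,
              values_fill = list(n = 0)) %>%  # 0 where the pair is absent
  as.data.frame()

# The base R subsetting from the question removes the same duplicate rows
dedup_base <- d_long[!duplicated(d_long), ]
nrow(dedup_base) == nrow(distinct(d_long))
```

So dropping the duplicates with `!duplicated()` should work just as well as `distinct()` here; an indicator column is still needed before reshaping.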
akrun