I want to create new columns based on the values of a single existing column. It is event data (from a website), so the number of values differs from row to row. For example:

row    Events 
1       237,2,236,102,106,111,114,115,116,117,118,119,125
2       237,111,116
3       102,106,111,114,115
4       237,2,236,102,106,111,114,115,116,117,118,119,125, 126

The result should be dummy (0/1) columns, one per distinct value:

row  237  2  236  102  106  111  114  115  116  117  118  119  125  126
1    1    1  1    1    1    1    1    1    1    1    1    1    1    0
2    1    0  0    0    0    1    0    0    1    0    0    0    0    0
3    0    0  0    1    1    1    1    1    0    0    0    0    0    0
4    1    1  1    1    1    1    1    1    1    1    1    1    1    1

I tried to solve this with tidyr's `separate` function in combination with `createDummyFeatures` from the mlr package. However, I had to name the columns manually; ideally each column should take the name of its value, as in the example.

R overflow
  • Use akrun's answer at the linked question, just skipping the `colnames` part, and `cbind` with your original first column. – A5C1D2H2I1M1N2O1R2T1 Jan 09 '18 at 09:49
  • Using `library(tidyr)` as in your tag: `mydf %>% mutate(Events = strsplit(as.character(Events), ",")) %>% unnest(Events) %>% distinct(.) %>% spread(Events,Events) %>% mutate_at(.vars=(-1),.funs=funs(if_else(is.na(.),0,1)))` I'd like to post this as an answer (because neither @akrun nor the answers to the duplicate question used this approach), but unfortunately your question was marked as a duplicate. – Scipione Sarlo Jan 09 '18 at 10:11

1 Answer

We can use `table` after splitting each string on `,` and converting the resulting list to a data.frame with `stack`:

table(stack(setNames(strsplit(df1$Events, ",\\s*"), df1$row))[2:1])

data

df1 <- structure(list(row = 1:4, 
 Events = c("237,2,236,102,106,111,114,115,116,117,118,119,125", 
 "237,111,116", "102,106,111,114,115", 
 "237,2,236,102,106,111,114,115,116,117,118,119,125, 126"
)), .Names = c("row", "Events"), class = "data.frame", row.names = c(NA, 
 -4L))
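
To see what each nested call in the one-liner contributes, it can be unrolled into named steps. This is only a sketch: the intermediate names `events_list`, `long`, `dummies` and `dummies_ordered` are illustrative, and splitting on `",\\s*"` is an assumption made here so that the stray space before 126 does not end up in a column name.

```r
# Example data from the question (the stray space before 126 is kept on purpose).
df1 <- data.frame(
  row = 1:4,
  Events = c("237,2,236,102,106,111,114,115,116,117,118,119,125",
             "237,111,116",
             "102,106,111,114,115",
             "237,2,236,102,106,111,114,115,116,117,118,119,125, 126"),
  stringsAsFactors = FALSE
)

# Step 1: split each string on commas; the regex also swallows spaces after a comma.
events_list <- strsplit(df1$Events, ",\\s*")

# Step 2: name each list element after its row id so stack() can carry it along.
names(events_list) <- df1$row

# Step 3: stack() turns the named list into a long data.frame with the
# columns "values" (event code) and "ind" (row id).
long <- stack(events_list)

# Step 4: cross-tabulate row id against event code. [2:1] puts "ind" first,
# so row ids become the table rows and event codes the table columns.
dummies <- table(long[2:1])

# Optional: table() sorts the event codes as strings; reorder the columns
# by first appearance to match the layout asked for in the question.
dummies_ordered <- dummies[, unique(long$values)]
```

If a row could contain the same event more than once, the table would hold counts rather than 0/1 flags; in that case something like `(dummies > 0) + 0L` would cap the values at 1.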
akrun
  • The solution led to an error. I tried to fix it by changing the last part of the code, `[2:1]`, but that didn't work: `Error in data.frame(values = unlist(unname(x)), ind, stringsAsFactors = FALSE) : arguments imply differing number of rows: 969732, 0` – R overflow Jan 09 '18 at 09:59
  • @Roverflow Based on your example, it is working fine for me. I added the dput output of the example I used – akrun Jan 09 '18 at 10:01
  • Works! Strangely enough, when I convert the table with `data.frame.matrix()` or `as.data.frame()`, the columns and rows are shuffled again. – R overflow Jan 09 '18 at 10:16
  • @Roverflow You need `as.data.frame.matrix(table(stack(setNames(strsplit(df1$Event, ","), df1$row))[2:1]))` – akrun Jan 09 '18 at 10:17
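
The difference between the two conversions mentioned in the comments can be illustrated with a short sketch. It assumes the example data from the question; the names `tab`, `long_df` and `wide_df` are illustrative, and the `",\\s*"` split pattern is an assumption to absorb the space before 126.

```r
# Example data from the question.
df1 <- data.frame(
  row = 1:4,
  Events = c("237,2,236,102,106,111,114,115,116,117,118,119,125",
             "237,111,116",
             "102,106,111,114,115",
             "237,2,236,102,106,111,114,115,116,117,118,119,125, 126"),
  stringsAsFactors = FALSE
)

# Contingency table: one row per row id, one column per event code.
tab <- table(stack(setNames(strsplit(df1$Events, ",\\s*"), df1$row))[2:1])

# as.data.frame() dispatches to the table method, which returns the long
# format: one line per (row id, event code) pair plus a Freq column.
long_df <- as.data.frame(tab)

# Calling as.data.frame.matrix() directly treats the table as a plain
# matrix, so the wide 0/1 layout is preserved.
wide_df <- as.data.frame.matrix(tab)
```

To keep the original identifier as a regular column, the wide result can be combined with it, for example `cbind(row = df1$row, wide_df)`.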