Create new columns with dummies based on values

Question

I want to create new columns based on the values in a single existing column. It is event data (from a website), so the number of values differs from row to row. For example:

```
row    Events
1       237,2,236,102,106,111,114,115,116,117,118,119,125
2       237,111,116
3       102,106,111,114,115
4       237,2,236,102,106,111,114,115,116,117,118,119,125, 126
```

The result should be dummy (0/1) columns, one per distinct value:

```
row  237    2  236  102  106  111  114  115  116  117  118  119  125  126
1      1    1    1    1    1    1    1    1    1    1    1    1    1    0
2      1    0    0    0    0    1    0    0    1    0    0    0    0    0
3      0    0    0    1    1    1    1    1    0    0    0    0    0    0
4      1    1    1    1    1    1    1    1    1    1    1    1    1    1
```

I tried to solve this with the `separate()` function from tidyr, combined with `createDummyFeatures()` from the mlr package. However, I had to name the columns manually, whereas ideally each column should be named after its value, as in the example.
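For reference, a minimal base-R sketch of the requested output that avoids naming columns by hand: each distinct value becomes a column, in order of first appearance (the same column order as the expected result above), and the original `row` column is kept. It assumes `Events` is a character column (convert a factor with `as.character()` first); the object names `events`, `lvls`, `dummies`, and `result` are illustrative only.

```r
# The question's example data; Events is assumed to be character, not factor
df1 <- data.frame(
  row = 1:4,
  Events = c("237,2,236,102,106,111,114,115,116,117,118,119,125",
             "237,111,116",
             "102,106,111,114,115",
             "237,2,236,102,106,111,114,115,116,117,118,119,125, 126"),
  stringsAsFactors = FALSE
)

# Split on a comma plus any following whitespace (row 4 contains ", 126")
events <- strsplit(df1$Events, ",\\s*")

# Distinct values in order of first appearance become the column names
lvls <- unique(unlist(events))

# One 0/1 row per input row: is each value present in that row's events?
dummies <- t(vapply(events, function(e) as.integer(lvls %in% e),
                    integer(length(lvls))))
colnames(dummies) <- lvls

# check.names = FALSE keeps names such as "237" instead of "X237"
result <- data.frame(row = df1$row, dummies, check.names = FALSE)
result
```

Using `%in%` per row means a value repeated within one row still yields 1, not a count.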

Use akrun's answer at the linked question, just skipping the `colnames` part, and `cbind` with your original first column. — A5C1D2H2I1M1N2O1R2T1, Jan 09 '18 at 09:49
Using `library(tidyr)` as in your tag (together with `library(dplyr)` for `mutate()`, `distinct()`, and `mutate_at()`): `mydf %>% mutate(Events = strsplit(as.character(Events), ",")) %>% unnest(Events) %>% distinct(.) %>% spread(Events, Events) %>% mutate_at(.vars = -1, .funs = ~ if_else(is.na(.), 0, 1))` I'd like to post this as an answer (because neither @akrun nor the answers in the duplicate question use this approach), but unfortunately the question was marked as a duplicate. – Scipione Sarlo, Jan 09 '18 at 10:11
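The same long-then-wide idea as the tidyr pipeline in that comment (`unnest()`, then `distinct()`, then `spread()`) can be sketched in base R, assuming tidyr is not available. The data below is hypothetical: row 1 lists event 237 twice, which shows why the de-duplication step matters; without it, `table()` would put a 2 in that cell.

```r
# Hypothetical data: row 1 lists event 237 twice
df1 <- data.frame(row = 1:3,
                  Events = c("237,2,237", "237,111", "111, 116"),
                  stringsAsFactors = FALSE)

# "unnest": one (row, event) pair per line
parts <- strsplit(df1$Events, ",\\s*")
long <- data.frame(row = rep(df1$row, lengths(parts)),
                   Events = unlist(parts),
                   stringsAsFactors = FALSE)

# "distinct": drop repeated pairs so every cell ends up 0 or 1
long <- unique(long)

# "spread": cross-tabulate and keep the wide layout (columns sorted as text)
wide <- as.data.frame.matrix(table(long$row, long$Events))
wide
```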

akrun · Accepted Answer · 2018-01-09T10:02:01.763


We can split each string on `,`, turn the resulting named list into a two-column data.frame with `stack`, and cross-tabulate it with `table`:

```r
# stack() returns (values, ind); [2:1] puts ind (the row id) first so ids become rows
table(stack(setNames(strsplit(df1$Events, ","), df1$row))[2:1])
```

Data:

```r
df1 <- structure(list(row = 1:4, 
 Events = c("237,2,236,102,106,111,114,115,116,117,118,119,125", 
 "237,111,116", "102,106,111,114,115", 
 "237,2,236,102,106,111,114,115,116,117,118,119,125, 126"
)), .Names = c("row", "Events"), class = "data.frame", row.names = c(NA, 
 -4L))
```
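One detail in this data: the last string contains `, 126` with a space, so splitting on `","` alone yields the value `" 126"` and therefore a column whose name starts with a space. A small variation (a suggestion, not part of the answer) splits on a comma followed by optional whitespace:

```r
# Same data as above, written with data.frame() for brevity
df1 <- data.frame(
  row = 1:4,
  Events = c("237,2,236,102,106,111,114,115,116,117,118,119,125",
             "237,111,116", "102,106,111,114,115",
             "237,2,236,102,106,111,114,115,116,117,118,119,125, 126"),
  stringsAsFactors = FALSE
)

# Splitting on "," alone keeps the leading space in ", 126"
naive <- table(stack(setNames(strsplit(df1$Events, ","), df1$row))[2:1])

# Splitting on a comma plus optional whitespace gives a clean "126" column
clean <- table(stack(setNames(strsplit(df1$Events, ",\\s*"), df1$row))[2:1])

colnames(naive)
colnames(clean)
```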

answered Jan 09 '18 at 09:40 by akrun (edited Jan 09 '18 at 10:02)

The solution led to an error. I tried to fix it by changing the last part of the code (`[2:1]`), but that didn't work: `Error in data.frame(values = unlist(unname(x)), ind, stringsAsFactors = FALSE) : arguments imply differing number of rows: 969732, 0` – R overflow Jan 09 '18 at 09:59
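One plausible cause of this error (an assumption; the thread does not confirm it) is that the real data has no column called `row`. Then `df1$row` is `NULL`, `setNames()` leaves the split list unnamed, and `stack()` has no names from which to build its `ind` column, which fits the `..., 0` in the message. The sketch below reproduces that situation with hypothetical data `df2` and uses row positions as the names instead:

```r
# Hypothetical data without a `row` column
df2 <- data.frame(Events = c("237,111,116", "102,106"),
                  stringsAsFactors = FALSE)

# df2$row is NULL, so setNames() leaves the list unnamed
parts <- setNames(strsplit(df2$Events, ","), df2$row)
is.null(names(parts))

# stack() then cannot build `ind`, so it stops with an error
res <- tryCatch(stack(parts), error = function(e) conditionMessage(e))
res

# Using explicit row positions as names avoids the problem
ids <- seq_len(nrow(df2))
tab <- table(stack(setNames(strsplit(df2$Events, ","), ids))[2:1])
tab
```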

@Roverflow Based on your example, it works fine for me. I added the `dput` output of the example I used. – akrun Jan 09 '18 at 10:01
It works! Strangely enough, when I convert the table with `data.frame.matrix()` or `as.data.frame()`, the rows and columns are shuffled again. – R overflow Jan 09 '18 at 10:16

@Roverflow You need `as.data.frame.matrix(table(stack(setNames(strsplit(df1$Events, ","), df1$row))[2:1]))` – akrun Jan 09 '18 at 10:17
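Why the two conversions differ: `as.data.frame()` on a two-way table dispatches to `as.data.frame.table()`, which returns the long form (one line per cell plus a `Freq` column), while `as.data.frame.matrix()` keeps the rows-by-columns layout. A small sketch with hypothetical two-row data, which also puts the row ids back as an ordinary column:

```r
# Hypothetical two-row toy data
df1 <- data.frame(row = 1:2, Events = c("237,111", "111,116"),
                  stringsAsFactors = FALSE)
tab <- table(stack(setNames(strsplit(df1$Events, ","), df1$row))[2:1])

# Long form: columns ind, values, Freq (one line per table cell)
long <- as.data.frame(tab)

# Wide form: one row per id, one column per event value
wide <- as.data.frame.matrix(tab)

# Put the ids back as a column; check.names = FALSE keeps names like "237"
wide <- data.frame(row = as.integer(rownames(wide)), wide, check.names = FALSE)
wide
```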
