R Split delimited strings in a column and insert as new column (in binary)

Question

I have data frame as below

+---+-----------+
|lot|Combination|
+---+-----------+
|A01|A,B,C,D,E,F|
|A01|A,B,C      |
|A02|B,C,D,E    |
|A03|A,B,D,F    |
|A04|A,C,D,E,F  |
+---+-----------+

Each of the alphabet is a character separated by comma, I would like to split 'Combination' on each comma and insert the split strings as new column, in binary form. For instance, the desired output will be:

+---+-+-+-+-+-+-+
|lot|A|B|C|D|E|F|  
+---+-+-+-+-+-+-+
|A01|1|1|1|1|1|1|
|A01|1|1|1|0|0|0|
|A02|0|1|1|1|1|0|
|A03|1|1|0|1|0|1|
|A04|1|0|1|1|1|1|
+---+-+-+-+-+-+-+

Any help will be appreciated :)

while I check if my answer is really working, have a look at this post: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and edit your question and exchange the images for real data please :) — Jan, Jul 31 '17 at 04:27
`library(splitstackshape); cSplit_e(df, "Combination", type = "character", fill = 0)` should do it.... — A5C1D2H2I1M1N2O1R2T1, Jul 31 '17 at 07:15

score 3 · Accepted Answer · answered Jul 31 '17 at 05:00

A solution using dplyr and tidyr. dt2 is the final output.

# Load packages
library(dplyr)
library(tidyr)

# Create example data frame
dt <- lot <- c("A01", "A01", "A02", "A03","A04")
Combination <- c("A,B,C,D,E,F", "A,B,C","B,C,D,E", "A,B,D,F", "A,C,D,E,F")
dt <- data_frame(lot, Combination)

# Process the data
dt2 <- dt %>%
  mutate(ID = 1:n()) %>%
  mutate(Combination = strsplit(Combination, split = ",")) %>%
  unnest() %>%
  mutate(Value = 1) %>%
  spread(Combination, Value, fill = 0) %>%
  select(-ID)

Thanks! it works!! The results is exactly what i'm looking for :) — yc.koong, Jul 31 '17 at 07:08

score 1 · Answer 2 · edited Sep 15 '17 at 00:00

Please provide your sample input data in a form that is directly usable as input for the person answering. I added same sample data myself here. hope that helps.

library(tidyr)
library(dplyr)
lot <- c("A01", "A02", "A03","A04")
Combination <- c("A,B,C,D,E,F", "A,B,C","B,C,D,E", "A,C")
df <- data.frame(lot, Combination)
df

separate(df, Combination, into=paste("V",1:6, sep=""), sep=",") %>%
    gather(key, value,-lot) %>%
    filter(!is.na(value)) %>%
    mutate(yesno = 1) %>%
    distinct %>%
    spread(value, yesno, fill = 0) %>% select(-key)

To understand what's happening here, run every step starting from separate() individually. The %>% is a pipe operator that is shorthand for adding the result of the previous row as the first parameter of the next row.

score 0 · Answer 3 · answered Jul 31 '17 at 05:08

another option, using the handy separate_rows() function:

df <- read.table( text = "lot|Combination
A01|A,B,C,D,E,F
A01|A,B,C
A02|B,C,D,E
A03|A,B,D,F    
A04|A,C,D,E,F", sep ="|", header = TRUE) 

library(tidyverse)
df %>%
  mutate(id = row_number(), flg = 1) %>%
  separate_rows(Combination, sep = ",") %>%
  spread(Combination, flg)

gives:

  lot id  A  B  C  D  E  F
1 A01  1  1  1  1  1  1  1
2 A01  2  1  1  1 NA NA NA
3 A02  3 NA  1  1  1  1 NA
4 A03  4  1  1 NA  1 NA  1
5 A04  5  1 NA  1  1  1  1

R Split delimited strings in a column and insert as new column (in binary)

3 Answers3

Linked

Related