1

I have this data frame in R.

df <- read.table(text="
nr  first  second
n1  a;a;b  1;1;7
n2  c;c    3;3
n3  d;e    4;4", header=T, stringsAsFactors=FALSE)

And, I want to convert this to a data frame like this.

nr first second
n1 a     1
n1 b     7
n2 c     3
n3 d     4
n3 e     4

For each row, I only want to keep the unique elements in 'first'; and for each unique element I want a separate row.

Further, the columns 'first' and 'second' are linked. For example, 'b' in 'a;a;b' corresponds to 7 in '1;1;7'. I only want to keep the element in 'second' that corresponds to the element in 'first'.

milan
  • 4,782
  • 2
  • 21
  • 39

2 Answers2

4

separate_rows() from tidyr package is the function you are looking for:

library(tidyr)
unique(separate_rows(df, sep = ";", first:second))

#   nr first second
# 1 n1     a      1
# 3 n1     b      7
# 4 n2     c      3
# 6 n3     d      4
# 7 n3     e      4
Psidom
  • 209,562
  • 33
  • 339
  • 356
2

We can use cSplit

library(splitstackshape)
unique(cSplit(df, c("first", "second"), ";", "long"))
#    nr first second
#1: n1     a      1
#2: n1     b      7
#3: n2     c      3
#4: n3     d      4
#5: n3     e      4

Or without using any external packages, we loop through the second and third column, split by the delimiter (strsplit(x, ";")), stack it to data.frame, then cbind the list elements, get the unique rows and change the column names to that of the original dataset.

lst <- lapply(df[2:3], function(x) stack(setNames(strsplit(x, ";"), df[,1])))
setNames(unique(cbind(lst[[1]][2:1], lst[[2]][1])), names(df))
#  nr first second
#1 n1     a      1
#3 n1     b      7
#4 n2     c      3
#6 n3     d      4
#7 n3     e      4
akrun
  • 874,273
  • 37
  • 540
  • 662