0

I have a data frame as follows:

df <- data.frame(x=c('a,b,c','d,e','f'),y=c(1,2,3))
df

> df
      x y
1 a,b,c 1
2   d,e 2
3     f 3

I can get the flattened df$x like this:

unique(unlist(strsplit(as.character(df$x), ",")))
[1] "a" "b" "c" "d" "e" "f"

What would be the best way to transform my input df into:

 x y
 a 1
 b 1 
 c 1
 d 2
 e 2
 f 3

Basically flatten df$x and individually assign its corresponding y

SkyWalker
  • 13,729
  • 18
  • 91
  • 187

3 Answers3

2

If you are working on data.frame, I recommend using tidyr

df <- data.frame(x=c('a,b,c','d,e','f'),y=c(1,2,3),stringsAsFactors = F)
library(tidyr)
df %>%
    transform(x= strsplit(x, ",")) %>%
    unnest(x)


  y x
1 1 a
2 1 b
3 1 c
4 2 d
5 2 e
6 3 f
BENY
  • 317,841
  • 20
  • 164
  • 234
  • 3
    Also, using tidyr, there's a `separate_rows` function that is quite handy: `separate_rows(df,x)` – Lamia Aug 23 '17 at 22:02
1
sapply(unlist(strsplit(as.character(df$x), ",")), function(ss)
    df$y[which(grepl(pattern = ss, x = df$x))])
#a b c d e f 
#1 1 1 2 2 3 

If you want a dataframe

do.call(rbind, lapply(1:NROW(df), function(i)
    setNames(data.frame(unlist(strsplit(as.character(df$x[i]), ",")), df$y[i]),
             names(df))))
#  x y
#1 a 1
#2 b 1
#3 c 1
#4 d 2
#5 e 2
#6 f 3
d.b
  • 32,245
  • 6
  • 36
  • 77
1

FWIW, you could also repeat the row indices according to how many elements each x value has:

df <- data.frame(x=c('a,b,c','d,e','f'),y=c(1,2,3),stringsAsFactors = F)
df[,1] <- strsplit(df[,1],",")
cbind(x=unlist(df[,1]),df[rep(1:nrow(df), lengths(df[,1])),-1,F])
#     x y
# 1   a 1
# 1.1 b 1
# 1.2 c 1
# 2   d 2
# 2.1 e 2
# 3   f 3
lukeA
  • 53,097
  • 5
  • 97
  • 100