How to custom flatten a data frame?

Question

I have a data frame as follows:

df <- data.frame(x=c('a,b,c','d,e','f'),y=c(1,2,3))
df

> df
      x y
1 a,b,c 1
2   d,e 2
3     f 3

I can get the flattened df$x like this:

unique(unlist(strsplit(as.character(df$x), ",")))
[1] "a" "b" "c" "d" "e" "f"

What would be the best way to transform my input df into:

 x y
 a 1
 b 1 
 c 1
 d 2
 e 2
 f 3

Basically flatten df$x and individually assign its corresponding y

score 2 · Answer 1 · answered Aug 23 '17 at 21:54

2

If you are working on data.frame, I recommend using tidyr

df <- data.frame(x=c('a,b,c','d,e','f'),y=c(1,2,3),stringsAsFactors = F)
library(tidyr)
df %>%
    transform(x= strsplit(x, ",")) %>%
    unnest(x)


  y x
1 1 a
2 1 b
3 1 c
4 2 d
5 2 e
6 3 f

answered Aug 23 '17 at 21:54

BENY

317,841
20
164
234

3

Also, using tidyr, there's a `separate_rows` function that is quite handy: `separate_rows(df,x)` – Lamia Aug 23 '17 at 22:02

d.b · Accepted Answer · 2017-08-23T21:56:06.973

1

sapply(unlist(strsplit(as.character(df$x), ",")), function(ss)
    df$y[which(grepl(pattern = ss, x = df$x))])
#a b c d e f 
#1 1 1 2 2 3

If you want a dataframe

do.call(rbind, lapply(1:NROW(df), function(i)
    setNames(data.frame(unlist(strsplit(as.character(df$x[i]), ",")), df$y[i]),
             names(df))))
#  x y
#1 a 1
#2 b 1
#3 c 1
#4 d 2
#5 e 2
#6 f 3

edited Aug 23 '17 at 21:56

answered Aug 23 '17 at 21:44

d.b

32,245
6
36
77

score 1 · Answer 3 · answered Aug 23 '17 at 22:11

FWIW, you could also repeat the row indices according to how many elements each x value has:

df <- data.frame(x=c('a,b,c','d,e','f'),y=c(1,2,3),stringsAsFactors = F)
df[,1] <- strsplit(df[,1],",")
cbind(x=unlist(df[,1]),df[rep(1:nrow(df), lengths(df[,1])),-1,F])
#     x y
# 1   a 1
# 1.1 b 1
# 1.2 c 1
# 2   d 2
# 2.1 e 2
# 3   f 3

How to custom flatten a data frame?

3 Answers3

Linked