Denormalize list in a data frame in R

Question

I have a data frame that looks like this (from mongo db..)

team_id <- c(1,2)
member <- c("15,25,35","12,22,32")
data.frame (team_id,member)

which I'm trying to convert data frame like this..

team_id2 <- c(1,1,1,2,2,2)
member2 <- c(15,25,35,12,22,32)
data.frame (team_id2, member2)

I tried to use 'unlist' but cannot get "team_id" to be repeated every column. I will appreciate any guidance on this!

bgoldst · Answer 1 · 2016-05-10T06:41:21.267

6

Here's a base R solution:

ms <- strsplit(as.character(df$member),',');
df2 <- df[rep(seq_len(nrow(df)),sapply(ms,length)),];
df2$member <- unlist(ms);
df2;
##     team_id member
## 1         1     15
## 1.1       1     25
## 1.2       1     35
## 2         2     12
## 2.1       2     22
## 2.2       2     32

You can replace sapply(ms,length) with lengths(ms) if you're using a recent enough version of R.

edited May 10 '16 at 06:41

answered May 10 '16 at 06:35

bgoldst

34,190
6
38
64

akrun · Accepted Answer · 2016-05-10T09:48:56.153

We can use cSplit from library(splitstackshape). It is easy and compact to work with cSplit for these kind of problems. We just provide the column to split i.e. member, the delimiter (,) and the direction (long).

library(splitstackshape)
cSplit(d1, "member", sep=",", "long")
#    team_id member
#1:       1     15
#2:       1     25
#3:       1     35
#4:       2     12
#5:       2     22
#6:       2     32

or using data.table, we convert the 'data.frame' to 'data.table' (setDT(d1)), grouped by 'team_id', we split the 'member' by , and unlist the output.

library(data.table)
setDT(d1)[, .(member=unlist(tstrsplit(member, ","))), team_id]
#   team_id member
#1:       1     15
#2:       1     25
#3:       1     35
#4:       2     12
#5:       2     22
#6:       2     32

Or using tidyr, we can split the 'member' by , and unnest (from tidyr)

library(tidyr)
library(stringr)
unnest(d1[1], member= str_split(d1$member, ","))
#Source: local data frame [6 x 2]

#  team_id member
#   (dbl)  (chr)
#1       1     15
#2       1     25
#3       1     35
#4       2     12
#5       2     22
#6       2     32

Or we can use a base R solution. We use strsplit to split the 'member' column into a list, set the names of the list as 'team_id' and use stack to convert the list to data.frame

stack(setNames(strsplit(as.character(d1$member), ","), d1$team_id))[2:1]

data

d1 <- data.frame (team_id,member)

score 2 · Answer 3 · answered May 10 '16 at 07:09

To throw in a tidyr solution:

library(tidyr)
team_id <- c(1,2)
member <- c("15,25,35","12,22,32")
old <- data.frame (team_id, member, stringsAsFactors = FALSE)

## need to determine how many items there are at most in column 'member'
maxItems <- max(sapply(strsplit(old$member, ","), length))

old %>% separate(member, seq_len(maxItems), ",") %>% 
        gather(position, member, -team_id) 
#   team_id position member
# 1       1        1     15
# 2       2        1     12
# 3       1        2     25
# 4       2        2     22
# 5       1        3     35
# 6       2        3     32

Denormalize list in a data frame in R

3 Answers3

data

Linked