4

I'm working with a very raw set of data and need to shape it up in order to work with it. I am trying to split selected columns based on seperator '|'

d <- data.frame(id = c(022,565,893,415),
     name = c('c|e','m|q','w','w|s|e'), 
     score = c('e','k|e','e|k|e', 'e|o'))

Is it possible to split the dataframe at one so it looks like this in the end.

df <- data.frame(id = c(22,22,565,565,565,565,893,893,893,415,415,415,415,415,415),
            name = c('c','e','m','m','q','q','w','w','w','w','w','s','s','e','e'),
            score = c('e','e','k','e','k','e','e','k','e','e','o','e','o','e','o'))

So far I've tried various different string split funtions but haven't had much luck :(

Can anybody help?

Jaap
  • 81,064
  • 34
  • 182
  • 193
Davis
  • 466
  • 4
  • 20
  • Also, http://stackoverflow.com/questions/13773770 – zx8754 Nov 14 '16 at 15:00
  • 1
    Just use a double `cSplit`: `library(splitstackshape); cSplit(cSplit(d, 'name', sep = '|', 'long'), 'score', sep = '|', 'long')` – Jaap Nov 14 '16 at 15:03
  • 1
    [V2 of "splitstackshape"](https://github.com/mrdwab/splitstackshape/tree/v2.0) will introduce a function called `cartesian_split()` which can be used as `cartesian_split(d, c("name", "score"), "[|]", fixed = FALSE)`. – A5C1D2H2I1M1N2O1R2T1 Mar 31 '18 at 11:16

3 Answers3

7

Here's a simple base R approach in two steps:

1) split the columns:

x <- lapply(d[-1], strsplit, "|", fixed = TRUE)

2) expand and combine:

d2 <- setNames(do.call(rbind, Map(expand.grid, d$id, x$name, x$score)), names(d)) 

The result is then:

#    id name score
#1   22    c     e
#2   22    e     e
#3  565    m     k
#4  565    q     k
#5  565    m     e
#6  565    q     e
#7  893    w     e
#8  893    w     k
#9  893    w     e
#10 415    w     e
#11 415    s     e
#12 415    e     e
#13 415    w     o
#14 415    s     o
#15 415    e     o
talat
  • 68,970
  • 21
  • 126
  • 157
4

There is also 2 line tidyr \ dplyr solution.

The tidyr package has a function called separate_rows that will do what you need. You need two separate the rows in two operations with the nested elements not being equal.

library(tidyr)
library(dplyr)

df <- separate_rows(d, name, sep = "\\|") %>%
separate_rows(score, sep = "\\|")
talat
  • 68,970
  • 21
  • 126
  • 157
Jake Kaupp
  • 7,892
  • 2
  • 26
  • 36
1

With dplyr and unnest from tidyr:

library(dplyr)
library(tidyr)
d %>% mutate(name=strsplit(name,split="[|]")) %>% 
      group_by(id) %>% 
      unnest() %>% 
      mutate(score=strsplit(score,split="[|]")) %>% 
      unnest()
##Source: local data frame [15 x 3]
##Groups: id [4]
##
##      id  name score
##   <dbl> <chr> <chr>
##1     22     c     e
##2     22     e     e
##3    565     m     k
##4    565     m     e
##5    565     q     k
##6    565     q     e
##7    893     w     e
##8    893     w     k
##9    893     w     e
##10   415     w     e
##11   415     w     o
##12   415     s     e
##13   415     s     o
##14   415     e     e
##15   415     e     o
aichao
  • 7,375
  • 3
  • 16
  • 18