
Suppose I have the following dataframe, called 'example':

a <- c("rs123|rs246|rs689653", "rs9753", "rs00334")
b <- c(1,2,9)
c <- c(234534523, 67345634, 536423)

example <- data.frame(a,b,c)

I want the dataframe to look like this:

                a b         c
            rs123 1 234534523
            rs246 1 234534523
         rs689653 1 234534523
           rs9753 2  67345634
          rs00334 9    536423

That is, column a is split on the | delimiter, and the values in the other columns are repeated for each resulting row. Any help would be greatly appreciated!

Sheila

1 Answer


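We can use separate_rows() from the tidyr package (part of the tidyverse). It splits the delimited values in a column into separate rows and repeats the values of the other columns for each new row.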

library(tidyverse)

example2 <- example %>%
  # sep is a regular expression, so the pipe has to be escaped
  separate_rows(a, sep = "\\|")
example2
#          a b         c
# 1    rs123 1 234534523
# 2    rs246 1 234534523
# 3 rs689653 1 234534523
# 4   rs9753 2  67345634
# 5  rs00334 9    536423
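If you would rather not load the tidyverse, the same reshaping can be sketched in base R. This is an alternative technique, not part of the original answer: split column a with strsplit(), repeat each row once per piece with rep(), then overwrite a with the pieces. The name example2_base is hypothetical, and the sketch assumes a is character (the data.frame() default since R 4.0.0, made explicit here). On tidyr 1.3.0 or later, separate_longer_delim(a, delim = "|") is a newer alternative to separate_rows() that treats the delimiter as a fixed string.

```r
# Rebuild the example from the question; stringsAsFactors = FALSE keeps
# column a as character on any R version
example <- data.frame(
  a = c("rs123|rs246|rs689653", "rs9753", "rs00334"),
  b = c(1, 2, 9),
  c = c(234534523, 67345634, 536423),
  stringsAsFactors = FALSE
)

# Split column a on a literal "|" (fixed = TRUE, so no regex escaping needed)
pieces <- strsplit(example$a, "|", fixed = TRUE)

# Repeat each original row once per piece, then replace a with the pieces
example2_base <- example[rep(seq_len(nrow(example)), lengths(pieces)), ]
example2_base$a <- unlist(pieces)

# Reset the row names (1, 1.1, 1.2, ... after row duplication) to 1..n
rownames(example2_base) <- NULL

example2_base
```

lengths() returns the number of pieces per original row, which is exactly how many times that row needs to be repeated.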

Here is one way to convert example2 back to the original format.

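example3 <- example2 %>%
  group_by(b, c) %>%
  # collapse the split values back into one "|"-delimited string per group
  summarize(a = str_c(a, collapse = "|")) %>%
  ungroup() %>%
  # restore the original column order (a, b, c)
  select(names(example2)) %>%
  # a was a factor only because data.frame() used stringsAsFactors = TRUE
  # by default before R 4.0.0; on R >= 4.0.0 drop this line to stay identical
  mutate(a = factor(a)) %>%
  as.data.frame()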

identical(example, example3)
# [1] TRUE
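The reverse step can also be sketched in base R. Again this is an alternative, not the answer's method: keep the first row of each (b, c) group, and collapse a within each group with paste(..., collapse = "|"). The names example2_base, key and example3_base are hypothetical, and the sketch assumes a is character, so no factor conversion is needed. Groups are kept in order of first appearance rather than sorted.

```r
# Self-contained setup: the original data and its split (long) form
example <- data.frame(
  a = c("rs123|rs246|rs689653", "rs9753", "rs00334"),
  b = c(1, 2, 9),
  c = c(234534523, 67345634, 536423),
  stringsAsFactors = FALSE
)
pieces <- strsplit(example$a, "|", fixed = TRUE)
example2_base <- example[rep(seq_len(nrow(example)), lengths(pieces)), ]
example2_base$a <- unlist(pieces)
rownames(example2_base) <- NULL

# Group key built from the columns that were duplicated by the split
key <- paste(example2_base$b, example2_base$c, sep = "\t")

# Keep the first row of each group, in order of first appearance
example3_base <- example2_base[!duplicated(key), ]

# Collapse a within each group; the factor levels follow first appearance,
# so the collapsed strings line up with the rows kept above
collapsed <- vapply(
  split(example2_base$a, factor(key, levels = unique(key))),
  paste, character(1), collapse = "|"
)
example3_base$a <- unname(collapsed)
rownames(example3_base) <- NULL

identical(example, example3_base)
```

unname() matters here: without it, the column would carry the group keys as names and identical() would no longer match the original.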
www
Thanks! Just for completeness and curiosity's sake, let's say I wanted to go from 'example2' back to something like 'example'. What's the best way to do that? – Sheila Nov 01 '18 at 00:27