1

Suppose I have a data frame such like

df <- data.frame(x=1:3, y=c('a,b,c','c,d', 'f,g'))
df
  x     y
1 1 a,b,c
2 2   c,d
3 3   f,g

What is an easy way to make it like

 df
  x y
1 1 a
2 1 b
3 1 c
4 2 c
5 2 d
6 3 f
7 3 g

e.g extend the column y to multiple rows for each element of x.

David Z
  • 6,641
  • 11
  • 50
  • 101

1 Answers1

1

Easier option is separate_rows

library(tidyr)
library(dplyr)
out <- df %>% 
   separate_rows(y)

-output

out
# A tibble: 7 x 2
      x y    
  <int> <chr>
1     1 a    
2     1 b    
3     1 c    
4     2 c    
5     2 d    
6     3 f    
7     3 g    

If we need to do the reverse operation as well, there are couple of ways.

  1. From the output, do a group by + summarise operation
out %>%
    group_by(x) %>%
    summarise(y = toString(y), .groups = 'drop')
# A tibble: 3 x 2
      x y      
  <int> <chr>  
1     1 a, b, c
2     2 c, d   
3     3 f, g   

toString adds a space after the ,. If we don't want the space, use paste(y, collapse=",") or stringr::str_c(y, collapse=",")

  1. Another way is to replicate the column 'y' and select that column along with 'x' and get the distinct. This should also work when there is a single column i.e. no 'x' column (thus we don't know how it was split)
out <- df %>% 
        mutate(y1 = y) %>%
        separate_rows(y1) 
distinct(out, x, y)
# A tibble: 3 x 2
      x y    
  <int> <chr>
1     1 a,b,c
2     2 c,d  
3     3 f,g  
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Thank you @akrun. Is there a reverse function of separate_rows(). E.g. from the output df to the original df? – David Z Jul 06 '21 at 18:29
  • 1
    @DavidZ you can do a group by operation i.e. `out %>% group_by(x) %>% summarise(y = toString(y))` – akrun Jul 06 '21 at 18:30
  • I think a better approach would be to replicate the column 'y' before the separate_rows i.e. `df %>% mutate(y1 = y) %>% separate_rows(y1) %>% distinct(x, y)` – akrun Jul 06 '21 at 18:31