How to extend a data.frame based on a column in R?

Question

Suppose I have a data frame such like

df <- data.frame(x=1:3, y=c('a,b,c','c,d', 'f,g'))
df
  x     y
1 1 a,b,c
2 2   c,d
3 3   f,g

What is an easy way to make it like

 df
  x y
1 1 a
2 1 b
3 1 c
4 2 c
5 2 d
6 3 f
7 3 g

e.g extend the column y to multiple rows for each element of x.

akrun · Accepted Answer · 2021-07-06T18:37:20.063

Easier option is separate_rows

library(tidyr)
library(dplyr)
out <- df %>% 
   separate_rows(y)

-output

out
# A tibble: 7 x 2
      x y    
  <int> <chr>
1     1 a    
2     1 b    
3     1 c    
4     2 c    
5     2 d    
6     3 f    
7     3 g

If we need to do the reverse operation as well, there are couple of ways.

From the output, do a group by + summarise operation

out %>%
    group_by(x) %>%
    summarise(y = toString(y), .groups = 'drop')
# A tibble: 3 x 2
      x y      
  <int> <chr>  
1     1 a, b, c
2     2 c, d   
3     3 f, g

toString adds a space after the ,. If we don't want the space, use paste(y, collapse=",") or stringr::str_c(y, collapse=",")

Another way is to replicate the column 'y' and select that column along with 'x' and get the distinct. This should also work when there is a single column i.e. no 'x' column (thus we don't know how it was split)

out <- df %>% 
        mutate(y1 = y) %>%
        separate_rows(y1) 
distinct(out, x, y)
# A tibble: 3 x 2
      x y    
  <int> <chr>
1     1 a,b,c
2     2 c,d  
3     3 f,g

Thank you @akrun. Is there a reverse function of separate_rows(). E.g. from the output df to the original df? — David Z, Jul 06 '21 at 18:29
@DavidZ you can do a group by operation i.e. `out %>% group_by(x) %>% summarise(y = toString(y))` — akrun, Jul 06 '21 at 18:30
I think a better approach would be to replicate the column 'y' before the separate_rows i.e. `df %>% mutate(y1 = y) %>% separate_rows(y1) %>% distinct(x, y)` — akrun, Jul 06 '21 at 18:31

How to extend a data.frame based on a column in R?

1 Answers1