My dataframe looks like this:

V1            V2 
colors1       black;yellow;green 
colors2       blue;pink;purple 

I am trying to transform this df into a frequency matrix using dcast: dcast(df, V2 ~ V1). Before I can do that, I need to split the strings in the second column so that each value gets its own row, like this:

V1            V2 
colors1       black
colors1       yellow
colors1       green 
colors2       blue
colors2       pink
colors2       purple 

Is there an easy way to do this?

Munrock

2 Answers

Using separate_rows from the tidyr package:

df <- data.frame(V1=c('colors1', 'colors2'), V2=c('black;yellow;green', 'blue;pink;purple'))
  
tidyr::separate_rows(df, V2, sep = ";")

#> # A tibble: 6 × 2
#>   V1      V2    
#>   <chr>   <chr> 
#> 1 colors1 black 
#> 2 colors1 yellow
#> 3 colors1 green 
#> 4 colors2 blue  
#> 5 colors2 pink  
#> 6 colors2 purple
Aron Strandberg

Another simple option is using strsplit like this:

df <- read.table(text="V1            V2 
colors1       black;yellow;green 
colors2       blue;pink;purple ", header = TRUE)

library(dplyr)
library(tidyr)
df %>% 
  mutate(V2 = strsplit(V2, ";")) %>% 
  unnest(V2)
#> # A tibble: 6 × 2
#>   V1      V2    
#>   <chr>   <chr> 
#> 1 colors1 black 
#> 2 colors1 yellow
#> 3 colors1 green 
#> 4 colors2 blue  
#> 5 colors2 pink  
#> 6 colors2 purple

Created on 2022-07-11 by the reprex package (v2.0.1)
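
If you would rather avoid the extra packages, the same split can be sketched in base R. This is only an illustration under the assumption that the separator is always a literal ";" and that V2 is a character column; the names parts, long and freq are made up for this example. The last line uses table() as a stand-in for the dcast step from the question, not as a replacement the asker has confirmed.

```r
# Example data from the question
df <- data.frame(
  V1 = c("colors1", "colors2"),
  V2 = c("black;yellow;green", "blue;pink;purple"),
  stringsAsFactors = FALSE
)

# Split each string on a literal ";" (fixed = TRUE turns off regex matching)
parts <- strsplit(df$V2, ";", fixed = TRUE)

# Repeat each V1 value once per piece, then flatten the pieces into one column
long <- data.frame(
  V1 = rep(df$V1, lengths(parts)),
  V2 = unlist(parts),
  stringsAsFactors = FALSE
)

# Cross-tabulate: rows are the individual values, columns are the V1 groups
freq <- table(long$V2, long$V1)
```

Here long holds one row per color, and freq counts how often each color occurs in each V1 group.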

Quinten