2

I have a data frame which is having multiple columns each column is having string values. I want to split column values by coma separator in the output data frame. Input and required output are as below

Col1=c("a,b,c","9,a,5")
Col2=c("c,b,e","4,r,t")
Col3=c("e,f,g","y,z,d")
Input=data.frame(Col1,Col2,Col3)

Column1=c("a","9")
Column2=c("b","a")
Column3=c("c","5")
Column4=c("c","4")
Column5=c("b","r")
Column6=c("e","t")
Column7=c("e","y")
Column8=c("f","z")
Column9=c("g","d")
Output=data.frame(Column1,Column2,Column3,Column4,Column5,Column6,Column7,Column8,Column9)

If anyone know the solution please help.

3 Answers3

1

The package splitstackshape is made for such operations,

library(splitstackshape)

cSplit(Input, names(Input), type.convert = 'as.is')

#   Col1_1 Col1_2 Col1_3 Col2_1 Col2_2 Col2_3 Col3_1 Col3_2 Col3_3
#1:      a      b      c      c      b      e      e      f      g
#2:      9      a      5      4      r      t      y      z      d
Sotos
  • 51,121
  • 6
  • 32
  • 66
  • 1
    Ya this works. Answer is useful. Thank you – ParthaSarathi May 20 '22 at 12:23
  • How do I do it If I have to split it before some character. – ParthaSarathi May 23 '22 at 06:57
  • There is an argument called `sep` (default is a comma) where you can define your delimiter , i.e. `cSplit(Input, names(Input), sep = 'whatever' ,type.convert = 'as.is')` – Sotos May 23 '22 at 07:03
  • It is like I have few values as comma and space in them I should avoid those and use only values without space to split into columns. ex: "a,b, c,d", the result would split into Column1=c("a") Column2=c("b, c") and Column3=c("d"). Or I can give separate into columns before some character like "P/N" – ParthaSarathi May 23 '22 at 07:08
  • You can replace all `, ` with `,` and split on that. Check my answer [here](https://stackoverflow.com/a/37305413/5635580) – Sotos May 23 '22 at 07:11
  • What if I have NA in any of the column value? – ParthaSarathi May 30 '22 at 06:36
  • You need to specify everything in your question and not ask additional stuff that change the whole approach in comments. – Sotos May 30 '22 at 06:51
0

Sotos already provided a better suggestion, but here's an alternative dplyr/tidyr solution:

library(dplyr)
library(tidyr)
  
Input %>%
  mutate(id = row_number()) %>%
  pivot_longer(-id) %>%
  separate(value, c('1', '2', '3')) %>%
  pivot_wider(names_from=name, names_glue = "{name}_{.value}", values_from = '1':'3') %>%
  select(-id)

#> # A tibble: 2 × 9
#>   Col1_1 Col2_1 Col3_1 Col1_2 Col2_2 Col3_2 Col1_3 Col2_3 Col3_3
#>   <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr> 
#> 1 a      c      e      b      b      f      c      e      g     
#> 2 9      4      y      a      r      z      5      t      d
Aron Strandberg
  • 3,040
  • 9
  • 15
0

With base R using read.csv

read.csv(text = do.call(paste, c(Input, sep = ",")), header = FALSE)
  V1 V2 V3 V4 V5 V6 V7 V8 V9
1  a  b  c  c  b  e  e  f  g
2  9  a  5  4  r  t  y  z  d
akrun
  • 874,273
  • 37
  • 540
  • 662