
I would like to split each column in the 4th row of the input data into separate rows, one below the other, as shown in the expected output.

input

    cytoband    11qE2                1qC1.1     13qD2.1
    q value     1.16                 1.53       1.13
    wide        11:119210            1:50490    13:107190
    genes       Aatk,Actg1,Alyref    Tin,Ern    Alk,Nf12

expected output

    cytoband    11qE2        1qC1.1     13qD2.1
    q value     1.16         1.53       1.13
    wide        11:119210    1:50490    13:107190
    genes       Aatk         Tin        Alk
                Actg1        Ern        Nf12
                Alyref
beginner
  • You should transpose your data; your variables are horizontal instead of vertical, which doesn't work well in a data.frame. Once you do that, you need to decide how you want your data arranged. You can repeat the other data, insert `NA`s in fringed columns (not recommended unless order of genes is meaningful), or use a list column (the most efficient option, but requires a little skill to manipulate effectively). – alistaire May 03 '17 at 21:16
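
As a rough illustration of the comment above (not code from the thread), the data could be transposed so that each cytoband is a row, with the genes kept in a list column. The names `bands` and `long` are made up for this sketch, and the values are typed in by hand from the question:

```r
# Transposed layout: one row per cytoband (values copied from the question)
bands <- data.frame(
  cytoband = c("11qE2", "1qC1.1", "13qD2.1"),
  q_value  = c(1.16, 1.53, 1.13),
  wide     = c("11:119210", "1:50490", "13:107190"),
  stringsAsFactors = FALSE
)

# List column: each cell holds a character vector of gene names
bands$genes <- strsplit(c("Aatk,Actg1,Alyref", "Tin,Ern", "Alk,Nf12"),
                        ",", fixed = TRUE)

# "Repeat the other data" option: one row per gene, cytoband repeated
long <- data.frame(
  cytoband = rep(bands$cytoband, lengths(bands$genes)),
  gene     = unlist(bands$genes),
  stringsAsFactors = FALSE
)
```

Here `bands` has three rows with 3, 2, and 2 genes in its list column, and `long` has one row for each of the seven genes.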

1 Answer

I think what you want is `separate_rows()` from the tidyr package.

There is an example right in the documentation:

    library(tidyr)

    df <- data.frame(
      x = 1:3,
      y = c("a", "d,e,f", "g,h"),
      z = c("1", "2,3,4", "5,6"),
      stringsAsFactors = FALSE
    )
    # Split y and z in parallel: one output row per comma-separated item
    separate_rows(df, y, z, convert = TRUE)

If you use the standard-evaluation version, `separate_rows_()`, you can pass the column names as strings, which could get you something like this:

    library(tidyr)

    cols <- colnames(df)
    # Split one column per pass; each pass works independently of the others
    for (col in cols) {
      df <- separate_rows_(df, col, sep = ",", convert = FALSE)
    }

Not perfect because it causes the values to repeat, but maybe something to start with?
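
If the repeated values are a problem, one alternative is to split only the genes row and pad the shorter columns with `NA`, which gives roughly the layout shown in the question. This is a base R sketch, not something from the tidyr documentation; `genes`, `parts`, and `padded` are made-up names, and the input is typed in by hand:

```r
# Gene strings keyed by cytoband (values copied from the question)
genes <- c("11qE2"   = "Aatk,Actg1,Alyref",
           "1qC1.1"  = "Tin,Ern",
           "13qD2.1" = "Alk,Nf12")

# Split each cell on commas; the result is a named list of character vectors
parts <- strsplit(genes, ",", fixed = TRUE)

# Pad every vector with NA up to the longest one, then bind as matrix columns
n <- max(lengths(parts))
padded <- vapply(
  parts,
  function(p) c(p, rep(NA_character_, n - length(p))),
  character(n)
)
```

`padded` is a character matrix with one column per cytoband and one gene per row; the cells below the last gene of a shorter column are `NA`.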

kpress
  • @Knachman, good to know about the `separate_rows()` function. In my case I have 139 columns, so I tried `separate_rows(dm1, V1:V139, convert = TRUE)`, but I got the error `Error: All nested columns must have the same number of elements.` – beginner May 03 '17 at 20:45
  • You're right, that is an annoying issue! My first thought was to put it in a loop; I edited the answer above with something that might help you get started. – kpress May 03 '17 at 21:48
  • Using a `for` loop would basically create the Cartesian product of the split variables, which probably isn't desired. For example, `df <- data.frame(x = 1:3, y = c("a", "d,e", "g,h"), z = c("1", "2,3,4", "6"))` should result in 6 rows, as the parallel maximum number of items in each row is 1, 3, and 2. However, the `for` loop would result in 1 + 2x3 + 2 = 9 rows. @beginner, [the development version of "splitstackshape"](https://github.com/mrdwab/splitstackshape/tree/v2.0) should be able to handle this with `cSplit(df, 2:3, ",", "long")` (or `cSplit(dm1, paste0("V", 1:139), ",", "long")`). – A5C1D2H2I1M1N2O1R2T1 Mar 31 '18 at 11:06
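
The row counts in the last comment can be checked with a few lines of base R. This sketch only counts items per cell and does no splitting itself; `n_y`, `n_z`, `rows_loop`, and `rows_parallel` are made-up names:

```r
# Toy data from the comment above
df <- data.frame(
  x = 1:3,
  y = c("a", "d,e", "g,h"),
  z = c("1", "2,3,4", "6"),
  stringsAsFactors = FALSE
)

# Number of comma-separated items in each cell of y and z
n_y <- lengths(strsplit(df$y, ",", fixed = TRUE))
n_z <- lengths(strsplit(df$z, ",", fixed = TRUE))

# Splitting one column at a time multiplies the counts within each row
rows_loop <- sum(n_y * n_z)

# Splitting in parallel only needs the per-row maximum
rows_parallel <- sum(pmax(n_y, n_z))
```

Here `rows_loop` comes out as 1 + 2x3 + 2 = 9 and `rows_parallel` as 1 + 3 + 2 = 6, which matches the numbers given in the comment.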