
I have the following data frame:

> sample
  chrom     start       end score genehancer_id connected_gene score.1 connected_gene.1 score.2 connected_gene.2 score.3 connected_gene.3 score.4
1 chr10 121780088 121780259  0.27   GH10G121780          TACC2    3.94             ATE1    1.38      GC10M121752    0.31      GC10M121821    0.22
2 chr20  54214412  54215291  0.78   GH20G054214          PFDN4    0.43             DOK5    0.06             <NA>    <NA>             <NA>    <NA>

I want to melt it so that I get this output:

  chrom     start       end score genehancer_id connected_gene score
1 chr10 121780088 121780259  0.27   GH10G121780          TACC2  3.94
2 chr10 121780088 121780259  0.27   GH10G121780           ATE1  1.38
3 chr10 121780088 121780259  0.27   GH10G121780    GC10M121752  0.31
4 chr10 121780088 121780259  0.27   GH10G121780    GC10M121821  0.22
5 chr20  54214412  54215291  0.78   GH20G054214          PFDN4  0.43
6 chr20  54214412  54215291  0.78   GH20G054214           DOK5  0.06

Here, the first 5 columns stay fixed and will go into melt's id.vars. However, I can't work out how to melt the remaining columns in pairs, so that each successive pair of columns (after the 5 fixed ones) becomes a new row.

Any help will be much appreciated.

Here is my sample input data:

> dput(sample)
structure(list(chrom = c("chr10", "chr20"), start = c(121780088L, 
54214412L), end = c(121780259L, 54215291L), score = c(0.27, 0.78
), genehancer_id = c("GH10G121780", "GH20G054214"), connected_gene = c("TACC2", 
"PFDN4"), score.1 = c("3.94", "0.43"), connected_gene.1 = c("ATE1", 
"DOK5"), score.2 = c("1.38", "0.06"), connected_gene.2 = c("GC10M121752", 
NA), score.3 = c("0.31", NA), connected_gene.3 = c("GC10M121821", 
NA), score.4 = c("0.22", NA)), .Names = c("chrom", "start", "end", 
"score", "genehancer_id", "connected_gene", "score.1", "connected_gene.1", 
"score.2", "connected_gene.2", "score.3", "connected_gene.3", 
"score.4"), class = "data.frame", row.names = c(NA, -2L))
syam
Newbie
    `library(data.table); melt(setDT(samp), id = 1:5, measure.vars = patterns('connected_gene','score\\.\\d'), value.name = c('connected_gene','scores'), na.rm = TRUE)` – Jaap Mar 16 '18 at 10:18
  • This solved my problem. Thanks a lot for your help. – Newbie Mar 16 '18 at 10:23
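
For readers who want to try the approach from the comment above, here is a minimal runnable sketch. It assumes the data.table package is installed. The names `dat` and `long` are illustrative and not from the original post (`dat` is used instead of `sample` to avoid masking base R's `sample()`), and the row reordering and numeric conversion at the end are optional additions, not part of the original one-liner.

```r
library(data.table)

# Sample data from the question, rebuilt by hand.
# `dat` is a hypothetical name chosen to avoid masking base::sample.
dat <- data.frame(
  chrom = c("chr10", "chr20"),
  start = c(121780088L, 54214412L),
  end = c(121780259L, 54215291L),
  score = c(0.27, 0.78),
  genehancer_id = c("GH10G121780", "GH20G054214"),
  connected_gene = c("TACC2", "PFDN4"),
  score.1 = c("3.94", "0.43"),
  connected_gene.1 = c("ATE1", "DOK5"),
  score.2 = c("1.38", "0.06"),
  connected_gene.2 = c("GC10M121752", NA),
  score.3 = c("0.31", NA),
  connected_gene.3 = c("GC10M121821", NA),
  score.4 = c("0.22", NA),
  stringsAsFactors = FALSE
)

# Melt the gene columns and the numbered score columns in parallel.
# Each pattern selects one group of columns; the i-th match of the first
# pattern is paired with the i-th match of the second.
long <- melt(
  setDT(dat),
  id.vars = 1:5,
  measure.vars = patterns("^connected_gene", "^score\\.\\d+$"),
  value.name = c("connected_gene", "scores"),
  na.rm = TRUE
)

# Optional tidy-up (an addition to the original comment):
setorder(long, chrom, start)            # group rows by region, as in the desired output
long[, variable := NULL]                # drop the pair index that melt adds
long[, scores := as.numeric(scores)]    # the score.N columns were stored as character

print(long)
```

The second value column is called `scores` rather than `score`, as in the comment, so it does not collide with the `score` column that is already among the id columns.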
