
I have a data frame like the one below. I want to add another column called Genes that collects all the gene symbols sharing the same fragment (that is, the same start and end) into a single comma-separated value, as shown below.

chr  start    end Fragments CK BB FP i.start  i.end       gene_name            gene_symbol
1:   1 710000 715000   143  0.2662  1  0.0138   91421 762886 ENSG00000225880      LINC00115
2:   1 710000 715000   143  0.2662  1  0.0138   91421 762886 ENSG00000240453 RP11-206L10.10
3:   1 710000 715000   143  0.2662  1  0.0138  676386 762886 ENSG00000228327  RP11-206L10.2
4:   1 710000 715000   143  0.2662  1  0.0138  714172 740255 ENSG00000237491  RP11-206L10.9
5:   1 720000 725000   145  0.0000  0  0.0000   91421 762886 ENSG00000225880      LINC00115
6:   1 720000 725000   145  0.0000  0  0.0000   91421 762886 ENSG00000240453 RP11-206L10.10

I want it to look like this:

chr  start    end Fragments CK BB FP i.start  i.end           Genes
1:   1 710000 715000   143  0.2662  1  0.0138   91421 762886      LINC00115,RP11-206L10.10,RP11-206L10.2,RP11-206L10.9
2:   1 720000 725000   145  0.0000  0  0.0000   91421 762886    LINC00115,RP11-206L10.10
Luis Mendo
Jerry

1 Answer


We can group by the shared columns and paste the gene symbols of each group into one comma-separated string:

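library(data.table)
setDT(dt)  # make sure dt is a data.table
# one row per group; gene symbols joined with "," (no space), as in the desired output
dt[, .(Genes = paste(gene_symbol, collapse = ",")),
     by = .(chr, start, end, Fragments, CK, BB, FP, i.start, i.end)]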
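As a hedged illustration, the sketch below rebuilds the six sample rows from the question by hand (the column types are an assumption about the real data) and runs a grouping that still includes i.start and i.end. Because the four rows for fragment 143 do not all share the same i.start/i.end, that fragment splits into three output rows instead of one. paste(..., collapse = ",") is used here instead of toString(), which would put ", " (comma plus space) between the symbols.

```r
library(data.table)

# Sample rows typed in from the question (column types assumed)
dt <- data.table(
  chr         = 1L,
  start       = rep(c(710000L, 720000L), c(4, 2)),
  end         = rep(c(715000L, 725000L), c(4, 2)),
  Fragments   = rep(c(143L, 145L), c(4, 2)),
  CK          = rep(c(0.2662, 0), c(4, 2)),
  BB          = rep(c(1L, 0L), c(4, 2)),
  FP          = rep(c(0.0138, 0), c(4, 2)),
  i.start     = c(91421L, 91421L, 676386L, 714172L, 91421L, 91421L),
  i.end       = c(762886L, 762886L, 762886L, 740255L, 762886L, 762886L),
  gene_name   = c("ENSG00000225880", "ENSG00000240453", "ENSG00000228327",
                  "ENSG00000237491", "ENSG00000225880", "ENSG00000240453"),
  gene_symbol = c("LINC00115", "RP11-206L10.10", "RP11-206L10.2",
                  "RP11-206L10.9", "LINC00115", "RP11-206L10.10")
)

# Grouping on i.start/i.end too: rows of fragment 143 with different
# i.start/i.end values land in separate groups
res_with_i <- dt[, .(Genes = paste(gene_symbol, collapse = ",")),
                 by = .(chr, start, end, Fragments, CK, BB, FP, i.start, i.end)]
print(res_with_i)
```

The result has four rows, three of them for fragment 143, which matches the behavior described in the comment below the answer.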
akrun
Your code worked, but not completely. If you look at start and end, there are four rows for the same fragment. What I want is to keep one row per common ID, with all the genes that share the same start and end together in one cell, separated by commas. I also want to get rid of i.start and i.end, just like below:

chr  start    end Fragments CK BB FP Genes
1:   1 710000 715000   143  0.2662  1  0.0138 LINC00115,RP11-206L10.10,RP11-206L10.2,RP11-206L10.9

– Jerry Aug 02 '20 at 10:03
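A minimal sketch of what this comment asks for, assuming the same column names as the sample data in the question (the rows are retyped by hand, so the column types are an assumption): drop i.start and i.end from the by list so that all rows with the same chr/start/end/Fragments/CK/BB/FP collapse into one. Columns that are neither grouped nor computed in j (i.start, i.end, gene_name) are simply not part of the result. The unique() call is an extra assumption that a gene listed twice within one fragment should appear only once; remove it to keep duplicates.

```r
library(data.table)

# Sample rows typed in from the question (column types assumed)
dt <- data.table(
  chr         = 1L,
  start       = rep(c(710000L, 720000L), c(4, 2)),
  end         = rep(c(715000L, 725000L), c(4, 2)),
  Fragments   = rep(c(143L, 145L), c(4, 2)),
  CK          = rep(c(0.2662, 0), c(4, 2)),
  BB          = rep(c(1L, 0L), c(4, 2)),
  FP          = rep(c(0.0138, 0), c(4, 2)),
  i.start     = c(91421L, 91421L, 676386L, 714172L, 91421L, 91421L),
  i.end       = c(762886L, 762886L, 762886L, 740255L, 762886L, 762886L),
  gene_name   = c("ENSG00000225880", "ENSG00000240453", "ENSG00000228327",
                  "ENSG00000237491", "ENSG00000225880", "ENSG00000240453"),
  gene_symbol = c("LINC00115", "RP11-206L10.10", "RP11-206L10.2",
                  "RP11-206L10.9", "LINC00115", "RP11-206L10.10")
)

# Group only on the fragment-level columns; i.start and i.end are left out,
# so every gene of a fragment ends up in the same group
res <- dt[, .(Genes = paste(unique(gene_symbol), collapse = ",")),
          by = .(chr, start, end, Fragments, CK, BB, FP)]
print(res)
```

With the sample rows this gives one row per fragment: fragment 143 gets LINC00115,RP11-206L10.10,RP11-206L10.2,RP11-206L10.9 and fragment 145 gets LINC00115,RP11-206L10.10.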