How do I extract all genes in gene_symbol that have the same start and end into a new column in R?

Question

I have a data frame like the one below. I want to add a column called Genes that collects all the gene symbols from gene_symbol sharing the same fragment (that is, the same start and end) into a single comma-separated entry, as shown in the desired output.

   chr  start    end Fragments     CK BB     FP i.start  i.end       gene_name    gene_symbol
1:   1 710000 715000       143 0.2662  1 0.0138   91421 762886 ENSG00000225880      LINC00115
2:   1 710000 715000       143 0.2662  1 0.0138   91421 762886 ENSG00000240453 RP11-206L10.10
3:   1 710000 715000       143 0.2662  1 0.0138  676386 762886 ENSG00000228327  RP11-206L10.2
4:   1 710000 715000       143 0.2662  1 0.0138  714172 740255 ENSG00000237491  RP11-206L10.9
5:   1 720000 725000       145 0.0000  0 0.0000   91421 762886 ENSG00000225880      LINC00115
6:   1 720000 725000       145 0.0000  0 0.0000   91421 762886 ENSG00000240453 RP11-206L10.10

I want it to look like this:

   chr  start    end Fragments     CK BB     FP i.start  i.end Genes
1:   1 710000 715000       143 0.2662  1 0.0138   91421 762886 LINC00115,RP11-206L10.10,RP11-206L10.2,RP11-206L10.9
2:   1 720000 725000       145 0.0000  0 0.0000   91421 762886 LINC00115,RP11-206L10.10

I'm removing those tags. @Jerry If they are actually relevant, re-add them and explain why — Luis Mendo, Aug 01 '20 at 16:12

score 0 · Accepted Answer · answered Aug 01 '20 at 15:09


We can group by the shared columns and paste the gene symbols together:

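library(data.table)
# Collapse gene_symbol into one comma-separated string per group;
# the second .() lists the grouping columns
dt[, .(Genes = toString(gene_symbol)),
     .(chr, start, end, Fragments, CK, BB, i.start, i.end)]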


akrun


Your code worked, but not completely. If you look at start and end, there are still four rows for the same fragment. What I want is to keep one row per fragment, with all the genes that correspond to the same start and end combined into one comma-separated entry. I also want to get rid of i.start and i.end, like this:

   chr  start    end Fragments     CK BB     FP Genes
1:   1 710000 715000       143 0.2662  1 0.0138 LINC00115,RP11-206L10.10,RP11-206L10.2,RP11-206L10.9

– Jerry Aug 02 '20 at 10:03
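One way to get the layout asked for in this comment is to leave i.start and i.end out of the grouping columns, so that each fragment collapses to a single row. The following is only a sketch, not part of the original answer: it rebuilds the sample rows from the question by hand, adds FP to the grouping columns (an assumption, so that FP survives in the result), and uses paste(..., collapse = ",") rather than toString() to avoid the space after each comma. The names dt and res are placeholders.

```r
library(data.table)

# Sample data retyped from the question (values copied by hand)
dt <- data.table(
  chr         = 1L,
  start       = c(710000, 710000, 710000, 710000, 720000, 720000),
  end         = c(715000, 715000, 715000, 715000, 725000, 725000),
  Fragments   = c(143, 143, 143, 143, 145, 145),
  CK          = c(0.2662, 0.2662, 0.2662, 0.2662, 0, 0),
  BB          = c(1, 1, 1, 1, 0, 0),
  FP          = c(0.0138, 0.0138, 0.0138, 0.0138, 0, 0),
  i.start     = c(91421, 91421, 676386, 714172, 91421, 91421),
  i.end       = c(762886, 762886, 762886, 740255, 762886, 762886),
  gene_name   = c("ENSG00000225880", "ENSG00000240453", "ENSG00000228327",
                  "ENSG00000237491", "ENSG00000225880", "ENSG00000240453"),
  gene_symbol = c("LINC00115", "RP11-206L10.10", "RP11-206L10.2",
                  "RP11-206L10.9", "LINC00115", "RP11-206L10.10")
)

# Group only by the fragment-level columns; i.start and i.end are left out,
# so rows that differ only in those two columns fall into the same group.
# unique() guards against a symbol being listed twice within a fragment.
res <- dt[, .(Genes = paste(unique(gene_symbol), collapse = ",")),
          by = .(chr, start, end, Fragments, CK, BB, FP)]

print(res)
```

With the sample rows above, res should contain one row per fragment (two rows in total), without the i.start and i.end columns.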

