How to group duplicate row names and make a list of their IDs

Question

I am having a hard time working out how to phrase this question. Basically, this is my data:

                 Human_gene_set.gs_name Human_gene_set.entrez_gene
1              ABBUD_LIF_SIGNALING_1_DN                      79026
2              ABBUD_LIF_SIGNALING_1_DN                        214
206               ABE_VEGFA_TARGETS_2HR                        348
207               ABE_VEGFA_TARGETS_2HR                       6795
475             ACEVEDO_LIVER_CANCER_DN                        595
476             ACEVEDO_LIVER_CANCER_DN                        975

(There are other gs_names further down)

I need to make a new data frame where the columns are the gs_names and each column holds the list of entrez_genes for that gs_name, so that there are no duplicate gs_names.

Like this:

HALLMARK_PANCREAS_BETA_CELLS
 [1] "11925"  "12652"  "13193"  "13482"  "14378"  "14526"  "15376"  "15874"  "16334"  "16392"  "16658" 
[12] "16909"  "18012"  "18088"  "18096"  "18481"  "18506"  "18508"  "18548"  "18549"  "18609"  "18770" 
[23] "20526"  "20604"  "20813"  "20818"  "20910"  "20927"  "21405"  "22337"  "23797"  "27058"  "53626" 
[34] "56458"  "56529"  "69019"  "77766"  "80976"  "103988" "214189"

I am stuck. Any help would be appreciated.

This example is hard to understand because (a) you say you want the output to be a data frame, but you just show a vector, and (b) your sample input only has one unique value of `Human_gene_set.gs_name`. Could you modify your example to simplify and clarify - maybe include 2 or 3 rows each for 2 different `Human_gene_set.gs_name` values, and make sure the output format matches what you want for that input? — Gregor Thomas, Jul 01 '22 at 15:03
But maybe you just want `with(your_data, split(Human_gene_set.entrez_gene, Human_gene_set.gs_name))`? This will be a `list`, not a `data.frame`, but it's my best guess based on what you have so far. — Gregor Thomas, Jul 01 '22 at 15:04
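
To illustrate the `split()` suggestion above, here is a minimal sketch. It assumes the data frame is called `your_data` (a placeholder name, rebuilt here from the six rows shown in the question) and that the IDs should end up as character strings, as in the desired output. The last step, padding with `NA` to build a data frame, is only one possible way to get columns per gene set and is an assumption about what is wanted, since a data frame needs columns of equal length.

```r
# Placeholder data frame rebuilt from the rows shown in the question
your_data <- data.frame(
  Human_gene_set.gs_name = c(
    "ABBUD_LIF_SIGNALING_1_DN", "ABBUD_LIF_SIGNALING_1_DN",
    "ABE_VEGFA_TARGETS_2HR", "ABE_VEGFA_TARGETS_2HR",
    "ACEVEDO_LIVER_CANCER_DN", "ACEVEDO_LIVER_CANCER_DN"
  ),
  Human_gene_set.entrez_gene = c(79026L, 214L, 348L, 6795L, 595L, 975L),
  stringsAsFactors = FALSE
)

# One list element per gene set, named by gs_name; IDs kept as strings
gene_sets <- with(
  your_data,
  split(as.character(Human_gene_set.entrez_gene), Human_gene_set.gs_name)
)

# Look up the IDs for a single gene set by name
gene_sets[["ABE_VEGFA_TARGETS_2HR"]]

# Optional (assumption): if a data frame is really required, pad the
# shorter vectors with NA so that every column has the same length
max_len <- max(lengths(gene_sets))
padded <- lapply(
  gene_sets,
  function(ids) c(ids, rep(NA_character_, max_len - length(ids)))
)
gene_df <- as.data.frame(padded, stringsAsFactors = FALSE)
```

The named list is usually the more natural structure here, because gene sets rarely have the same number of genes; the padded data frame is only worth building if a downstream tool insists on one.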
