I'm having a hard time working out how to ask this question. Basically, this is my data:
    Human_gene_set.gs_name    Human_gene_set.entrez_gene
1   ABBUD_LIF_SIGNALING_1_DN  79026
2   ABBUD_LIF_SIGNALING_1_DN  214
206 ABE_VEGFA_TARGETS_2HR     348
207 ABE_VEGFA_TARGETS_2HR     6795
475 ACEVEDO_LIVER_CANCER_DN   595
476 ACEVEDO_LIVER_CANCER_DN   975
(There are other gs_names further down)
I need to make a new data frame in which each column is one gs_name and holds the list of entrez_gene values for that gene set, so that no gs_name is duplicated.
Like this:
HALLMARK_PANCREAS_BETA_CELLS
[1] "11925" "12652" "13193" "13482" "14378" "14526" "15376" "15874" "16334" "16392" "16658"
[12] "16909" "18012" "18088" "18096" "18481" "18506" "18508" "18548" "18549" "18609" "18770"
[23] "20526" "20604" "20813" "20818" "20910" "20927" "21405" "22337" "23797" "27058" "53626"
[34] "56458" "56529" "69019" "77766" "80976" "103988" "214189"
I am stuck; any help would be appreciated.
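A minimal sketch of one way this might be done in base R, using `split()`. It assumes the column names are exactly as printed above. The name `df` and the toy rows (copied from the sample shown) are only for illustration. Since the gene sets have different sizes and data frame columns must all have the same length, the sketch produces a named list with one character vector per gs_name rather than a true data frame.

```r
# Toy data copied from the rows shown in the question; `df` is a hypothetical name
df <- data.frame(
  Human_gene_set.gs_name = c(
    "ABBUD_LIF_SIGNALING_1_DN", "ABBUD_LIF_SIGNALING_1_DN",
    "ABE_VEGFA_TARGETS_2HR", "ABE_VEGFA_TARGETS_2HR",
    "ACEVEDO_LIVER_CANCER_DN", "ACEVEDO_LIVER_CANCER_DN"
  ),
  Human_gene_set.entrez_gene = c(79026L, 214L, 348L, 6795L, 595L, 975L),
  stringsAsFactors = FALSE
)

# Group the gene IDs by gene set name: the result is a named list with
# one element per unique gs_name, each a character vector of Entrez IDs
gene_sets <- split(
  as.character(df$Human_gene_set.entrez_gene),
  df$Human_gene_set.gs_name
)

# Each gene set can then be pulled out by name
gene_sets$ABE_VEGFA_TARGETS_2HR
```

If a rectangular data frame really is required, the shorter vectors would have to be padded with `NA` to a common length. The named list is usually easier to work with.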