I have a dataset of genes associated with drugs derived from DrugBank. I wish to simply translate all the drugbank IDs to drug names readable by a human. As you can see my main problem is that some genes are linked to multiple or even hundreds of drugs. the multiple Drug IDs are in the same comma-delimited "column" The R studio "match" or "merge function" only work for the first identifier in each column, thus effectively deleting the remainder in the same column "cell". I have found ways to do this manually on excel for my top candidates but it is not realistic for my dataset of 3000 genes.
Ideally, I would like to do something like "text to columns" but in rows, so every row would keep all its other values but only one of the multiple drugbank IDs in the cell, and then can just use the match function to replace them.
The drugbank vocabulary (.csv) looks like this: [DBvocabulary.csv]
DrugBank.ID Common.name
DB00001 Lepirudin
DB00002 Cetuximab
DB00003 Dornase alfa
DB00004 Denileukin diftitox
DB00005 Etanercept
DB00006 Bivalirudin
My dataset (.csv) has 15 columns but the important ones are:
[all_ph_active.csv]
Gene.Name DrugBank.ID
F8 DB09130
TCN2 DB00200
LDLR DB09270; DB11251; DB14003
ALB DB00070; DB00137; DB00159; DB00162; DB00214;
Any advice is welcome, thanks in advance!