Context: I am working with genes and Gene Ontology (GO) terms, but my question is about writing R scripts.
I would like to replace the GO IDs in my data frame with their corresponding terms, extracted from a database (GO.db).
This is my source data frame: a list of genes (V1) and their associated GO IDs (V2):
> gene_list_and_Go_ID
         V1                                             V2
2563  Gene1 GO:0003871, GO:0008270, GO:0008652, GO:0009086
2580  Gene2 GO:0003871, GO:0008270, GO:0008652, GO:0009086
12686 Gene3 GO:0003871, GO:0008270, GO:0008652, GO:0009086
14523 Gene4             GO:0004489, GO:0006555, GO:0055114
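To make the question reproducible, the data frame above can be rebuilt with the sketch below. It assumes V2 is meant to hold plain character strings. If the real data was read with read.table() in an R version before 4.0, V2 was probably converted to a factor by default, so checking its class is a useful first step.

```r
# Rebuild the example data frame shown above, keeping its row names.
gene_list_and_Go_ID <- data.frame(
  V1 = c("Gene1", "Gene2", "Gene3", "Gene4"),
  V2 = c("GO:0003871, GO:0008270, GO:0008652, GO:0009086",
         "GO:0003871, GO:0008270, GO:0008652, GO:0009086",
         "GO:0003871, GO:0008270, GO:0008652, GO:0009086",
         "GO:0004489, GO:0006555, GO:0055114"),
  row.names = c("2563", "2580", "12686", "14523"),
  stringsAsFactors = FALSE  # keep V2 as character, not factor
)

# strsplit() only accepts character input, so check the column type first.
class(gene_list_and_Go_ID$V2)
```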
The database query itself is simple:
> select(GO.db, my_Go_id, "TERM", "GOID")
Querying the database manually for one set of IDs works well:
> my_Go_id = unlist(strsplit("GO:0008270, GO:0008652, GO:0009086", split = ", "))
> select(GO.db, my_Go_id, "TERM", "GOID")
        GOID                                     TERM
1 GO:0008270                         zinc ion binding
2 GO:0008652 cellular amino acid biosynthetic process
3 GO:0009086          methionine biosynthetic process
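The manual steps above can be wrapped in a function that takes one comma-separated string and returns the GOID/TERM table. So that this sketch runs without Bioconductor, go_terms and lookup_terms() are hypothetical stand-ins for GO.db that know only the three terms shown above; with GO.db loaded, the body of lookup_terms() would be the select() call from the question.

```r
# Hypothetical stand-in for GO.db: only the three terms shown above.
go_terms <- c("GO:0008270" = "zinc ion binding",
              "GO:0008652" = "cellular amino acid biosynthetic process",
              "GO:0009086" = "methionine biosynthetic process")

lookup_terms <- function(ids) {
  # With GO.db loaded this would be: select(GO.db, ids, "TERM", "GOID")
  # Here, IDs missing from go_terms simply get NA as their TERM.
  data.frame(GOID = ids, TERM = unname(go_terms[ids]),
             stringsAsFactors = FALSE)
}

# Split one "GO:..., GO:..." string into a vector of IDs and look them up.
# as.character() also makes this safe if the input is a factor level.
terms_for_string <- function(go_string) {
  ids <- unlist(strsplit(as.character(go_string), split = ", "))
  lookup_terms(ids)
}

res <- terms_for_string("GO:0008270, GO:0008652, GO:0009086")
res
```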
My problem: I cannot automate this process. Specifically, for each row I need to split the string in column 2 of my data frame into a vector of GO IDs in order to query the database, and then replace the GO IDs in the data frame with the result of the query.
1/ To start, I tried putting the "unlist" call inside an "apply" call over my data frame:
apply(gene_list_and_Go_ID,1,unlist(strsplit(gene_list_and_Go_ID[,2], split=", ")))
I got:
Error in strsplit(ok, split = ", ") : non-character argument
2/ Can I also put the database query inside the function passed to apply?
3/ Finally, I do not know how to replace column 2 with the result of the database query.
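A sketch covering the three points above, assuming the "non-character argument" error comes from V2 being a factor (strsplit() rejects factors). Note also that apply() expects a function as its third argument, whereas unlist(strsplit(...)) is an ordinary expression evaluated before apply() runs. Since strsplit() is already vectorised, one call on the whole column followed by lapply() is simpler than apply() over rows. As before, go_terms and lookup_terms() are hypothetical stand-ins for select(GO.db, ids, "TERM", "GOID"), and df is made-up input reusing IDs from the question.

```r
# Hypothetical stand-in for GO.db: only the three terms shown above.
go_terms <- c("GO:0008270" = "zinc ion binding",
              "GO:0008652" = "cellular amino acid biosynthetic process",
              "GO:0009086" = "methionine biosynthetic process")
lookup_terms <- function(ids) {
  # With GO.db loaded this would be: select(GO.db, ids, "TERM", "GOID")
  data.frame(GOID = ids, TERM = unname(go_terms[ids]),
             stringsAsFactors = FALSE)
}

# V2 stored as a factor, as read.table() did by default before R 4.0.
df <- data.frame(V1 = c("Gene1", "Gene2"),
                 V2 = factor(c("GO:0008270, GO:0008652, GO:0009086",
                               "GO:0009086, GO:0008270")),
                 stringsAsFactors = FALSE)

# 1/ Reproduce the error: strsplit() refuses a factor.
err <- tryCatch(strsplit(df$V2, split = ", "),
                error = function(e) conditionMessage(e))

# Convert to character first; one call returns a list holding
# a vector of GO IDs for each row.
ids_per_gene <- strsplit(as.character(df$V2), split = ", ")

# 2/ The database query goes inside the function given to lapply().
results <- lapply(ids_per_gene, lookup_terms)
names(results) <- df$V1

# 3/ Replace V2 with the matching terms, collapsed to one string per gene.
df$V2 <- vapply(results, function(r) paste(r$TERM, collapse = ", "),
                character(1))
df
```

Calling lookup_terms() once per gene mirrors the original apply() idea. With many genes it is probably cheaper to query unique(unlist(ids_per_gene)) in a single select() call and map the terms back with match().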
This is an example of the expected “ideal” result (shown for Gene1):
V1 V2
2563 Gene1 GOID TERM
1 GO:0008270 zinc ion binding
2 GO:0008652 cellular amino acid biosynthetic process
3 GO:0009086 methionine biosynthetic process
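The “ideal” result above nests a GOID/TERM table inside each gene, which a plain data frame column will not print that way. One possible layout, sketched here with the same hypothetical go_terms/lookup_terms() stand-ins and made-up input, is a long table with one row per gene/GO ID pair, which split() then turns into one GOID/TERM data frame per gene. This version queries each distinct GO ID only once and maps the terms back with match().

```r
# Hypothetical stand-in for GO.db: only the three terms shown above.
go_terms <- c("GO:0008270" = "zinc ion binding",
              "GO:0008652" = "cellular amino acid biosynthetic process",
              "GO:0009086" = "methionine biosynthetic process")
lookup_terms <- function(ids) {
  # With GO.db loaded this would be: select(GO.db, ids, "TERM", "GOID")
  data.frame(GOID = ids, TERM = unname(go_terms[ids]),
             stringsAsFactors = FALSE)
}

df <- data.frame(V1 = c("Gene1", "Gene2"),
                 V2 = c("GO:0008270, GO:0008652, GO:0009086",
                        "GO:0009086, GO:0008270"),
                 stringsAsFactors = FALSE)

ids_per_gene <- strsplit(df$V2, split = ", ")

# Long format: one row per (gene, GO ID) pair.
long <- data.frame(V1 = rep(df$V1, lengths(ids_per_gene)),
                   GOID = unlist(ids_per_gene),
                   stringsAsFactors = FALSE)

# Query each distinct ID once, then match the terms back onto every row.
term_table <- lookup_terms(unique(long$GOID))
long$TERM <- term_table$TERM[match(long$GOID, term_table$GOID)]

# One GOID/TERM table per gene, close to the "ideal" layout.
per_gene <- split(long[, c("GOID", "TERM")], long$V1)
per_gene$Gene1
```

The long table itself may be the more convenient form for further work (filtering, merging, counting terms), while per_gene is closer to the display in the question.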
Thanks for your help.