I am new to programming and have just started learning R, so please bear with my ignorance. I have data in the following format, for example:
Disease      Gene Symbol
Disease A    FOXJ1
Disease B    MYB
Disease B    GATA4
Disease C    MYB
Disease D    GATA4
There are about 250 such entries. I would like to see the data in the following format:
Disease 1    Common Shared Gene Symbols    Disease 2
Disease A    MYB,FOXJ1                     Disease B
Disease C    MYB                           Disease B
Disease B    GATA4                         Disease D
My approach so far splits the process into three steps:
Step 1: Make pairwise combinations of the diseases.
Step 2: Find the gene symbols associated with each disease and assign them to a vector.
Step 3: Use the intersect (%n%) function on these vectors to find the shared gene symbols.
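For reference, the three steps could be sketched roughly as follows in base R. This is only a minimal sketch under assumptions: the data frame name `df` and the column names `Disease` and `Gene` are hypothetical, and only the five sample rows from above are used. It relies on the base functions `split()`, `combn()`, and `intersect()`, and does Step 2 before Step 1 purely for convenience.

```r
# Sample rows from the question; the names df, Disease and Gene are assumed.
df <- data.frame(
  Disease = c("Disease A", "Disease B", "Disease B", "Disease C", "Disease D"),
  Gene    = c("FOXJ1", "MYB", "GATA4", "MYB", "GATA4"),
  stringsAsFactors = FALSE
)

# Step 2: one character vector of gene symbols per disease.
genes_by_disease <- split(df$Gene, df$Disease)

# Step 1: all unordered pairs of diseases, one pair per matrix column.
pairs <- combn(names(genes_by_disease), 2)

# Step 3: intersect the two gene vectors of each pair and
# collapse the shared symbols into a comma-separated string.
shared <- apply(pairs, 2, function(p) {
  common <- intersect(genes_by_disease[[p[1]]], genes_by_disease[[p[2]]])
  paste(common, collapse = ",")
})

result <- data.frame(
  Disease1    = pairs[1, ],
  SharedGenes = shared,
  Disease2    = pairs[2, ],
  stringsAsFactors = FALSE
)

# Keep only the pairs that share at least one gene symbol.
result <- result[result$SharedGenes != "", ]
print(result)
```

With only these five sample rows, the pairs that remain are Disease B with Disease C (sharing MYB) and Disease B with Disease D (sharing GATA4).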
I am sure there must be something much simpler than this.
Any help will be appreciated! Thank you very much!
Regards, S