My problem is related to bioinformatics and genetics, but I think it may be of interest to other programmers too.
As background, I have lists of mutations, one file per patient sample, which means that I have about two hundred individual files. I want to combine these lists and then compare the mutations between different patient groups.
All input files are in the following list format:
#Variants in patient A:
Variant1 0.5
Variant2 0.7
#Variants in patient B:
Variant2 0.3
Variant3 0.6
#Variants in patient C:
Variant4 0.5
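Reading a single file of this shape seems manageable on its own. Here is a minimal sketch, assuming the files are whitespace-delimited with one header line starting with #; the inline text and the column names variant and PatientA are placeholders I made up for illustration:

```r
# Hypothetical content of one patient file: a '#' header line, then
# one "name value" pair per line.
patient_a_text <- "#Variants in patient A:
Variant1 0.5
Variant2 0.7"

# read.table() treats '#' as a comment character by default, so the
# header line is dropped and only the two data columns are parsed.
patient_a <- read.table(
  text = patient_a_text,
  header = FALSE,
  col.names = c("variant", "PatientA"),
  stringsAsFactors = FALSE
)

print(patient_a)
```

For a real file, the text argument would be replaced by the file path.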
My problem is that not all files contain the same variants, since a variant may be unique and present in only one file. I would like to summarize these files and generate the following output file:
          Patient A  Patient B  Patient C
Variant1  0.5        <NA>       <NA>
Variant2  0.7        0.3        <NA>
Variant3  <NA>       0.6        <NA>
Variant4  <NA>       <NA>       0.5
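One approach I have been considering is a full outer join on the variant name, though I am not sure it is the right way to go. Here is a rough sketch in base R, assuming each file has already been read into a two-column data frame; the names patient_tables, variant, value, and PatientA/PatientB/PatientC are placeholders of mine, and the numbers are just the example values above:

```r
# Hypothetical per-patient tables, one data frame per input file.
patient_tables <- list(
  PatientA = data.frame(variant = c("Variant1", "Variant2"),
                        value = c(0.5, 0.7), stringsAsFactors = FALSE),
  PatientB = data.frame(variant = c("Variant2", "Variant3"),
                        value = c(0.3, 0.6), stringsAsFactors = FALSE),
  PatientC = data.frame(variant = "Variant4",
                        value = 0.5, stringsAsFactors = FALSE)
)

# Rename each value column to its patient label so the columns stay
# distinct once the tables are joined.
labelled <- Map(function(tbl, label) {
  names(tbl)[names(tbl) == "value"] <- label
  tbl
}, patient_tables, names(patient_tables))

# Full outer join on the variant name: all = TRUE keeps variants that
# occur in only one table and fills the missing cells with NA.
combined <- Reduce(
  function(x, y) merge(x, y, by = "variant", all = TRUE),
  labelled
)

print(combined)
```

With the real data, the list could presumably be built with list.files() and lapply() over read.table(), and the result saved with write.table().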
I am looking for tips on how to generate this kind of output file in R, which is the language I am most familiar with. Any example scripts would be highly appreciated!
Thank you for your help!