I'm working on a large dataset at the moment, and so far I have been able to solve all my problems through countless Google searches and long trial-and-error sessions. I've managed to use plyr and reshape functions for some transformations of my different datasets and learned a lot, but I think I've reached a point where my present R knowledge won't help me anymore.
Even if my question sounds very specific (i.e. OTU table and fasta file), I guess my problem is a common R application across many different fields (and not just bioinformatics).
Right now, I have merged a reference sequence file with an abundance table, and I would like to generate a specific file, a fasta file, based on the information in this data.frame.
My df looks a bit like this at the moment:
repSeq     sw.1.102  sw.3.1021  sw.30.101  sw.5.1042  ...
ACCT-AGGA         3          0          1          0
ACCT-AGGG         1          1          2          0
ACTT-AGGG         0          1          0         25
...
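For a reproducible version of the excerpt above, the table can be rebuilt as a small data.frame. This is only a toy sketch: it contains just the three rows and four sample columns shown, and the object name `df` is an assumption.

```r
# Toy reconstruction of the merged table (only the rows/columns shown above).
# stringsAsFactors = FALSE keeps repSeq as plain character strings.
df <- data.frame(
  repSeq    = c("ACCT-AGGA", "ACCT-AGGG", "ACTT-AGGG"),
  sw.1.102  = c(3L, 1L, 0L),
  sw.3.1021 = c(0L, 1L, 1L),
  sw.30.101 = c(1L, 2L, 0L),
  sw.5.1042 = c(0L, 0L, 25L),
  stringsAsFactors = FALSE
)

# Inspect column types and the first values
str(df)
```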
The resulting file should look like this:
>sw.1.102_1
ACCT-AGGA
>sw.1.102_2
ACCT-AGGA
>sw.1.102_3
ACCT-AGGA
>sw.1.102_4
ACCT-AGGG
>sw.3.1021_1
ACCT-AGGG
>sw.3.1021_2
ACTT-AGGG
>sw.30.101_1
ACCT-AGGA
>sw.30.101_2
ACCT-AGGG
...
As you can see, I would like to use the number of (reference) sequences counted for each sample (i.e. the sw.n columns) to create a (fasta) file: each sequence is written once per count, with a header made of the sample name and a running index.
I have little experience with loops in R (I have only used basic loops for simple processing), but I assume one could do the trick here. I have found the write.fasta function from the SeqinR package, but I could not find a solution with it. The deunique.seqs command in mothur won't work, because it needs a fasta file as input (which I obviously don't have). It is quite possible that there is something on Bioconductor (OTUbase?), but to be honest, I don't know where to begin, and I'd be glad of any help. I would really like to do this in R, since I enjoy working with it, but any other ideas are also very welcome.
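To make the intended expansion concrete, here is a minimal base R sketch. It is an illustration under stated assumptions, not a tested solution for the real data: the helper name `abundance_to_fasta` is hypothetical, the sequences are assumed to sit in a column called `repSeq`, every other column is assumed to hold non-negative integer counts without NAs, and the toy data.frame only mirrors the excerpt shown earlier.

```r
# Toy data mirroring the excerpt of the merged table
df <- data.frame(
  repSeq    = c("ACCT-AGGA", "ACCT-AGGG", "ACTT-AGGG"),
  sw.1.102  = c(3L, 1L, 0L),
  sw.3.1021 = c(0L, 1L, 1L),
  sw.30.101 = c(1L, 2L, 0L),
  sw.5.1042 = c(0L, 0L, 25L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn an abundance table into a vector of FASTA lines.
# Assumes all columns except seq_col are per-sample counts.
abundance_to_fasta <- function(df, seq_col = "repSeq") {
  sequences <- as.character(df[[seq_col]])
  samples   <- setdiff(names(df), seq_col)
  per_sample <- lapply(samples, function(s) {
    # Repeat each sequence as many times as it was counted in sample s
    seqs <- rep(sequences, times = df[[s]])
    if (length(seqs) == 0) return(character(0))
    # Header = sample name plus a running index within that sample
    headers <- paste0(">", s, "_", seq_along(seqs))
    # rbind + as.vector interleaves header and sequence lines
    as.vector(rbind(headers, seqs))
  })
  unlist(per_sample)
}

fasta_lines <- abundance_to_fasta(df)

# Write to a temporary file; replace with a real path as needed
out_file <- tempfile(fileext = ".fasta")
writeLines(fasta_lines, out_file)
```

The key step is `rep(sequences, times = counts)`, which avoids an explicit loop over individual reads; only the samples are iterated over.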
//small edit:
Both answers below work very well (see my comments). I also found two possible, not-so-elegant non-R workarounds (not tested yet):
- since I already have a taxonomy file and an OTU abundance table, I think the mothur command make.biom could be used to create a biom-format file. I haven't worked with biom files yet, but I think there are some tools and scripts available to save the biom-file data as fasta again
- convert Qiime files to oligotyping format - this also needs a taxonomy file and an OTU table
I'm not sure whether either of these works, so please correct me if I'm wrong.