I'm new to programming in R and trying to do a very specific task.
I have a FASTA file containing n sequences, which I read with ape:
library(ape)
matrix <- read.dna(myfasta, format="fasta", as.character=TRUE)
This created a matrix, like so:
|      | V1 | V2 | V3 | V4 | ... |
|------|----|----|----|----|-----|
| Seq1 | a  | t  | g  | c  | ... |
| Seq2 | a  | t  | g  | a  | ... |
| Seq3 | a  | t  | c  | c  | ... |
| Seq4 | t  | t  | g  | a  | ... |
| ...  |    |    |    |    |     |
Where Seq(n) is the DNA sequence for each sample, and V(n) denotes nucleotide position.
How can I select the sequences that bear a certain nucleotide (e.g. "a"), at a certain position (e.g. "V1"), and then return the sequences as a concatenated string?
So for base "a" at position V1, I'd want something like "Seq1, Seq2, Seq3", and for the same base at position V4, I'd want "Seq2, Seq4".
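To make the goal concrete, here is a minimal sketch on a toy matrix built by hand to mimic the table above. The names `seqs` and `seqs_with_base` are made up for illustration, and the sketch assumes that the row names hold the sequence labels and that a position can be addressed by its column index (so V1 is column 1); the real output of read.dna may need adjusting.

```r
# Toy stand-in for the character matrix shown in the table above
# (built by hand here; not read from a real FASTA file).
seqs <- matrix(
  c("a", "t", "g", "c",
    "a", "t", "g", "a",
    "a", "t", "c", "c",
    "t", "t", "g", "a"),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("Seq", 1:4), NULL)
)

# Hypothetical helper: return the names of all sequences that have
# `base` at column `pos`, joined into one comma-separated string.
seqs_with_base <- function(m, pos, base) {
  hits <- m[, pos] == base                    # logical vector, one entry per row
  paste(rownames(m)[hits], collapse = ", ")   # keep matching row names and join them
}

seqs_with_base(seqs, 1, "a")  # "Seq1, Seq2, Seq3"
seqs_with_base(seqs, 4, "a")  # "Seq2, Seq4"
```

The comparison `m[, pos] == base` plays the role I was hoping which() would play: it marks the matching rows, and the row names are then looked up and collapsed with paste().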
I've tried which() and filter(matrix, V1 == "a"), but I'm struggling.
Thanks in advance!