I want to identify the position and amino acid change in a specific region of interest in my sequences, and store that information in a table.
Is it possible to do something like this in R, by maybe using the bioconductor package? I have managed to do a simple alignment of the sequences with AlignSeqs() from the DECIPHER package, but I am unable to pull out the differences in the sequences automatically. I start with FASTA files.
I want to end up with something like this:
Isolate ID Reference_AA Sample_AA Pos
1 S T 254
2 T D 200
3 L A 230
I have a reference sequence which is 74 AA long, and I want to look at differences in the query sequences (which are much longer than the reference) compared to the reference, and list the results in a table. The position column is related to the position in the reference sequence and not in the query sequence. I want the first AA in the reference sequence to start at 68, not 1.
I find it difficult to add example sequences for this as they tend to be very long, but here is something shorter to work with (not related to the table above):
>ref
VGRALPDVR
>query1
KSSYLDYAMSVIVGTALPDVRDGLKPVHRRVLY
>query2
ELKSSYLDYAMSVIVGRAAPDVRDGLKPV
Expected output:
ID Reference_AA Sample_AA Pos
query1 R T 70
query2 A L 72