I have a list of almost 4,000 amino acid sequences of the same protein, of different lengths, and I would like to find all differences (mutations) that are either missense or frameshift.
To be clear, I am starting from this:
seqs <- c("FLGKIWPSYKGRPGNF", "FLGKIWPSHKGRPGNF", "FLGRIWPSHKGRPGNF", "FLGKIWPSHKGRPGNF", "FLGKIWPSHKGRPGNF", "FLGKVWPSHKGRPGNF", "FLGKVWPSHKGRPGNF", "FLGKIWPSHKGRPGNF", "FLGKIWPSHKGRPGN", "FLGKIWPSQNKGRPGNF")
ref <- seqs[1]
(except that seqs and ref are both Biostrings objects of the AAStringSet class rather than plain character vectors)
For missense mutations I found some very helpful code here on Stack Overflow ("Identifying amino acid substitutions from local alignments in R").
However, that code does not detect deletions or insertions: it only compares ref and each query position by position, which works when they are the same length. When the lengths differ, it does not report where the alignment breaks down, and I would like to know whether a deletion or an insertion is causing it.
To be clearer, I would like to get something like this as a result:
#> ID Reference_AA Sample_AA Pos
...
#>15 query9 F - 16
...
#>30 query10 H QN 8
or
#>30 query10 - Q 8
#>31 query10 H N 9
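To make the desired table concrete, here is a minimal sketch of one way it could be produced, under several assumptions. It uses base R's `adist()` with `counts = TRUE`, which returns an edit transcript (M = match, S = substitution, I = insertion, D = deletion) based on unit costs, not a substitution-matrix alignment such as one scored with BLOSUM. The helper name `call_variants` is hypothetical, the sequences are assumed to be plain character strings (e.g. after `as.character(seqs)` on the Biostrings object), and for an insertion `Pos` is taken to be the reference position after which the extra residue appears. Where an insertion sits next to a substitution, as in query10, several equally cheap alignments exist, so which of the two layouts above is reported is not guaranteed.

```r
seqs <- c("FLGKIWPSYKGRPGNF", "FLGKIWPSHKGRPGNF", "FLGRIWPSHKGRPGNF",
          "FLGKIWPSHKGRPGNF", "FLGKIWPSHKGRPGNF", "FLGKVWPSHKGRPGNF",
          "FLGKVWPSHKGRPGNF", "FLGKIWPSHKGRPGNF", "FLGKIWPSHKGRPGN",
          "FLGKIWPSQNKGRPGNF")
ref <- seqs[1]

# Hypothetical helper: list every non-matching step of the edit
# transcript between one reference and one query sequence.
call_variants <- function(ref, query, id) {
  # Transcript such as "MMMSMMI": one letter per alignment column
  trafo <- attr(adist(ref, query, counts = TRUE), "trafos")[1, 1]
  ops <- strsplit(trafo, "")[[1]]
  r <- strsplit(ref, "")[[1]]
  q <- strsplit(query, "")[[1]]

  i <- 0L  # reference residues consumed so far
  j <- 0L  # query residues consumed so far
  out <- list()
  for (op in ops) {
    if (op %in% c("M", "S", "D")) i <- i + 1L  # these consume a ref residue
    if (op %in% c("M", "S", "I")) j <- j + 1L  # these consume a query residue
    if (op == "M") next
    out[[length(out) + 1L]] <- data.frame(
      ID = id,
      Reference_AA = if (op == "I") "-" else r[i],
      Sample_AA = if (op == "D") "-" else q[j],
      Pos = i,
      stringsAsFactors = FALSE
    )
  }
  if (length(out) == 0L) {
    # No differences: return an empty table with the same columns
    return(data.frame(ID = character(), Reference_AA = character(),
                      Sample_AA = character(), Pos = integer(),
                      stringsAsFactors = FALSE))
  }
  do.call(rbind, out)
}

# One table for all queries, labelled by their index in seqs
result <- do.call(rbind, lapply(seq_along(seqs), function(k) {
  call_variants(ref, seqs[k], paste0("query", k))
}))
rownames(result) <- NULL
print(result)
```

Because the transcript comes from plain edit distance, it is only a stand-in for a proper scored alignment; for ~4k real protein sequences, a gap-aware pairwise alignment would probably be the safer basis for the same bookkeeping.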