Summary:
I start with some background on alignment algorithms and ask my question at the end. If you already know these algorithms, skip to the Question section.
Suppose we have two strings:
ACCGAATCGA
ACCGGTATTAAC
There are algorithms, such as Smith-Waterman or Needleman–Wunsch, that align these two sequences by filling a scoring matrix. The result is shown below:
Smith-Waterman Matrix
    -   A   C   C   G   A   A   T   C   G   A
-   0   0   0   0   0   0   0   0   0   0   0
A   0   4   0   0   0   4   4   0   0   0   4
C   0   0  13   9   4   0   4   3   9   4   0
C   0   0   9  22  17  12   7   3  12   7   4
G   0   0   4  17  28  23  18  13   8  18  13
G   0   0   0  12  23  28  23  18  13  14  18
T   0   0   0   7  18  23  28  28  23  18  14
A   0   4   0   2  13  22  27  28  28  23  22
T   0   0   3   0   8  17  22  32  27  26  23
T   0   0   0   2   3  12  17  27  31  26  26
A   0   4   0   0   2   7  16  22  27  31  30
A   0   4   4   0   0   6  11  17  22  27  35
C   0   0  13  13   8   3   6  12  26  22  30
Optimal Alignments
A C C G A - A T C G A
A C C G G A A T T A A
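For reference, here is a minimal Python sketch of how such a matrix is filled and traced back. It uses a linear gap penalty with assumed scores (match = 3, mismatch = -3, gap = 2), so it will not reproduce the exact numbers in the matrix above; the function name `smith_waterman` is hypothetical. Note that the recurrence only ever compares two elements with `==`, so in this sketch `a` and `b` can be Python strings or lists of multi-character tokens.

```python
from typing import Any, List, Optional, Sequence, Tuple


def smith_waterman(a: Sequence[Any], b: Sequence[Any], match: int = 3,
                   mismatch: int = -3, gap: int = 2
                   ) -> Tuple[int, List[Optional[Any]], List[Optional[Any]]]:
    """Local alignment with a linear gap penalty (scores are assumptions).

    Elements are only compared with ==, so a and b may be strings or
    lists of tokens. None marks a gap in the returned alignment.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0,
                          h[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                          h[i - 1][j] - gap,      # gap in b
                          h[i][j - 1] - gap)      # gap in a
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    out_a: List[Optional[Any]] = []
    out_b: List[Optional[Any]] = []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] - gap:
            out_a.append(a[i - 1])
            out_b.append(None)
            i -= 1
        else:  # the value must have come from the left neighbour
            out_a.append(None)
            out_b.append(b[j - 1])
            j -= 1
    return best, out_a[::-1], out_b[::-1]
```

For example, `smith_waterman("ACGT", "ACT")` aligns `A C G T` with `A C - T` (the gap is returned as `None`) and a best local score of 7 under these assumed scores.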
Question:
My question is simple, but the answer may not be as easy as it looks. I want to treat a group of characters as a single symbol, for example: [A0][C0][A1][B1]. However, these algorithms operate on individual characters. How can I achieve that?
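Splitting a string of bracketed groups into a list of tokens can be done with a regular expression. A minimal sketch, assuming each token is a non-empty, non-nested run of characters enclosed in [ and ]; `tokenize` is a hypothetical helper name:

```python
import re
from typing import List

# Assumed token format: one or more characters between [ and ], no nesting.
TOKEN = re.compile(r"\[[^\[\]]+\]")


def tokenize(seq: str) -> List[str]:
    """Split '[A0][C0][A1][B1]' into ['[A0]', '[C0]', '[A1]', '[B1]']."""
    tokens = TOKEN.findall(seq)
    # Reject input containing anything outside a bracketed group.
    if "".join(tokens) != seq:
        raise ValueError(f"malformed token sequence: {seq!r}")
    return tokens
```

Each element of the returned list is then one symbol. Whether a particular Smith-Waterman implementation accepts such a list instead of a string depends on whether it indexes and compares elements generically (with `==`) rather than assuming single characters.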
P.S. Suppose we have this sequence: #read #write #add #write. I convert each word to a single character: #read to A, #write to B, #add to C, and so on. My sequence then becomes ABCB. But I have many different words that start with #, and the ASCII table does not have enough characters for all of them, so I need more symbols. The only way I see is to use something like [A0] ... [Z9] for each word, or to use numbers.
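The "use numbers" option could look like the following minimal sketch: each distinct word gets an integer id on first sight, so the alphabet is not limited by the ASCII table. The function name `encode` and the word `#delete` are hypothetical, for illustration only. The same `vocab` dictionary must be shared across both sequences so that a word maps to the same id in each.

```python
from typing import Dict, List


def encode(words: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map each distinct word to an integer id, assigning new ids on first sight.

    The id space is unbounded, so any number of distinct '#words' fits.
    """
    ids = []
    for w in words:
        if w not in vocab:
            vocab[w] = len(vocab)  # next unused id
        ids.append(vocab[w])
    return ids


vocab: Dict[str, int] = {}
seq1 = encode("#read #write #add #write".split(), vocab)  # [0, 1, 2, 1], like ABCB
seq2 = encode("#read #add #delete".split(), vocab)        # [0, 2, 3]
```

The resulting integer lists play the same role as the string ABCB, but can only be aligned by an implementation that compares list elements rather than characters of a string.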
P.S. Some sample code for Smith-Waterman exists at this link.
P.S. There is another post that asks for something similar, but what I want is different. In this question, each group of characters begins with [ and ends with ], and no semantic matching is needed (for example, treating ee as equal to i).