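Here's a Perl one-liner (the <<< here-string at the end is a bash/zsh/ksh feature; in other shells, pipe the sequence in with echo or printf instead):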
perl -ne 'while (m/(.)(\1*)/g) { printf "%5i %s\n", length($2)+1, $1 }' <<<AATGATGGAANNNNNGATAGAACGATNNNNNNNNGATAATGANNNNNNNTAGACTGA
2 A
1 T
1 G
1 A
1 T
2 G
2 A
5 N
1 G
1 A
1 T
1 A
1 G
2 A
1 C
1 G
1 A
1 T
8 N
1 G
1 A
1 T
2 A
1 T
1 G
1 A
7 N
1 T
1 A
1 G
1 A
1 C
1 T
1 G
1 A
The regex m/(.)(\1*)/ matches as many identical characters in a row as it can, and the /g modifier (used in scalar context, as the condition of the while loop) makes each new match start where the previous one ended; the loop stops as soon as no further match can be found. So we are looping over the string in chunks of identical characters, and on each iteration printing the first character as well as the length of the entire matched string, which is length($2)+1 because $2 holds only the repeats after the first character.
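The first pair of parentheses captures a single character at the current position in the (remaining, not yet matched) line, and the backreference \1 matches that same character again. The * quantifier repeats it as many times as possible, including zero times, so a lone character still counts as a run of length 1.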
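If you are interested only in the runs of N, you could change the first capture group to (N), or keep the general pattern and add a conditional like printf("%5i %s\n", length($2)+1, $1) if $1 eq "N". The comparison must use eq: == compares numerically, and since none of the letters looks like a number, both sides become 0 and every run would be printed.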
Similarly, if you want only hits where there are repeats (more than one occurrence), you can say \1+ instead of \1*, or add a conditional like ... if length($2) >= 1.