Hi guys, I am trying to read a genomic sequence and find every 10-character substring that appears more than once. The solution I have in mind breaks down into five steps:
- Read the genomic sequence, e.g. GAAAAATTTTCCCCCACCCTTTTCCCC
- Cut the string into successive ten-character sequences: the first new string covers indices 0-9, the next 1-10, then 2-11, 3-12, and so on
- Store these sequences in an ArrayList
- Compare the strings
- Return repeated sequences and how often they repeat.
The trouble I am having is generating each new string from the original, larger string. Say my genomic sequence is AAAAGGGGGAAAATTTCCCC; then my first ten-character sequence would be AAAAGGGGGA and the next would be AAAGGGGGAA. How would I go about doing that in Java?
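A minimal sketch of that windowing step, assuming `String.substring(begin, end)` is used with its exclusive end index. The class and method names (`WindowSketch`, `windows`) are hypothetical and only serve the illustration:

```java
import java.util.ArrayList;
import java.util.List;

class WindowSketch {
    // Hypothetical helper: returns every substring of the given length,
    // starting at index 0, 1, 2, ... in order.
    static List<String> windows(String s, int length) {
        List<String> result = new ArrayList<>();
        // The last valid start index is s.length() - length.
        for (int i = 0; i + length <= s.length(); i++) {
            // substring(begin, end) excludes the character at index end,
            // so i + length yields exactly `length` characters.
            result.add(s.substring(i, i + length));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> w = windows("AAAAGGGGGAAAATTTCCCC", 10);
        System.out.println(w.get(0)); // AAAAGGGGGA
        System.out.println(w.get(1)); // AAAGGGGGAA
        System.out.println(w.size()); // 11
    }
}
```

A 20-character input gives 20 - 10 + 1 = 11 windows, and an input shorter than ten characters gives an empty list.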
This is what I have so far:
import java.util.ArrayList;

public class Solution
{
    public ArrayList<String> findRepeatedDnaSequences(String s)
    {
        ArrayList<String> sequence = new ArrayList<String>();
        ArrayList<String> matchedSequence = new ArrayList<String>();
        // Collect every 10-character window; substring's end index is exclusive.
        for (int i = 0; i + 10 <= s.length(); i++)
        {
            sequence.add(s.substring(i, i + 10));
        }
        for (int i = 0; i < sequence.size(); i++)
        {
            // Skip windows that were already reported.
            if (matchedSequence.contains(sequence.get(i)))
            {
                continue;
            }
            int matches = 0;
            for (int j = i + 1; j < sequence.size(); j++)
            {
                // Compare contents with equals(); == only compares references.
                if (sequence.get(i).equals(sequence.get(j)))
                {
                    matches++;
                }
            }
            if (matches > 0)
            {
                System.out.println(sequence.get(i) + " occurs " + (matches + 1) + " times");
                matchedSequence.add(sequence.get(i));
            }
        }
        return matchedSequence;
    }
}
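The compare-and-count steps could also be sketched with a map from each window to its number of occurrences, instead of nested loops over an ArrayList. This swaps in a `LinkedHashMap` (a different technique from the code above, not a correction of it). The names `RepeatCountSketch` and `repeatedWindows` and the sample input are assumptions made for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RepeatCountSketch {
    // Hypothetical helper: maps each window that occurs more than once
    // to the number of times it occurs, in order of first appearance.
    static Map<String, Integer> repeatedWindows(String s, int length) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i + length <= s.length(); i++) {
            // merge() inserts 1 for a new window, otherwise adds 1 to its count.
            counts.merge(s.substring(i, i + length), 1, Integer::sum);
        }
        // Drop windows that appear only once.
        counts.values().removeIf(count -> count < 2);
        return counts;
    }

    public static void main(String[] args) {
        // Illustrative input, not taken from the question.
        System.out.println(repeatedWindows("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", 10));
        // prints {AAAAACCCCC=2, CCCCCAAAAA=2}
    }
}
```

Each window is looked up once in the map, so the work grows roughly linearly with the sequence length rather than quadratically as with pairwise comparison.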