-2

Hi guys I am trying to read a genomic sequence and search for any 10 character repeats that appear. The solution that I have in mind is broken down into three steps:

  1. Read the Genomic sequence ex: GAAAAATTTTCCCCCACCCTTTTCCCC
  2. Cut the String into successive sequences of ten, for example the first newly generated string would be index 0-9 and the next would be 1-10,2-11,3-12...
  3. Store these sequences in an ArrayList
  4. Compare the strings
  5. Return repeated sequences and how often they repeat.

The trouble I am having is how to generate a new string from the older and larger string. Say if my genomic sequence is AAAAGGGGGAAAATTTCCCC then my first ten character sequence would be AAAAGGGGGA and the next would be AAAGGGGGAA. How would I go about doing that in java?

This is what I have so far:

import java.util.List;
import java.util.ArrayList;

public class Solution
{
    public ArrayList<String> findRepeatedDnaSequences(String s) 
    {
        ArrayList<String> sequence = new ArrayList<String>();
        int matches;
        ArrayList<String> matchedSequence = new ArrayList<String>();
        for(int i = 0; i < s.length(); i++)
        {
            if (i + 9 > s.length())
            {
                sequence.add(s.substring(i, i + 9));
            }

        }
        for(int i = 0; i < sequence.size(); i++)
        {
            matches = 0;
            for (int j = 1; j < sequence.size(); j++)
            {
                if(sequence.get(i) == sequence.get(i))
                {
                    matches++;
                    System.out.print(matches);
                    matchedSequence.add(sequence.get(i));
                }
            }
        }
        return matchedSequence;
    }
}
Helder Pereira
  • 5,522
  • 2
  • 35
  • 52
Justin Reid
  • 119
  • 1
  • 9

3 Answers3

0
public class MainClass {

    public static void main(String[] args){
        printAllSequences("GAAAAATTTTCCCCCACCCTTTTCCCC", 10);
    }

    public static void printAllSequences(String DNASequence, int subSequenceSize){
        for(int i=0; i<DNASequence.length() - subSequenceSize - 1; i++){
            System.out.println(DNASequence.substring(i, i + subSequenceSize));
        }
    }

}
Saba Jamalian
  • 750
  • 2
  • 10
  • 24
  • 1
    This should do it. But since the genome can be huge, you could use a cyclic array if you're looking into more efficient solution. – Bifz Feb 15 '16 at 23:01
0

If you are using Java 8, you can do it with streams. Unfortunately, there are many methods missing in the Stream API that exist in other programming languages, but we can still implement them ourselves. So using the method sliding from this answer:

How can I convert a Stream of Strings to Stream of String pairs?

You can do something like this:

String gseq = "AAAAACCCCCAAAAACCCCC";

Map<String, Long> count = StreamUtils.sliding(10, gseq.chars().boxed())
        .map(l -> new String(l.stream().mapToInt(n -> n).toArray(), 0, l.size()))
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

This will produce a map with the counting for each substring of length 10.

Community
  • 1
  • 1
Helder Pereira
  • 5,522
  • 2
  • 35
  • 52
0
Following is the complete class that you are looking for. The code is pretty self explanatory.
package source;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.InflaterInputStream;

public class PatternFinding {

    //function to find the patterns
    public static List<String> stringMatcher(String str,int len){
        String string="";
        int count=1;
        List<String> list=new ArrayList<String>();
        for(int i=0;i+len<=str.length();i++){
            System.out.print(i);
            string="";
            count=1;
            char ch=str.charAt(i);
            string+=String.valueOf(ch);
            for(int j=i+1;j<str.length() && j<i+len;j++){
                System.out.println(" "+j);
                if(ch==str.charAt(j)){
                    count++;
                    string+=String.valueOf(str.charAt(j));
                }else{
                    break;
                }
            }
            System.out.println(string);
            if(count==len){
                list.add(string);
            }
        }
        return list;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader br=new BufferedReader(new InputStreamReader(System.in));
        String text=br.readLine();
        //pass the length of your pattern as second arguement
        List<String> list=stringMatcher(text,5);

        //sorting the list
        Collections.sort(list);
        for(int i=0;i<list.size();i++){
            System.out.println(list.get(i));
        }

        //counting occurances
        for(int i=0;i<list.size();){
            String str=list.get(i);
            int lastIndex=list.lastIndexOf(str);
            System.out.println(str+" happens "+ (lastIndex-i+1)+" times");
            i=lastIndex+1;
        }

    }
}