-4

I have a .txt file storing a genome: a long sequence made up of the characters A, C, G, and T, e.g. TCGTGTTGAGAGGTATGAGACCTCTGGCAAGTACTTTGCCTACAAGATGGAGGAGAA.... (the file contains millions of these characters). I want to write code that counts the occurrences of the "ACGT" sequence motif in the complete genome. Can someone please help with this?

mattm
  • 5,851
  • 11
  • 47
  • 77

1 Answer

0

This is a simplified case of the sequence alignment problem. Several existing alignment tools already perform this kind of search, using data structures designed to reduce the time required to search for multiple sequences. If you want to run this kind of search for more than one query string, you should use one of these tools rather than performing a linear search in Java.
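For a single fixed motif such as "ACGT", the linear search mentioned above can be sketched in Java roughly as follows. This is only a minimal sketch under stated assumptions, not a definitive implementation: the class name `MotifCounter`, the method `countMotif`, and the default file name `genome.txt` are hypothetical, and it assumes the file is plain single-byte text, that line breaks are not part of the sequence, that matching is case-sensitive, and that overlapping occurrences are counted separately.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class MotifCounter {

    // Counts occurrences (overlapping ones included) of motif in the stream.
    // Whitespace such as line breaks is skipped (an assumption about the file layout).
    // The stream is read one character at a time, so the genome is never held in memory.
    static long countMotif(Reader in, String motif) throws IOException {
        int m = motif.length();
        if (m == 0) {
            throw new IllegalArgumentException("motif must be non-empty");
        }

        // Knuth-Morris-Pratt failure table: fail[i] is the length of the longest
        // proper prefix of motif[0..i] that is also a suffix of it.
        int[] fail = new int[m];
        for (int i = 1, k = 0; i < m; i++) {
            while (k > 0 && motif.charAt(i) != motif.charAt(k)) {
                k = fail[k - 1];
            }
            if (motif.charAt(i) == motif.charAt(k)) {
                k++;
            }
            fail[i] = k;
        }

        long count = 0;
        int matched = 0; // number of motif characters matched so far
        int c;
        while ((c = in.read()) != -1) {
            if (Character.isWhitespace(c)) {
                continue; // ignore line breaks and spaces
            }
            while (matched > 0 && c != motif.charAt(matched)) {
                matched = fail[matched - 1];
            }
            if (c == motif.charAt(matched)) {
                matched++;
            }
            if (matched == m) {
                count++;
                matched = fail[m - 1]; // allow overlapping matches
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // "genome.txt" is a placeholder; pass the real path as the first argument.
        String path = args.length > 0 ? args[0] : "genome.txt";
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(path), StandardCharsets.ISO_8859_1)) {
            System.out.println(countMotif(reader, "ACGT"));
        }
    }
}
```

Because the matcher keeps its state between characters, a motif that straddles a line break or a buffer boundary is still counted. For many query strings, the point above still applies: an indexed tool is likely the better choice.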

mattm