I would like to count the occurrences of the "$$$$" pattern in a text file. The following method works with some files, but not with all of them. For example, it does not work with this file: http://www.hmdb.ca/downloads/structures.zip (a zipped text file with an .sdf extension), and I can't figure out why. I also tried escaping whitespace, with no luck. It returns 11, even though the file contains more than 35000 "$$$$" patterns. Please note that speed is crucial, so I can't use any slower methods.
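import java.io.File;
import java.util.Scanner;
import java.util.regex.Pattern;

public static void countMoleculesInSDF(String fileName)
{
    int tot = 0;
    Scanner scan = null;
    // Each "$" is escaped so that it matches a literal dollar sign
    Pattern pat = Pattern.compile("\\$\\$\\$\\$");
    try {
        File file = new File(fileName);
        scan = new Scanner(file);
        long start = System.nanoTime();
        // A horizon of 0 means the search is not bounded
        while (scan.findWithinHorizon(pat, 0) != null) {
            tot++;
        }
        long dur = (System.nanoTime() - start) / 1000000;
        System.out.println("Results found: " + tot + " in " + dur + " msecs");
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        // scan is still null if the Scanner constructor threw
        if (scan != null) {
            scan.close();
        }
    }
}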
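
For comparison, here is a minimal sketch of a cross-check that avoids Scanner entirely. Scanner is documented to treat an IOException from its underlying source as end of input (the exception is only available through scan.ioException()), so a read or decoding failure partway through the file could end the loop above early without any stack trace. Whether that is what happens with this particular file is an assumption, not something verified here. The class name SdfDelimiterCounter, the helper countInFile, and the choice of ISO-8859-1 (which maps every byte to a character, so decoding cannot fail) are all hypothetical choices for illustration. The sketch also assumes "$$$$" never spans a line break.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class SdfDelimiterCounter {

    // Counts non-overlapping occurrences of a literal token, line by line.
    // Assumes the token never spans a line break.
    // I/O failures are rethrown instead of being treated as end of input.
    public static int countOccurrences(Reader source, String token) {
        int total = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int from = 0;
                int hit;
                while ((hit = line.indexOf(token, from)) >= 0) {
                    total++;
                    from = hit + token.length();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    // Hypothetical helper: ISO-8859-1 is an assumption about how to decode the file.
    public static int countInFile(String fileName) {
        try {
            Reader reader = Files.newBufferedReader(
                    Paths.get(fileName), StandardCharsets.ISO_8859_1);
            return countOccurrences(reader, "$$$$");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Small in-memory sample with two "$$$$" delimiter lines
        String sample = "mol1\n$$$$\nmol2\n$$$$\n";
        System.out.println(countOccurrences(new StringReader(sample), "$$$$")); // prints 2
    }
}
```

If this count and the Scanner count disagree on the same unzipped file, printing scan.ioException() after the loop in the original method may show why the Scanner stopped. No speed comparison between the two approaches is claimed here; that would need to be measured.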