
I have an unusual problem: I need to apply a regex to an input string in units of three characters (triplets). For example, if my input is ABCDEFGHI, a search for the pattern BCD should return false, because I treat the input as ABC+DEF+GHI and compare the pattern only against these triplets.

Similarly, the pattern DEF returns true, since it matches one of the triplets. Given this, assume my input is QWEABCPOIUYTREWXYZASDFGHJKLABCMNBVCXZASXYZFGH and I want all substrings that start with the triplet ABC and end with the triplet XYZ. For this input, the output should be two strings: ABCPOIUYTREWXYZ and ABCMNBVCXZASXYZ.

I also have to store these strings in an ArrayList. Below is my function:

public static void newFindMatches (String text, String startRegex, String endRegex, List<String> output) {
    int startPos = 0;
    int endPos = 0;
    int i = 0;
    // Making sure that substrings are always valid
    while ( i < text.length()-2) {
        // Substring for comparing triplets
        String subText = text.substring(i, i+3);
        Pattern startP = Pattern.compile(startRegex);
        Pattern endP = Pattern.compile(endRegex);
        Matcher startM = startP.matcher(subText);
        if (startM.find()) {
            // If a match is found, set the start position
            startPos = i;
            for (int j = i; j < text.length()-2; j+=3) {
                String subText2 = text.substring(j, j+3);
                Matcher endM = endP.matcher(subText2);
                if (endM.find()) {
                    // If match for end pattern is found, set the end position
                    endPos = j+3;
                    // Add the string between start and end positions to ArrayList
                    output.add(text.substring(startPos, endPos));
                    i = j;
                }
            }               
        }
        i = i+3;

    }


}

Upon running this function in main as follows:

String input = "QWEABCPOIUYTREWXYZASDFGHJKLABCMNBVCXZASXYZFGH";
String start = "ABC";
String end = "XYZ";
List<String> results = new ArrayList<String>();
newFindMatches(input, start, end, results);

for (int x = 0; x < results.size(); x++) {
    System.out.println("Output String number "+(x+1)+" is: "+results.get(x));
}

I get the following output:

Output String number 1 is: ABCPOIUYTREWXYZ
Output String number 2 is: ABCPOIUYTREWXYZASDFGHJKLABCMNBVCXZASXYZ

Notice that the first string is correct. For the second string, however, the program starts again from the first ABC instead of from the second one. I want the program to continue reading after the last end pattern, i.e. skip the first match and the unwanted characters such as ASDFGHJKL, and print only this as the second string: ABCMNBVCXZASXYZ

Thanks for your responses

Pshemo
KrnK
  • Strange. I would add some debug output statements to see what some variables (`i`, `j`, `startPos`) are doing at various points in the code. At first glance it looks all correct. – Floris Oct 17 '13 at 02:45
  • I would tackle this problem differently: [split the input](http://stackoverflow.com/a/2298477/664577) into triplets, look for the first matching start triplet, see if there is a matching end triplet, record the result, discard what was consumed, and so on. – Anthony Accioly Oct 17 '13 at 03:01

1 Answer


The problem is that when you find your end match (the if statement inside the for loop), you don't stop the for loop. It keeps looking for more end matches until it reaches the loop's end condition, j < text.length()-2, so a later XYZ gets paired with the same start position. Once you have found and processed a match, end the loop with "break;". Place "break;" right after the i=j line.
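As a rough sketch of that fix, the method from the question could look like the following once the break is added. The class name TripletMatcher, the method name findMatches, and returning the list instead of filling an output parameter are my own illustrative choices, not part of the original code. The patterns are also compiled once, outside the loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
class TripletMatcher {
    // Collects each span that starts at a matching start triplet and ends
    // at the nearest following end triplet, then resumes after that end.
    static List<String> findMatches(String text, String startRegex, String endRegex) {
        Pattern startP = Pattern.compile(startRegex); // compile once, not per triplet
        Pattern endP = Pattern.compile(endRegex);
        List<String> output = new ArrayList<>();
        int i = 0;
        while (i + 3 <= text.length()) {
            if (startP.matcher(text.substring(i, i + 3)).find()) {
                for (int j = i; j + 3 <= text.length(); j += 3) {
                    if (endP.matcher(text.substring(j, j + 3)).find()) {
                        output.add(text.substring(i, j + 3));
                        i = j;  // continue the outer scan after the end triplet
                        break;  // stop at the first end match
                    }
                }
            }
            i += 3;
        }
        return output;
    }
}
```

Calling TripletMatcher.findMatches(input, "ABC", "XYZ") with the input from the question should give the two expected strings, ABCPOIUYTREWXYZ and ABCMNBVCXZASXYZ.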

Note that, technically, the second string your current program produced is also correct: it too is a substring that begins with ABC and ends with XYZ. You might want to reconsider what the correct output for your program should be. If you do want such results, don't set i=j when you find a match (and don't add the break), so that i advances only through the i=i+3 line, iterating across the triplets.
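For comparison, here is a sketch of that alternative, where every start triplet is paired with every later end triplet. Again, the class name AllTripletSpans and the method name findAllSpans are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
class AllTripletSpans {
    // Collects every span that begins at a start triplet and ends at any
    // end triplet at or after it (no break, no jump of the outer index).
    static List<String> findAllSpans(String text, String startRegex, String endRegex) {
        Pattern startP = Pattern.compile(startRegex);
        Pattern endP = Pattern.compile(endRegex);
        List<String> output = new ArrayList<>();
        for (int i = 0; i + 3 <= text.length(); i += 3) {
            if (!startP.matcher(text.substring(i, i + 3)).find()) {
                continue; // not a start triplet, move to the next one
            }
            for (int j = i; j + 3 <= text.length(); j += 3) {
                if (endP.matcher(text.substring(j, j + 3)).find()) {
                    output.add(text.substring(i, j + 3));
                }
            }
        }
        return output;
    }
}
```

On the input from the question this version should return three strings: the two short ones plus the long span from the first ABC to the last XYZ.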

Seth Nelson
  • Yes, this did work. Thanks for your prompt response and explanation. Lesson learned. – KrnK Oct 17 '13 at 03:00