
I am trying to match a group of words between two words in a String. I will be using Java RegEx.

Input Text

The clever fox JUMPED OVER the big dog and ran away.

Expected Output

the big

RegEx Used

(?<=(fox\s[A-Z0-9]*))(?s)(.*?)(?=\sdog)

I get the output below, which gives me all the words between fox and dog:

JUMPED OVER the big

The word "fox" will always be followed by one or more all-uppercase words. I need to match all the words that follow these uppercase words, up to "dog".

Also, I need to get the desired output in capture group 0; I cannot use other capture groups. This is a limitation of my application.

Any help on this is greatly appreciated.

immzi
  • "a group of words between two words". What are these two words? –  Apr 24 '14 at 09:03
  • @Tichodromamuraria in the example string, I need all the words between "fox JUMPED OVER" and "dog". Please pardon my English; these are more than 2 words. – immzi Apr 24 '14 at 09:06

3 Answers


I'm afraid Java doesn't support unbounded variable-length lookbehind assertions: a lookbehind needs an obvious maximum length, so `*` and `+` can't be used inside it.

In addition, capture group 0 is the full match, and because variable-length lookbehinds are not allowed (as explained above), getting the result into group 0 is impossible unless you know the uppercase words always have a known, bounded length.

To get the result in capture group 1 instead, try:

(?<=fox)(?:\s[A-Z0-9]*)*\s?(.*?)(?=\sdog)
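A minimal Java sketch of how a calling program could read the result from group 1 of the regex above, assuming the text is available to plain Java code (the question says this is not the case for the asker's tool). The class name `Group1Demo` and the helper `find` are illustrative only; the backslashes are doubled because the pattern sits in a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Group1Demo {
    // The capture-group-1 regex, with every backslash doubled for a Java string literal.
    static final Pattern P = Pattern.compile("(?<=fox)(?:\\s[A-Z0-9]*)*\\s?(.*?)(?=\\sdog)");

    // Illustrative helper: returns the given group of the first match, or null if nothing matches.
    static String find(String text, int group) {
        Matcher m = P.matcher(text);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String text = "The clever fox JUMPED OVER the big dog and ran away.";
        System.out.println("group 0: [" + find(text, 0) + "]"); // [ JUMPED OVER the big]
        System.out.println("group 1: [" + find(text, 1) + "]"); // [the big]
    }
}
```

Group 0 starts right after `fox`, so it still contains the uppercase words; only group 1 holds `the big`.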

EDIT: Fixed typo in regex

EDIT 2: clarified fulltext problem.

EDIT 3: Depending on how strict Java is with the "non-obvious maximum length of lookbehind group" check, this might work: (?<=fox(?:\s[A-Z0-9]{5,7}){1,2})(.*?)(?=\sdog). But I need to ask: what makes you so sure that you need this to be capture group 0? I somewhat doubt that's the case. Even if it is, you can just take the output and run it again against .* so that the text ends up in capture group 0. There's no way you really need this as a requirement.
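A sketch of the two-pass idea from EDIT 3, assuming the caller can run two regexes in sequence (an assumption, since the asker's tool may only accept one): pass 1 uses the capture-group-1 regex from this answer, and pass 2 runs `.*` over the extracted text, so the words end up in group 0 of the second match. The names `TwoPassDemo` and `twoPass` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoPassDemo {
    // Pass 1: the wanted words land in group 1.
    static final Pattern FIRST = Pattern.compile("(?<=fox)(?:\\s[A-Z0-9]*)*\\s?(.*?)(?=\\sdog)");
    // Pass 2: match everything, so group 0 is the whole extracted text.
    static final Pattern SECOND = Pattern.compile(".*");

    // Hypothetical helper: returns the words as group 0 of the second pass, or null if pass 1 fails.
    static String twoPass(String text) {
        Matcher first = FIRST.matcher(text);
        if (!first.find()) {
            return null;
        }
        String extracted = first.group(1);
        Matcher second = SECOND.matcher(extracted);
        return second.find() ? second.group(0) : null;
    }

    public static void main(String[] args) {
        System.out.println(twoPass("The clever fox JUMPED OVER the big dog and ran away."));
    }
}
```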

Mike H-R
  • Yes, I was able to get the desired text in capture group 1, but I need it as the full text, i.e., group 0. I also tried your regex, but it does not give me the desired text in capture group 1. Not sure if I am missing anything here; I'm not very knowledgeable about regexes. Thanks for the warning about variable-length lookbehind. – immzi Apr 24 '14 at 09:20
  • As I said, it is impossible to get the full text into capture group 0, because variable-length lookbehind is not allowed. Sorry, I had a typo in the regex; it's fixed now. – Mike H-R Apr 24 '14 at 09:24
  • I think I can know the minimum and maximum number of characters the uppercase words will have. Can that help? Let's say, for example, that my uppercase words will have a minimum of 5 and a maximum of 7 characters – immzi Apr 24 '14 at 09:43
  • and is there a set number of words there can be? – Mike H-R Apr 24 '14 at 09:44
  • I tried the regex in your EDIT 3. It gives me the result **D OVER the big** – immzi Apr 24 '14 at 10:47
  • When I try it, I get the error `Look-behind group does not have an obvious maximum length near`. Did you read the rest of it? There's no way there is a strong requirement for you to use capture group 0 – Mike H-R Apr 24 '14 at 11:00
  • Are you sure it's Java regexes? You should be getting that error. If you don't, try `(?<=fox(?:\s[A-Z0-9]*)*\s?)(.*?)(?=\sdog)`, as this will work if your engine allows variable-length lookbehinds. – Mike H-R Apr 24 '14 at 11:05
  • Yes, it's Java RegEx. I am using a tool called Expresso for testing the regexes, and it gave no error for the regex in EDIT 3. As for using capture group 1 instead, I know this could easily be done in Java code, but my application is not in Java. I need to feed this regex into another application which uses Java. – immzi Apr 24 '14 at 11:09
  • The EDIT 3 regex works as I need if I change the word repetition to exactly 2. But that may not always be the case; I can have 1 or 2 words. – immzi Apr 24 '14 at 11:11

You can use this regex:

^.*fox[A-Z0-9\s]*(.*)dog.*$

You can pass fox and dog as parameters to your function to reuse it in other cases.
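A minimal sketch of such a parameterized function, under these assumptions: the name `wordsBetween` and class `RegexBetween` are hypothetical, and `Pattern.quote` is used so the start and end words are treated as literal text rather than regex syntax. It uses `[A-Z0-9\s]*` to skip the uppercase words and a lazy `(.*?)` followed by `\s+`, so the space before the end word is not captured in group 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexBetween {
    // Hypothetical helper: words after `start` and its uppercase words, up to `end`; null if no match.
    static String wordsBetween(String text, String start, String end) {
        String regex = "^.*" + Pattern.quote(start) + "[A-Z0-9\\s]*(.*?)\\s+"
                + Pattern.quote(end) + ".*$";
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(wordsBetween(
                "The clever fox JUMPED OVER the big dog and ran away.", "fox", "dog"));
        System.out.println(wordsBetween("a cat SAT ON the warm mat today", "cat", "mat"));
    }
}
```

Like the original regex, this still returns the text in group 1, not group 0.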

Andynedine
  • This gives me the matched words in group 1. Is there a way I can get them in group 0? Many thanks for your answer – immzi Apr 24 '14 at 09:34
  • You could use the group-ignore syntax by changing `(.*)` to `(?=.*)`. This lets the matcher ignore the group. – maxdev Apr 24 '14 at 09:35
  • @maxdev after making the change you suggested, I don't get any match. The RegEx looked like this: ^.*fox[A-Z0-9\s]*(?=.*)dog.*$ – immzi Apr 24 '14 at 09:41
  • @immzi I only get 1 match, don't I? group 0="the big". What value do you get in group 0? – Andynedine Apr 24 '14 at 09:55
  • @Andynedine group 1 is "the big" group 0 is "The clever fox JUMPED OVER the big dog and ran away." – Mike H-R Apr 24 '14 at 10:10
  • You're right. Well, @immzi, I don't know how you can ignore match 0. Can you show all your code so we can help you? Perhaps we can find an alternative solution – Andynedine Apr 24 '14 at 10:24
  • @Andynedine There is no Java code for this. I have to give this RegEx to a tool which does further processing, so I need the RegEx itself to give me the matching words in group 0 – immzi Apr 24 '14 at 10:45
  • @immzi I downloaded the Expresso app from http://www.ultrapico.com/expresso.htm and tried the regex. I think that if you find a match, it will always show you your initial string, but if you use the "REPLACE" button, you really get "THE BIG".... You used Expresso, didn't you? – Andynedine Apr 24 '14 at 11:23
  • Yes, I use Expresso, but only to test the regexes. Is there a way to specify the replacement string in the regular expression itself? That would also solve my problem – immzi Apr 24 '14 at 11:36
  • If you use Java: str = str.replaceAll(".*fox[A-Z0-9\\s]*(.*?)\\s+dog.*", "$1"); I think it'll work... OUTPUT: str="the big" – Andynedine Apr 24 '14 at 11:48
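A runnable sketch of the `replaceAll` idea from the last comment, assuming the text can be processed in Java (the class and method names are illustrative). In Java the replacement string is a separate argument to `replaceAll`, not part of the pattern, and `$1` in it refers to capture group 1; every regex backslash has to be doubled inside a Java string literal, and the lazy `(.*?)` followed by `\\s+` keeps the space before "dog" out of group 1.

```java
public class ReplaceAllDemo {
    // Replaces the whole match (group 0) with group 1, leaving only the wanted words.
    static String keepGroup1(String str) {
        return str.replaceAll(".*fox[A-Z0-9\\s]*(.*?)\\s+dog.*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(keepGroup1("The clever fox JUMPED OVER the big dog and ran away."));
    }
}
```

On the `(?=.*)` suggestion earlier in the thread: in Java regex, `(?:...)` is the non-capturing group, while `(?=...)` is a zero-width lookahead that consumes nothing, which is why that version required "dog" immediately after the uppercase words and found no match.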

Without regexp:

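    public class WordsBetween {
        public static void main(String[] args) {
            String fox = "The clever fox JUMPED OVER the big dog and ran away.";

            boolean start = false;     // true once "fox" has been seen
            boolean skipUpper = false; // true while skipping the uppercase words after "fox"
            for (String word : fox.split("\\s")) {
                if ("fox".equals(word)) {
                    start = true;
                    skipUpper = true;
                    continue;
                }
                if ("dog".equals(word)) {
                    break;
                }
                if (skipUpper && word.equals(word.toUpperCase())) {
                    continue; // still inside the uppercase words, e.g. JUMPED OVER
                }
                skipUpper = false;
                if (start) {
                    System.out.println(word); // prints "the", then "big"
                }
            }
        }
    }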
  • I need it as a regular expression. I need to feed this into another application which just accepts the exact words that need to be matched. Can you suggest something using a RegEx, please?