-1

I know that variations of this question have already been answered here.

I have tried to go through the solutions and come up with a regular expression for my needs. I have a string of text over multiple lines with neither a fixed starting location nor an ending location for a particular line.

<a name='bill_pay' href='javascript:goto(&#39;billpay&#39;);' class='fsdnav-top-menu-item'>Bill Pay <span class='fsdnav-ada-hidden'>link and menu. Press enter to navigate to this link. Press control + space to open submenu.

To move through submenu items press tab and then press up or down arrow.</span> </a>
<a name='bill_pay' href='javascript:goto(&#39;findmyinfo&#39;);' class='fsdnav-top-menu-item'>
Bill Pay <span class='fsdnav-ada-hidden'>link and menu. Press enter to navigate to this link. Press control + space to open submenu.

To move through submenu items press tab and then press up or down arrow.</span> </a>
<a name='bill_pay' href='#' onClick='OOLPopUp(&#39;/myaccounts/brain/redirect.go?target=findmyroutingnumber&#39;,&#39;ool&#39;,&#39;currentPage&#39;);return false;' class='fsdnav-top-menu-item'>
Bill Pay <span class='fsdnav-ada-hidden'>link and menu. Press enter to navigate to this link. Press control + space to open submenu.
To move through submenu items press tab and then press up or down arrow.</span> </a>

I would like to extract the contents of javascript:goto(&quot;link&quot;), whatever value link represents. There are multiple such occurrences in the text above, but the regex that I am using returns just a single occurrence. I would like to return all of them. My code block is given below.

private static final Pattern PATTERN_WITH_ASCII_QUOTES =
    Pattern.compile("^.*goto\\(&#39;(\\w+)&#39;\\).*",
        Pattern.MULTILINE|Pattern.DOTALL);

// "str" is the string representation of the text above.
Matcher m = PATTERN_WITH_ASCII_QUOTES.matcher(str);
while (m.find()) {
    System.out.println(m.group(1));
}

The resultant output is always findmyinfo and nothing else.

UPDATE - The desired outputs are

 billpay (from javascript:goto(&#39;billpay&#39;);)
 findmyinfo (from javascript:goto(&#39;findmyinfo&#39;);)

I would also like to extract

/myaccounts/brain/redirect.go?target=findmyroutingnumber&#39;,&#39;ool&#39;,&#39;currentPage from OOLPopUp(&#39;/myaccounts/brain/redirect.go?target=findmyroutingnumber&#39;,&#39;ool&#39;,&#39;currentPage&#39;)
halfer
Kartik

3 Answers

1

You are always taking group(1); that is the problem. Use

while (m.find()) {
    System.out.println(m.group());
}
  • No text is printed. The first entry is the entire string and then nothing. I don't get the extracted strings. – Kartik Sep 07 '14 at 07:45
1

You need to put OOLPopUp and goto into a non-capturing group in order to get the first, second and third values.

 ^.*?(?:goto|OOLPopUp)\(&#39;(.*?)&#39;\).*

DEMO

String s = "<a name='bill_pay' href='javascript:goto(&#39;billpay&#39;);' class='fsdnav-top-menu-item'>Bill Pay <span class='fsdnav-ada-hidden'>link and menu. Press enter to navigate to this link. Press control + space to open submenu.\n" + 
        "To move through submenu items press tab and then press up or down arrow.</span> </a>\n" +
        "<a name='bill_pay' href='javascript:goto(&#39;findmyinfo&#39;);' class='fsdnav-top-menu-item'>\n" +
        "<a name='bill_pay' href='#' onClick='OOLPopUp(&#39;/myaccounts/brain/redirect.go?target=findmyroutingnumber&#39;,&#39;ool&#39;,&#39;currentPage&#39;);return false;' class='fsdnav-top-menu-item'>\n" +
        "Bill Pay <span class='fsdnav-ada-hidden'>link and menu. Press enter to navigate to this link. Press control + space to open submenu.";
Pattern regex = Pattern.compile("^.*?(?:goto|OOLPopUp)\\(&#39;(.*?)&#39;\\).*", Pattern.MULTILINE);
Matcher matcher = regex.matcher(s);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}

Output:

billpay
findmyinfo
/myaccounts/brain/redirect.go?target=findmyroutingnumber&#39;,&#39;ool&#39;,&#39;currentPage

OR

String s = "<a name='bill_pay' href='javascript:goto(&#39;billpay&#39;);' class='fsdnav-top-menu-item'>Bill Pay <span class='fsdnav-ada-hidden'>link and menu. Press enter to navigate to this link. Press control + space to open submenu.\n" + 
        "To move through submenu items press tab and then press up or down arrow.</span> </a>\n" +
        "<a name='bill_pay' href='javascript:goto(&#39;findmyinfo&#39;);' class='fsdnav-top-menu-item'>\n" +
        "<a name='bill_pay' href='#' onClick='OOLPopUp(&#39;/myaccounts/brain/redirect.go?target=findmyroutingnumber&#39;,&#39;ool&#39;,&#39;currentPage&#39;);return false;' class='fsdnav-top-menu-item'>\n" +
        "Bill Pay <span class='fsdnav-ada-hidden'>link and menu. Press enter to navigate to this link. Press control + space to open submenu.";
Pattern regex = Pattern.compile("^(?:.*?goto\\(&#39;(\\w+)&#39;\\).*|.*?OOLPopUp\\(&#39;(.+?&#39;\\)).*)$", Pattern.MULTILINE);
Matcher matcher = regex.matcher(s);
while (matcher.find()) {
    System.out.println(matcher.group(1) != null ? matcher.group(1) : matcher.group(2));
}

Output:

billpay
findmyinfo
/myaccounts/brain/redirect.go?target=findmyroutingnumber&#39;,&#39;ool&#39;,&#39;currentPage&#39;)

IDEONE

Avinash Raj
  • I have another clarification and I hope you don't mind. I have another set of urls such as Bill Pay, replacing ' with \'. I have tried to reverse engineer your regex, but nothing seems to be working. I get IndexOutOfBoundsException for every variation that I try. How can I add that as well? – Kartik Sep 07 '14 at 11:54
  • Yes. That is precisely the effect I want. Although, we use single quotes in our HTML document, so we have a set up like Bill Pay. I would like to be able to extract the value from here as well. – Kartik Sep 07 '14 at 12:38
  • I am not able to put that in regex though. Pattern.compile("^.*?(?:goto|OOLPopUp)\\('|'(.*?)'|'\\).*", Pattern.MULTILINE); returns an ArrayOutOfBoundsException. – Kartik Sep 08 '14 at 03:39
  • replace single backslash with double backslash – Avinash Raj Sep 08 '14 at 03:41
  • Where do I replace the single backslash with a double backslash? Pattern.compile("^.*?(?:goto|OOLPopUp)\\('|\'(.*?)'|\'\\).*", Pattern.MULTILINE); throws ArrayOutOfBoundsException – Kartik Sep 08 '14 at 06:26
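One possible way to handle the follow-up in the comments (links quoted with a literal ' as well as with &#39;) is sketched below. It is only an illustration: the class name QuoteTolerantExtractor, the method extract and the sample input are made up for this example and do not come from the thread. The idea is to wrap the two quote forms in their own non-capturing group, (?:&#39;|'), so that the | applies only to the quote and not to the whole expression. A literal apostrophe needs no backslash in a Java string or in a regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteTolerantExtractor {
    // Assumption for illustration: each quote is either the entity &#39;
    // or a literal apostrophe. The alternation is confined to (?:...).
    private static final Pattern CALL_ARGS = Pattern.compile(
            "(?:goto|OOLPopUp)\\((?:&#39;|')(.*?)(?:&#39;|')\\)");

    // Collects group(1) of every match found in the input.
    static List<String> extract(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = CALL_ARGS.matcher(html);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical sample: one link with entities, one with plain quotes.
        String html = "<a href='javascript:goto(&#39;billpay&#39;);'>Bill Pay</a>\n"
                + "<a href=\"javascript:goto('findmyinfo');\">Find my info</a>";
        System.out.println(extract(html)); // prints [billpay, findmyinfo]
    }
}
```

Because the pattern has no leading ^.*? or trailing .*, it also finds more than one call on the same line.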
0

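The problem is in your pattern: with DOTALL, the leading ^.* and the trailing .* make the first match swallow the entire input, so find() succeeds only once. Match just the goto(...) call instead: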

Pattern.compile("goto\\(&#39;(\\w+)&#39;\\)",
                    Pattern.MULTILINE|Pattern.DOTALL);

To print each value together with the text it was matched from, you can use:

System.out.println(m.group(1) + " (from " + m.group() + ")");

The output looks like this:

billpay (from goto(&#39;billpay&#39;))
findmyinfo (from goto(&#39;findmyinfo&#39;))
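As a rough side-by-side check, the sketch below runs the pattern from the question and the shorter pattern against the same two-line input. The class name GreedyVsUnanchored, the helper groupOne and the shortened sample HTML are made up for this illustration. Since the shorter pattern contains no ^, $ or ., the MULTILINE and DOTALL flags have no effect on it and are left out here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsUnanchored {
    // Pattern from the question: with DOTALL the greedy .* runs across
    // newlines to the end of the input, then backtracks to the LAST goto(...).
    static final Pattern GREEDY = Pattern.compile(
            "^.*goto\\(&#39;(\\w+)&#39;\\).*", Pattern.MULTILINE | Pattern.DOTALL);

    // Matches only the call itself, so find() can continue after each hit.
    static final Pattern UNANCHORED = Pattern.compile("goto\\(&#39;(\\w+)&#39;\\)");

    // Collects group(1) of every match of p in input.
    static List<String> groupOne(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Shortened, hypothetical version of the HTML in the question.
        String str = "<a href='javascript:goto(&#39;billpay&#39;);'>Bill Pay</a>\n"
                + "<a href='javascript:goto(&#39;findmyinfo&#39;);'>Find my info</a>";
        System.out.println(groupOne(GREEDY, str));     // prints [findmyinfo]
        System.out.println(groupOne(UNANCHORED, str)); // prints [billpay, findmyinfo]
    }
}
```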
pms