Newest update: This seems to be a problem with the matcher, not the expression itself. After more testing, the problem shows up when I run Pattern/Matcher against the input string: when the input contains regex metacharacters, the matcher appears to skip over the match. If I just use a simple .replaceAll with the same expression, it finds it just fine. I tried Pattern.quote on the input string, but that didn't change anything, so I'm still stuck. Why does the matcher not find a match when the input string contains metacharacters? And is there a way to make the matcher treat metacharacters in the input string literally?
I am trying to run a regex over a large string to pull out all HTML links, from the start of the opening tag to the closing tag. I came up with this expression:
<a.*?</a>
Which does a pretty good job. It gets almost all of them. My problem is when there are parentheses inside the link, like:
<a href="blahblah">myproblem()</a>
The matcher completely skips this link. I thought that the .*? would pick up everything from the space after the first a to the opening bracket of the closing a tag, but it doesn't if there are any parentheses.
What am I missing here?
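As a sanity check, here is a minimal standalone sketch that runs the same expression through Pattern/Matcher on the problem link. The class name, method name, and surrounding sample text are made up for illustration. If it prints the link, then the expression and the matcher are coping with the parentheses, and the link is probably being lost in a later step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this standalone check.
class LinkMatchCheck {

    // Collects every substring that the link expression matches.
    static List<String> findLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher matcher = Pattern.compile("<a.*?</a>", Pattern.DOTALL).matcher(html);
        while (matcher.find()) {
            links.add(matcher.group());
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up sample input wrapped around the problem link.
        String html = "before <a href=\"blahblah\">myproblem()</a> after";
        System.out.println(findLinks(html));
    }
}
```

Parentheses only have a special meaning inside the pattern, not inside the text being searched, so the input string should not need any quoting for this step.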
EDIT for clarification:
I am using Java. Here is what I am doing to test this before adding it to my project. When I run this it fails, but if I take the () out of test, it passes. With the (), I'm pretty sure the link isn't even being added to the list:
String tryConvert = doclet.htmlToWiki("<a href=\"#test.method\">test()</a>");
assertThat(tryConvert, is("[test()|test#method]"));
And the htmlToWiki code:
// Collect every <a ...>...</a> element found in the page.
ArrayList<String> links = new ArrayList<String>();
Pattern linkPattern = Pattern.compile("<a.*?</a>", Pattern.DOTALL);
Matcher matcher = linkPattern.matcher(html);
while (matcher.find())
{
    links.add(matcher.group());
}
for (String link : links)
{
    String original = link;
    // The alias is the link text with the opening and closing tags stripped.
    String alias = link.replaceAll("<a.*?>", "");
    alias = alias.replaceAll("</a>", "");
    // Reduce the tag to "[target]", turning "." in the target into "#".
    link = link.replaceAll("\">.*?</a>", "]");
    link = link.replaceAll("<a.*#", "[");
    link = link.replaceAll("\\.", "#");
    link = link.replace("[", "[" + alias + "|");
    // Put the converted link back in place of the original tag.
    html = html.replaceAll(original, link);
}
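One step that may be worth isolating is the last line of the loop. String.replaceAll compiles its first argument as a regular expression, so when original contains (), those characters are read as an empty group rather than as literal parentheses. The sketch below compares that call with String.replace, which treats both arguments as plain text, and with Pattern.quote plus Matcher.quoteReplacement. The class and method names are hypothetical, and the strings are taken from the test above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; isolates only the final replacement step.
class ReplaceStepCheck {

    // Same call as in the loop: 'original' is compiled as a regex.
    static String viaReplaceAll(String html, String original, String link) {
        return html.replaceAll(original, link);
    }

    // String.replace treats both arguments as literal text.
    static String viaReplace(String html, String original, String link) {
        return html.replace(original, link);
    }

    // Keeps replaceAll but escapes both the pattern and the replacement.
    static String viaQuoted(String html, String original, String link) {
        return html.replaceAll(Pattern.quote(original), Matcher.quoteReplacement(link));
    }

    public static void main(String[] args) {
        String original = "<a href=\"#test.method\">test()</a>";
        String link = "[test()|test#method]";
        System.out.println(viaReplaceAll(original, original, link)); // input comes back unchanged
        System.out.println(viaReplace(original, original, link));    // [test()|test#method]
        System.out.println(viaQuoted(original, original, link));     // [test()|test#method]
    }
}
```

If the first line prints the untouched tag while the other two print the wiki link, the match is being found and collected, and it is only the write-back into html that fails when the link text contains regex metacharacters.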