Need help parsing an HTML string
String str = "<div id=\"test\" ><a href=\"#aaaa\"> Amrit </a> </div><div><a href=\"#bbbb\" > Amrit </a> </div><a href=\"#cccc\" ><a href=\"#dddd\" >";
String reg = ".*(<\\s*a\\s+href\\s*=\\s*\\\"(.+?)\"\\s*>).*";
str is my sample string and reg is the regex I use to find all the anchor tags, especially the value of href. With this regex, only the last anchor tag in the string is captured.
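import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern MY_PATTERN = Pattern.compile(reg);
Matcher m = MY_PATTERN.matcher(str);
while (m.find()) {
    // group 0 is the whole match; groups 1..groupCount() are the capturing groups
    for (int i = 0; i <= m.groupCount(); i++) {
        String s = m.group(i);
        System.out.println("->" + s);
    }
}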
This is the code I wrote. What am I missing?
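The behavior described above comes from the leading .* in the regex: it is greedy, so it consumes as much of the input as it can and backs off only far enough for the rest of the pattern to match once, which leaves only the last <a href=...> tag in the capturing group. The trailing .* then swallows the remainder, so find() never gets a second match. A minimal sketch of one possible fix, assuming every anchor looks like the ones in the sample (href is the first attribute and its value is in double quotes): drop both .* wrappers so each find() call returns the next anchor, and read group 1. The class and method names (HrefExtractor, extractHrefs) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefExtractor {

    // No leading or trailing ".*": find() already scans forward through the input,
    // so each call returns the next anchor instead of one greedy match over everything.
    // [^"]* stops at the closing quote, so a value cannot run past its own attribute.
    // Assumption: href is the first attribute and is double-quoted, as in the sample.
    private static final Pattern HREF = Pattern.compile(
            "<\\s*a\\s+href\\s*=\\s*\"([^\"]*)\"\\s*>",
            Pattern.CASE_INSENSITIVE);

    static List<String> extractHrefs(String html) {
        List<String> hrefs = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            hrefs.add(m.group(1)); // group 1 is the text between the quotes
        }
        return hrefs;
    }

    public static void main(String[] args) {
        String str = "<div id=\"test\" ><a href=\"#aaaa\"> Amrit </a> </div>"
                + "<div><a href=\"#bbbb\" > Amrit </a> </div>"
                + "<a href=\"#cccc\" ><a href=\"#dddd\" >";
        for (String href : extractHrefs(str)) {
            System.out.println("->" + href);
        }
    }
}
```

On the sample string this prints the four values #aaaa, #bbbb, #cccc and #dddd, one per line. Note that a regex like this will miss anchors such as <a class="x" href="..."> or single-quoted values; for arbitrary real-world HTML, a proper HTML parser is more robust than any regex.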
Also, how can I replace a particular occurrence of a substring? For example, I want to change my URL from the form [string]_[string] into [string]-[string]. How can I find the "_" and replace it with "-"?
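For the replacement question, a small sketch built on standard String methods. The helper names (hyphenateAll, hyphenateFirst, hyphenateNth) are hypothetical and only for illustration: String.replace changes every "_", String.replaceFirst changes only the first one (it takes a regex, but "_" and "-" have no special meaning there), and an indexOf loop targets one specific occurrence.

```java
public class UnderscoreToHyphen {

    // Replaces every "_"; String.replace works on literal text, not a regex.
    static String hyphenateAll(String url) {
        return url.replace("_", "-");
    }

    // Replaces only the first "_".
    static String hyphenateFirst(String url) {
        return url.replaceFirst("_", "-");
    }

    // Replaces only the n-th "_" (1-based) and leaves the others untouched.
    // Returns the input unchanged if n < 1 or there are fewer than n underscores.
    static String hyphenateNth(String url, int n) {
        if (n < 1) {
            return url;
        }
        int idx = -1;
        for (int i = 0; i < n; i++) {
            idx = url.indexOf('_', idx + 1);
            if (idx < 0) {
                return url;
            }
        }
        return url.substring(0, idx) + "-" + url.substring(idx + 1);
    }

    public static void main(String[] args) {
        String url = "foo_bar_baz";
        System.out.println(hyphenateAll(url));     // foo-bar-baz
        System.out.println(hyphenateFirst(url));   // foo-bar_baz
        System.out.println(hyphenateNth(url, 2));  // foo_bar-baz
    }
}
```

If every underscore in the URL should become a hyphen, hyphenateAll (a single String.replace call) is all that is needed; the indexOf variant only matters when one occurrence must be singled out.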