Java regex pattern to return different groups in one text

Question

I am trying to apply a Java regex to the following text to extract the link content. When there is only one href in the text, it finds the content fine, but when there is more than one, the match runs on toward the end of the text instead of stopping at the first link. Here is the regex pattern:

Pattern pattern = Pattern.compile("\\\"\\>(.*)\\</a\\>\\<br\\>", Pattern.DOTALL);

Here is the text:

<div><b>Attachments:</b> <a href="http://projectspace.intranet.group/sites/CFY366N/Lists/Deliverables/Attachments/8/1.JPG">http://projectspace.intranet.group/sites/CFY366N/Lists/Deliverables/Attachments/8/1.JPG</a><br><a href="http://projectspace.intranet.group/sites/CFY366N/Lists/Deliverables/Attachments/8/yinYang.gif">http://projectspace.intranet.group/sites/CFY366N/Lists/Deliverables/Attachments/8/yinYang.gif</a><br><a href=""></a></div>

If there is only the href for 1.JPG, it finds the right answer:

http://projectspace.intranet.group/sites/CFY366N/Lists/Deliverables/Attachments/8/1.JPG

But when I add the yinYang.gif link, it finds the following:

">http://projectspace.intranet.group/sites/CFY366N/Lists/Deliverables/Attachments/8/1.JPG</a><br><a href="http://projectspace.intranet.group/sites/CFY366N/Lists/Deliverables/Attachments/8/yinYang.gif">http://projectspace.intranet.group/sites/CFY366N/Lists/Deliverables/Attachments/8/yinYang.gif</a><br>
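The behaviour above can be reproduced with a small sketch. It uses the question's pattern unchanged and the question's HTML; the class and method names (GreedyDemo, countMatches, firstMatch) are hypothetical helpers for illustration. Because .* is greedy, it first consumes the rest of the input and then backtracks only until "</a><br>" can match, which is at the last "</a><br>" in the text. The whole two-link span therefore comes back as a single match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // The question's pattern, unchanged. ".*" is greedy: it grabs the rest of the
    // input, then backtracks only until "</a><br>" can match, i.e. at the LAST one.
    static final Pattern GREEDY =
            Pattern.compile("\\\"\\>(.*)\\</a\\>\\<br\\>", Pattern.DOTALL);

    /** Counts how many non-overlapping matches the greedy pattern finds. */
    static int countMatches(String html) {
        Matcher m = GREEDY.matcher(html);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    /** Returns the whole text of the first match (group 0), or null if none. */
    static String firstMatch(String html) {
        Matcher m = GREEDY.matcher(html);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String base = "http://projectspace.intranet.group/sites/CFY366N/Lists/Deliverables/Attachments/8/";
        String html = "<div><b>Attachments:</b> "
                + "<a href=\"" + base + "1.JPG\">" + base + "1.JPG</a><br>"
                + "<a href=\"" + base + "yinYang.gif\">" + base + "yinYang.gif</a><br>"
                + "<a href=\"\"></a></div>";
        System.out.println(countMatches(html)); // one match covering both links
        System.out.println(firstMatch(html));   // same text as shown above
    }
}
```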

How can I change this to find all the values between <a> ... </a> in different groups?

collapsar · Accepted Answer · 2014-04-29T13:13:22.437


Change your pattern into a non-greedy (reluctant) one, so that the group stops at the first following </a><br> rather than the last:

"\\\"\\>(.*?)\\</a\\>\\<br\\>"
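A capturing group holds one value per match, so getting each link's text "in a different group" in practice means calling Matcher.find() in a loop and reading group(1) after every successful find. The sketch below does that with the answer's reluctant pattern; the class and method names (ReluctantLinks, extract) are hypothetical. Note that the trailing <a href=""></a> is not followed by <br>, so this pattern does not match it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantLinks {
    // The answer's reluctant pattern: ".*?" expands one character at a time,
    // so each match ends at the FIRST "</a><br>" after the opening '">'.
    private static final Pattern LINK_TEXT =
            Pattern.compile("\\\"\\>(.*?)\\</a\\>\\<br\\>", Pattern.DOTALL);

    /** Returns group 1 of every match, one entry per <a>...</a><br> pair. */
    static List<String> extract(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = LINK_TEXT.matcher(html);
        while (m.find()) {          // each find() resumes after the previous match
            result.add(m.group(1)); // the text between '">' and "</a><br>"
        }
        return result;
    }

    public static void main(String[] args) {
        String base = "http://projectspace.intranet.group/sites/CFY366N/Lists/Deliverables/Attachments/8/";
        String html = "<div><b>Attachments:</b> "
                + "<a href=\"" + base + "1.JPG\">" + base + "1.JPG</a><br>"
                + "<a href=\"" + base + "yinYang.gif\">" + base + "yinYang.gif</a><br>"
                + "<a href=\"\"></a></div>";
        for (String link : extract(html)) {
            System.out.println(link);
        }
    }
}
```

With the question's text this prints the 1.JPG URL and the yinYang.gif URL on separate lines.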

However, six words of warning are appropriate: don't do it this way.

You are essentially trying to parse (semi-)structured information using regular expressions. Experience shows that you are doomed if you follow this route: either regexen will prove not to be powerful enough to solve your problem in the end (think of nested structures), or you will produce unmaintainable code. Probably both.

answered Apr 29 '14 at 13:07 by collapsar, edited Apr 29 '14 at 13:13

...or, even better, use an HTML parser. – Pshemo Apr 29 '14 at 13:09
@Pshemo You are perfectly right; I was already in the process of editing the solution to contain the appropriate warning. – collapsar Apr 29 '14 at 13:11
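Following the HTML-parser suggestion, here is a dependency-free sketch using the HTML parser that ships with the JDK (javax.swing.text.html.parser.ParserDelegator with an HTMLEditorKit.ParserCallback). This is a swapped-in choice for illustration. That parser targets HTML 3.2 and is quite old, which is fine for markup this simple. For real-world HTML, a third-party library such as jsoup is the more common choice. The class and method names (AnchorTexts, extract) are hypothetical. The sketch also assumes you want only non-empty link texts, so the empty <a href=""></a> is skipped.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class AnchorTexts {
    /** Returns the non-empty text content of every <a> element, in document order. */
    static List<String> extract(String html) {
        List<String> result = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private StringBuilder current; // non-null only while inside an <a> element

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (current != null) {
                    current.append(data); // text may arrive in several chunks
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.A && current != null) {
                    if (current.length() > 0) { // skip empty links such as <a href=""></a>
                        result.add(current.toString());
                    }
                    current = null;
                }
            }
        };
        try {
            // ignoreCharSet = true: the input is already a decoded Java String.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected when reading from a String
        }
        return result;
    }
}
```

Unlike the regex, this does not depend on a <br> following each link. If you need the link targets instead of the texts, attrs.getAttribute(HTML.Attribute.HREF) can be read inside handleStartTag.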
