Java Regexp for matching all the content between "<" and ">" in a paragraph

Asked Sep 26 '16 at 19:54

I have the following paragraph:

<p class="cms-body-text" style="box-sizing: border-box; margin: 0px  0px 10px; line-height: 25.6px; background-color: #ffffff;">This is a test. This is another test. This is a third test<a href="javascript:%20window.print();">Print / Save as PDF</a></p><h3 class="cms-h3">TEST</h3>

I need a Java regexp that matches all the content between < and > only. This is what I've tried:

<[aA-zZ]+\s[aA-zZ]+="(.+)\>

This does not work: it also matches text that is not enclosed by the < and > characters.

Any help?

asked Sep 26 '16 at 19:54

Topa_14

I guess you meant `[a-zA-Z]`, not `[aA-zZ]`. To match as few as possible, use [lazy quantifiers](https://regex101.com/r/zN7kE5/1). And if you need a safer tool when parsing HTML, use the dedicated libraries. – Wiktor Stribiżew Sep 26 '16 at 19:59
Do you really _need_ a regex? When dealing with XML/HTML tags, I'd encourage you to use a proper API such as jsoup. – Alexander Sep 26 '16 at 20:04
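
Following up on the comments, here is a minimal sketch of one way to match only the `<...>` pieces. It swaps the greedy `(.+)` for a negated character class, `<[^>]+>`, which is an alternative to the lazy quantifier suggested above: the greedy `.+` in the original attempt runs to the last `>` on the line, which is why the plain text gets swallowed too. Note also that `[aA-zZ]` contains the range `A-z`, which includes the characters `[ \ ] ^ _` and the backtick in addition to letters. The class name `TagMatcher` and the method `findTags` are made up for illustration, and the sample input is a shortened version of the paragraph from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from the question.
final class TagMatcher {

    // "<", then one or more characters that are not ">", then ">".
    // The negated class cannot run past the first ">", so no lazy quantifier is needed.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Returns every "<...>" substring found in the input, in order of appearance.
    static List<String> findTags(String html) {
        List<String> tags = new ArrayList<>();
        Matcher matcher = TAG.matcher(html);
        while (matcher.find()) {
            tags.add(matcher.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        // Shortened version of the paragraph from the question (assumption: style attribute dropped).
        String html = "<p class=\"cms-body-text\">This is a test. This is a third test"
                + "<a href=\"javascript:%20window.print();\">Print / Save as PDF</a></p>"
                + "<h3 class=\"cms-h3\">TEST</h3>";

        for (String tag : findTags(html)) {
            System.out.println(tag);
        }
    }
}
```

This only looks at the characters themselves, so an attribute value that contains a literal `>` would cut a match short. For anything beyond simple, well-behaved markup, a real HTML parser such as jsoup (mentioned in the comment above) is the safer route.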
