Regex to get all the content after the closing ">" nearest before "<a href", through to the end of the string

I need everything that follows the last ">" preceding "<a href", up to the end of the input. How do I get that? I'm not good at regex :/

eg:

<img class="abc" src="abc.jpg"> blah blah blah&nbsp;<a href="http://en.wikipedia.org/wiki">abc defg hijk lmnop</a>&nbsp; blah

Expected output:

blah blah blah abc defg hijk lmnop blah

user3481808

2 Answers

Try this one:

htmls = htmls.replaceAll(".*?>(?=.*?<a href)", "");

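It removes every stretch of text that ends in ">" and is still followed by "<a href" later on the same line, so the result starts right after the closing ">" of the tag just before "<a href" (the last "<a href", if there are several). Note that "." does not match line breaks by default, so this only works when the leading tag and "<a href" are on one line.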
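
For reference, here is a minimal sketch of that replacement applied to the sample from the question, assuming the sample is a single line. The class name LinkContentDemo and both helper methods are made up for illustration. The second helper is not part of the answer above: it is one possible way to get from the remaining markup to the plain text shown under "Expected output", by dropping tags, turning &nbsp; into a space, and collapsing whitespace.

```java
import java.util.regex.Pattern;

class LinkContentDemo {
    // Regex from the answer: lazily match up to a '>' that is still
    // followed by "<a href" later on the same line.
    private static final Pattern BEFORE_LINK = Pattern.compile(".*?>(?=.*?<a href)");

    // Removes everything up to and including the '>' just before "<a href".
    static String dropLeadingTags(String html) {
        return BEFORE_LINK.matcher(html).replaceAll("");
    }

    // Assumption (not in the answer): a naive cleanup to reach the asker's
    // expected plain text. It is only meant for small, simple fragments.
    static String toPlainText(String html) {
        return html.replaceAll("<[^>]*>", "")   // drop remaining tags
                   .replace("&nbsp;", " ")      // decode the one entity in the sample
                   .replaceAll("\\s+", " ")     // collapse runs of whitespace
                   .trim();
    }

    public static void main(String[] args) {
        String html = "<img class=\"abc\" src=\"abc.jpg\"> blah blah blah&nbsp;<a "
                + "href=\"http://en.wikipedia.org/wiki\">abc defg hijk lmnop</a>&nbsp; blah";
        String tail = dropLeadingTags(html);
        System.out.println(tail);               // markup after the <img> tag
        System.out.println(toPlainText(tail));  // blah blah blah abc defg hijk lmnop blah
    }
}
```

The first helper keeps the link markup, which matches the wording of the question; the second is only needed if the goal is the tag-free text.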

Sabuj Hassan

Long story short, you cannot parse HTML with a regex, because HTML is not a regular language. See here for a full discussion.

Community