How to match the closest pattern on a capture group excluding overlap?

Asked Jun 04 '20 at 16:57

Active Jun 04 '20 at 17:07

Viewed 22 times

I want to match text with a regex, and I have a pattern like /<a.*?href\s*=\s*["\']([^"\'>]+)["\'][^>]*>[\s\S]*?(text)[\s\S]*?<\/a>/gi. How can I grab only the inner <a ...> text </a> when I am given <a ...> <a ...> ... text ... </a> </a>? Here ... means it could be anything.

I know that using regex on HTML is probably not the best idea, but using a DOM parser is pretty slow.

Edit 1: Example input: <a href="cbc.com"><a href="google.com"> text </a></a>. Desired output: <a href="google.com"> text </a>. The pattern I am currently using, /<a.*?href\s*=\s*["\']([^"\'>]+)["\'][^>]*>[\s\S]*?(text)[\s\S]*?<\/a>/gi, matches <a href="cbc.com"><a href="google.com"> text </a> instead.

edited Jun 04 '20 at 17:07

Blkc

not clear, give possible input and output. – The Scientific Method Jun 04 '20 at 17:00
thanks, edited and provided input and output – Blkc Jun 04 '20 at 17:09
`foo(?:(?!foo).)*?xyz.*?bar` is the formula – Wiktor Stribiżew Jun 04 '20 at 17:57
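
The formula in the comment above could be applied to the pattern from the question roughly as follows. This is only a sketch, assuming JavaScript (the question's pattern uses a `/.../gi` literal) and a runtime that supports `String.prototype.matchAll`. Here `foo` is the opening `<a`, `xyz` is `text`, and `bar` is `</a>`. Two details go beyond the comment and are assumptions of this sketch: the leading `.*?` is narrowed to `[^>]*?` so that `href` has to sit inside the same tag, and the lookahead also rejects `</a>` so a match cannot span two separate links. The variable names are illustrative.

```javascript
// Input and original pattern, copied from the question.
const input = '<a href="cbc.com"><a href="google.com"> text </a></a>';
const original = /<a.*?href\s*=\s*["\']([^"\'>]+)["\'][^>]*>[\s\S]*?(text)[\s\S]*?<\/a>/gi;

// The lazy [\s\S]*? is free to run across the inner "<a", so the match
// starts at the outer tag.
const overMatch = input.match(original);
console.log(overMatch);
// [ '<a href="cbc.com"><a href="google.com"> text </a>' ]

// Tempered variant: (?:(?!<\/?a\b)[\s\S])*? consumes one character at a time,
// but only while the next characters do not start another <a or </a tag.
const tempered = /<a\b[^>]*?href\s*=\s*["']([^"'>]+)["'][^>]*>(?:(?!<\/?a\b)[\s\S])*?(text)(?:(?!<\/?a\b)[\s\S])*?<\/a>/gi;

// Collect the full match and the captured href for every hit.
const matches = Array.from(input.matchAll(tempered), (m) => ({
  match: m[0],
  href: m[1],
}));
console.log(matches);
// [ { match: '<a href="google.com"> text </a>', href: 'google.com' } ]
```

The attempt that starts at the outer `<a` fails because the tempered part refuses to step over the inner `<a`, so the engine moves on and the first successful match begins at the innermost tag that actually contains `text`. This remains a regex heuristic: attribute values that contain `>` and other HTML edge cases can still defeat it.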

0 Answers