I want to use a regex to match text, and I have a pattern like
/<a.*?href\s*=\s*["\']([^"\'>]+)["\'][^>]*>[\s\S]*?(text)[\s\S]*?<\/a>/gi
How can I grab only the inner <a ...> text </a> when I am given <a ...> <a ...> ... text ... </a> </a>? Here ... means it could be anything.
I know that using regex on HTML is probably not the best idea, but using a DOM parser is pretty slow.
Edit 1:
Example input: <a href="cbc.com"><a href="google.com"> <span>text</span> </a></a>
Desired output: <a href="google.com"> <span>text</span> </a>
Currently, /<a.*?href\s*=\s*["\']([^"\'>]+)["\'][^>]*>[\s\S]*?(text)[\s\S]*?<\/a>/gi matches <a href="cbc.com"><a href="google.com"> <span>text</span> </a> instead.
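
To make the problem concrete, here is a minimal sketch that reproduces the current match on the example input and tries one possible variant. The names input, original and innermost are only illustrative, and the variant is an assumption rather than a known-good answer: it forbids the text between the opening tag and </a> from containing another <a or </a>, so the match cannot start at an outer link. It assumes reasonably well-formed markup and an href on the inner tag.

```javascript
// Example input taken from the question.
const input = '<a href="cbc.com"><a href="google.com"> <span>text</span> </a></a>';

// Pattern from the question: the lazy [\s\S]*? after the first opening tag
// is free to run across the second <a ...>, so the match starts at the outer link.
const original = /<a.*?href\s*=\s*["\']([^"\'>]+)["\'][^>]*>[\s\S]*?(text)[\s\S]*?<\/a>/gi;

// Hypothetical variant (an assumption, not a tested general solution):
//  - <a\b[^>]*? keeps the href search inside a single opening tag
//  - (?:(?!<\/?a\b)[\s\S])*? consumes one character at a time, but only while
//    the current position does not start another <a or </a tag
const innermost = /<a\b[^>]*?\bhref\s*=\s*["']([^"'>]+)["'][^>]*>(?:(?!<\/?a\b)[\s\S])*?(text)(?:(?!<\/?a\b)[\s\S])*?<\/a>/gi;

// Full matches of the original pattern.
const originalMatches = input.match(original);

// Full matches and captured href values of the variant.
const innerResults = [...input.matchAll(innermost)];
const innerMatches = innerResults.map((m) => m[0]);
const innerHrefs = innerResults.map((m) => m[1]);

console.log(originalMatches);
console.log(innerMatches);
console.log(innerHrefs);
```

The per-character lookahead makes this slower than the original pattern on long inputs, and it would still be fooled by things like <a inside comments or attribute values, so it is only a sketch for input shaped like the example.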