I have this regex

href=["'](.*?)["']

I want it to match this entire string, but it only matches up to (' and does not include the rest, starting at /Xplore:

href="javascript:openurl('/Xplore/accessinfo.jsp')"

It also has to match

href="/iel5/4235/4079606/04079617.pdf?tp=&arnumber=4079617&isnumber=4079606"

The first link is the only special case; I have been able to match all other cases with the regex above. I just want to somehow stop the ' in the middle of the first string from ending the match.

Markimoop
  • Your regexp tries to match as _few_ characters as possible with the middle parentheses. So it stops matching on the first quote/double quote it finds after the initial one. – einpoklum May 27 '20 at 07:42
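
The behaviour described in this comment can be reproduced with a small sketch. Python's `re` module is an assumption here (the question does not name a language); the pattern and both sample strings are copied from the question.

```python
import re

# Pattern from the question: a lazy group closed by EITHER quote type.
pattern = re.compile(r"""href=["'](.*?)["']""")

first = "href=\"javascript:openurl('/Xplore/accessinfo.jsp')\""
second = 'href="/iel5/4235/4079606/04079617.pdf?tp=&arnumber=4079617&isnumber=4079606"'

# group(0) is the full text that the pattern matched.
lazy_first = pattern.search(first).group(0)
lazy_second = pattern.search(second).group(0)

print(lazy_first)             # href="javascript:openurl('
print(lazy_second == second)  # True
```

The first match ends at the single quote after `openurl(` because the closing character class accepts either quote type, while the second string contains no inner quote and is matched in full.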

1 Answer

What you could do is have a positive lookahead define the end of the string:

^href=("|').*?(?=\1)\1$

That way, no matter if it's a single or double quote that opens the value, the lazy `.*?` will run until it finds that same quote character (the one captured in group 1) at the end of the string.
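
A minimal sketch of how this pattern behaves, again assuming Python's `re` module. The first two samples come from the question. The last two are hypothetical inputs added only to show the single-quote case and a mismatched-quote case.

```python
import re

# Pattern from the answer: group 1 remembers the opening quote, and \1
# requires that same quote immediately before the end of the string.
pattern = re.compile(r"""^href=("|').*?(?=\1)\1$""")

samples = [
    "href=\"javascript:openurl('/Xplore/accessinfo.jsp')\"",
    'href="/iel5/4235/4079606/04079617.pdf?tp=&arnumber=4079617&isnumber=4079606"',
    "href='/single/quoted.html'",  # hypothetical single-quoted input
    "href=\"/mismatched.html'",    # hypothetical: opens with " but ends with '
]

results = [bool(pattern.match(s)) for s in samples]
# The quote captured by group 1 for the three samples that match.
quotes = [pattern.match(s).group(1) for s in samples[:3]]

print(results)  # [True, True, True, False]
print(quotes)   # ['"', '"', "'"]
```

Since `(?=\1)` is immediately followed by `\1`, the lookahead only asserts what the backreference then consumes, so `^href=("|').*?\1$` should accept the same strings.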

JvdV
  • Thank you, could you please explain what the `(?=\1)` means? – Markimoop May 27 '20 at 07:45
  • It's a positive lookahead to match either the double or single quote from capture group 1. Btw, since you want to match the whole string, I removed the capture group in the middle and included start- and end-of-string anchors @Markimoop – JvdV May 27 '20 at 07:54
  • So, since it captured a double quote first it will try to match with that? – Markimoop May 27 '20 at 07:59
  • Exactly =) @Markimoop – JvdV May 27 '20 at 08:06