I am trying to extract part of the src attribute from several iframes in an HTML input.

I've tried several patterns, but none of them works for all iframes:

<iframe(.*?)><\/iframe>
<iframe src="(.+?)".+</iframe>
<iframe.+?src=[\"'](.+?)[\"'].*?>

And here is a sample of iframe tags that I have:

<iframe src="http://www.youtube.com/embed/NM51qOpwcIM?modestbranding=1;rel=0;showinfo=0;autoplay=0;autohide=1;yt:stretch=16:9;wmode=transparent;?wmode=transparent" allowfullscreen="" style="width: 640px; height: 361.057px;" frameborder="0"></iframe>

<iframe src="https://www.youtube.com/embed/VASywEuqFd8?feature=oembed" allowfullscreen="" width="660" height="371" frameborder="0"></iframe>

Ideally, I would like to capture the src from its start up to, but not including, the first question mark (?), like this:

http://www.youtube.com/embed/NM51qOpwcIM
Zoe
user2093301
  • [You shouldn't try to parse `/X?HTML/` with regexes.](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – Sebastian Lenartowicz Nov 02 '16 at 02:42

1 Answer

This can be achieved using a lookbehind and a lookahead:

(?<=src=").*?(?=[\?"])

See working example on Regex101

Explanation

  1. (?<=src=") Lookbehind: the match must be preceded by src="
  2. .*? Lazily match any characters, as few as possible
  3. (?=[\?"]) Lookahead: stop as soon as the next character is either ? or "
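
As a rough illustration of the pattern in use, here is a minimal Python sketch. The names `SRC_PATTERN`, `html`, and `urls` are made up for this example, and the first sample's query string is shortened. Note that the pattern matches any src="..." attribute, not only those on iframes, and it only handles double-quoted values.

```python
import re

# Lookbehind for src=", then a lazy match up to the first ? or closing ".
# Inside a character class, ? needs no escaping.
SRC_PATTERN = re.compile(r'(?<=src=").*?(?=[?"])')

# Sample input based on the iframes in the question (first query string shortened).
html = (
    '<iframe src="http://www.youtube.com/embed/NM51qOpwcIM?modestbranding=1;rel=0" '
    'allowfullscreen="" frameborder="0"></iframe>\n'
    '<iframe src="https://www.youtube.com/embed/VASywEuqFd8?feature=oembed" '
    'allowfullscreen="" width="660" height="371" frameborder="0"></iframe>'
)

# findall returns the whole match when the pattern has no capture groups.
urls = SRC_PATTERN.findall(html)
print(urls)
```

With no capture groups in the pattern, `findall` returns each src value cut off just before its first question mark.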

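If the URL has no query string (no ?), the pattern above still works, because the lookahead also stops at the closing ". To capture the full src value, query string included, stop only at the closing quote:

(?<=src=").*?(?=")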
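
As the comment under the question points out, a regex can be fragile on real HTML (single-quoted attributes, src on other tags, unusual spacing). A possible alternative, sketched here with Python's standard-library `html.parser` and `urllib.parse`, reads the attribute through a parser and then drops the query string. The class name `IframeSrcCollector` and the single-quoted second sample are assumptions made for illustration, not part of the original question.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit


class IframeSrcCollector(HTMLParser):
    """Collects the src of every <iframe>, minus query string and fragment."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; attrs is a list of (name, value) pairs.
        if tag != "iframe":
            return
        src = dict(attrs).get("src")
        if src:
            parts = urlsplit(src)
            # Rebuild the URL from scheme, host, and path only.
            self.sources.append(
                urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
            )


sample = (
    '<iframe src="https://www.youtube.com/embed/VASywEuqFd8?feature=oembed" '
    'allowfullscreen="" width="660" height="371" frameborder="0"></iframe>'
    # Hypothetical variant with single quotes, which the regex above would miss.
    "<iframe src='http://www.youtube.com/embed/NM51qOpwcIM?rel=0'></iframe>"
)

collector = IframeSrcCollector()
collector.feed(sample)
print(collector.sources)
```

This trades brevity for robustness: the parser handles quoting and attribute order, and `urlsplit` handles the URL structure, so neither has to be encoded in a pattern.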
Zoe
nozzleman