Using a regex, I would like to extract the video ID from a YouTube URL that sits inside an HTML anchor element, like so:
<a href="http://www.youtube.com/watch?v=NC2blnl0WTE">Some text</a>
I have looked around for solutions and found a JavaScript one that extracts the video ID from the URL with this regex:
/https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|<\/a>))[?=&+%\w.-]*/ig
I would like to use this in Python because it supports every variant of YouTube's URLs. I implemented it in my Python script like this:
string = re.sub(r'https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:[\'"][^<>]*>|<\/a>))[?=&+%\w.-]*', r'\1', string)
But I get no replacements. I removed the leading / and the trailing /ig from the regex, since they are JavaScript-only syntax, but I still can't get it to pick up the video ID. Once I am able to pick up the ID, I can easily adapt the regex to remove the anchor element as well.
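For reference, here is a minimal sketch that reproduces the behaviour and isolates two things that may be involved. It assumes that dropping the JavaScript `i` flag matters (Python's counterpart is `re.IGNORECASE`, while `re.sub` already replaces every match, like `g`) and that the pattern's negative lookahead is meant to skip URLs that are already inside an `href`. The names `PATTERN`, `bare_url`, and `anchor` are only illustrative and not from my script.

```python
import re

# Pattern copied from the JavaScript source, with the / delimiters removed.
PATTERN = (
    r'https?:\/\/(?:[0-9A-Z-]+\.)?'
    r'(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*[^\w\s-])'
    r'([\w-]{11})(?=[^\w-]|$)'
    r'(?![?=&+%\w.-]*(?:[\'"][^<>]*>|<\/a>))'
    r'[?=&+%\w.-]*'
)

# Illustrative inputs (assumed for this sketch).
bare_url = 'http://www.youtube.com/watch?v=NC2blnl0WTE'
anchor = '<a href="http://www.youtube.com/watch?v=NC2blnl0WTE">Some text</a>'

# Without the "i" flag, [0-9A-Z-] cannot match the lowercase "www",
# so nothing is replaced.
no_flag = re.sub(PATTERN, r'\1', bare_url)

# With re.IGNORECASE, the bare URL is replaced by the captured ID.
with_flag = re.sub(PATTERN, r'\1', bare_url, flags=re.IGNORECASE)

# Inside an href, the negative lookahead sees the closing quote and ">"
# right after the ID and rejects the match, even with the flag set.
in_anchor = re.sub(PATTERN, r'\1', anchor, flags=re.IGNORECASE)

print(no_flag)
print(with_flag)
print(in_anchor)
```

If this reading is right, the flag alone fixes the bare URL case, but the anchor case would still need the lookahead removed or adjusted.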
What have I done wrong with my solution? Thanks.