Why even use regular expressions here? This looks like a JSON object/Python dict, so you could just iterate through it and use str.endswith:
>>> sources = {
...     "sources": [
...         {"file": "https:\/\/www588.playercdn.net\/85\/1\/e_q8OBtv52BRyClYa_w0kw\/1496784287\/170512\/359E33j28Jo0ovY.mp4",
...          "label": "Standard (288p)", "res": "288"},
...         {"file": "https:\/\/www726.playercdn.net\/86\/1\/q64Rsb8lG_CnxQAX6EZ2Sw\/1496784287\/170512\/371lbWrqzST1OOf.mp4",
...          "label": "Standard (288p)", "res": "288"}
...     ]
... }
>>> for item in sources['sources']:
...     if item['file'].endswith('.mp4'):
...         print(item['file'])
...
https:\/\/www588.playercdn.net\/85\/1\/e_q8OBtv52BRyClYa_w0kw\/1496784287\/170512\/359E33j28Jo0ovY.mp4
https:\/\/www726.playercdn.net\/86\/1\/q64Rsb8lG_CnxQAX6EZ2Sw\/1496784287\/170512\/371lbWrqzST1OOf.mp4
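If you are starting from the raw JSON text rather than an already-built Python dict, json.loads does the same job and also turns the "\/" escape sequences back into plain "/" characters (the JSON below is an abridged stand-in for the real payload):

```python
import json

# Abridged stand-in for the raw JSON scraped from the page.
raw = '{"sources": [{"file": "https:\\/\\/www588.playercdn.net\\/video.mp4", "label": "Standard (288p)", "res": "288"}]}'

sources = json.loads(raw)  # json.loads un-escapes "\/" to "/"
mp4_links = [s['file'] for s in sources['sources'] if s['file'].endswith('.mp4')]
print(mp4_links)
```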
EDIT:
It looks like that link is only available in a video tag after the JavaScript has loaded. You could use a headless browser, but I just used selenium to fully load the page and then save the HTML. Once you have the full page HTML, you can parse it using BeautifulSoup instead of regular expressions (using regular expressions to parse HTML: why not?).
from bs4 import BeautifulSoup
from selenium import webdriver


def extract_mp4_link(page_html):
    # The mp4 link is the src attribute of the <video> tag.
    soup = BeautifulSoup(page_html, 'lxml')
    return soup.find('video')['src']


def get_page_html(url):
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # quit() shuts the browser down even if get() fails


if __name__ == '__main__':
    page_url = 'https://www.rapidvideo.com/e/FFIMB47EWD'
    page_html = get_page_html(page_url)
    print(extract_mp4_link(page_html))
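To sanity-check the parsing step without launching a browser, you can run BeautifulSoup over a static snippet. The markup below is a made-up stand-in for the real page source, and the stdlib html.parser is used so lxml isn't required:

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the loaded page source; the real markup differs.
page_html = '<html><body><video src="https://example.com/video.mp4"></video></body></html>'

# Same lookup as extract_mp4_link above, with the stdlib parser.
soup = BeautifulSoup(page_html, 'html.parser')
mp4_link = soup.find('video')['src']
print(mp4_link)
```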