Web Scraping Videos

Question

I'm attempting to do a proof of concept by downloading a TV episode of Bob's Burgers at https://www.watchcartoononline.com/bobs-burgers-season-9-episode-3-tweentrepreneurs.

I cannot figure out how to extract the video URL from this website. Using the Chrome and Firefox developer tools, I found that the video is in an iframe, but extracting src URLs by searching for iframes with BeautifulSoup returns links that have nothing to do with the video. Where are the references to the mp4 or flv files? I can see them in Developer Tools, although clicking them gives a "forbidden" response.

Any insight into how to scrape videos with BeautifulSoup and requests would be appreciated.

Here is some code if needed. A lot of tutorials say to use 'a' tags, but I didn't receive any 'a' tags.

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.watchcartoononline.com/bobs-burgers-season-9-episode-5-live-and-let-fly")
soup = BeautifulSoup(r.content,'html.parser')
links = soup.find_all('iframe')
for link in links:
    print(link['src'])

Possible duplicate of [Is there a way to download a video from a webpage with python?](https://stackoverflow.com/questions/35842873/is-there-a-way-to-download-a-video-from-a-webpage-with-python) — Lucas Wieloch, Nov 07 '18 at 19:41

score 6 · Answer 1 · answered Nov 07 '18 at 20:53

import requests

url = "https://disk19.cizgifilmlerizle.com/cizgi/bobs.burgers.s09e03.mp4?st=_EEVz36ktZOv7ZxlTaXZfg&e=1541637622"

def download_file(url, filename):
    # stream=True avoids loading the whole video into memory at once
    r = requests.get(url, stream=True)
    with open(filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
    return filename

download_file(url, "bobs.burgers.s09e03.mp4")

This code will download this particular episode onto your computer. The video URL is in the <source> tag, which is nested inside the <video> tag.

This did save a file named after your function, but it was invalid and only 162 bytes. Why didn't BeautifulSoup find the video and source tags? I couldn't even locate the URL containing the mp4 extension with bs4 or by simply searching the requests response text/content. — user192085, Nov 08 '18 at 20:50
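
One possible explanation for the 162-byte file, not confirmed anywhere in this thread: the `e=` query parameter in the URL above looks like a Unix timestamp, which would mean the link is signed and stops working after that time. Below is a minimal sketch for checking that guess, assuming `e` really is an expiry time in seconds. The helper name `link_expiry` is made up for illustration.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs

def link_expiry(url):
    """Return the `e` query parameter as a UTC datetime, or None if absent.

    Assumption: `e` is a Unix timestamp marking when the link expires.
    """
    params = parse_qs(urlparse(url).query)
    if "e" not in params:
        return None
    return datetime.fromtimestamp(int(params["e"][0]), tz=timezone.utc)

url = "https://disk19.cizgifilmlerizle.com/cizgi/bobs.burgers.s09e03.mp4?st=_EEVz36ktZOv7ZxlTaXZfg&e=1541637622"
expiry = link_expiry(url)
print("link expires:", expiry)
print("already expired:", expiry < datetime.now(timezone.utc))
```

Read as a timestamp, that value decodes to 8 November 2018, 00:40 UTC. If the guess is right, a request made after that time would most likely get a short error page instead of the video, so a fresh URL would have to be copied from the page for each download.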

score 4 · Answer 2 · answered Apr 09 '21 at 23:50

Background Information

(scroll all the way down for your answer)

This is only easy when the site you're scraping states the video URL explicitly in its HTML. For example, on https://4anime.to/yakunara-mug-cup-mo-episode-01-1?id=45314, if we look for <video> in inspect element, we find an src containing the .mp4 URL.

Now suppose we try to grab the .mp4 URL from this website like this:

import requests
from bs4 import BeautifulSoup 


html_url = "https://4anime.to/yakunara-mug-cup-mo-episode-01-1?id=45314"
html_response = requests.get(html_url) 
soup = BeautifulSoup(html_response.text, 'html.parser') 


for mp4 in soup.find_all('video'):
    mp4 = mp4['src']

print(mp4)

We would get a KeyError: 'src'. This happens because the video URL is stored in the nested <source> element rather than on <video> itself, which we can see if we print what soup.find_all('video') returns:

import requests
from bs4 import BeautifulSoup 


html_url = "https://4anime.to/yakunara-mug-cup-mo-episode-01-1?id=45314"
html_response = requests.get(html_url) 
soup = BeautifulSoup(html_response.text, 'html.parser') 


for video in soup.find_all('video'):
    print(video)

The output:

<video class="video-js vjs-default-skin vjs-big-play-centered" controls="" data-setup="{}" height="264" id="example_video_1" poster="" preload="none" width="640">
<source src="https://mountainoservo0002.animecdn.com/Yakunara-Mug-Cup-mo/Yakunara-Mug-Cup-mo-Episode-01.1-1080p.mp4" type="video/mp4"/>
</video>

So if we now want to download the .mp4, we use the <source> element and get its src instead:

import requests
import shutil  # - - Used to copy the response stream into a file
from bs4 import BeautifulSoup  # - - We could honestly do this without soup


# - - Get the url of the site you want to scrape
html_url = "https://4anime.to/yakunara-mug-cup-mo-episode-01-1?id=45314"
html_response = requests.get(html_url)
soup = BeautifulSoup(html_response.text, 'html.parser')

# - - Get the .mp4 url and the filename from the first <source> tag
source = soup.find('source')
if source is None:
    raise SystemExit('No <source> tag found in the page')
url = source['src']
filename = url.split('/')[-1]

# - - Get the video
response = requests.get(url, stream=True)

# - - Make sure the status is OK
if response.status_code == 200:
    # - - Let urllib3 undo any gzip/deflate Content-Encoding on the raw stream
    response.raw.decode_content = True

    with open(filename, 'wb') as f:
        # - - Copy what's in response.raw and transfer it into the file
        shutil.copyfileobj(response.raw, f)

(You could simplify this by copying the <source>'s src manually and requesting it directly, without going through html_url. I just wanted to show that you can get the .mp4 URL, i.e. the <source>'s src, from the page itself.)
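
As the comment in the code says, this can be done without soup. Here is a rough sketch using only the standard library's `html.parser`, run against the `<video>` snippet printed earlier rather than a live page. The class name `SourceFinder` is invented for this example.

```python
from html.parser import HTMLParser

class SourceFinder(HTMLParser):
    """Collect the src of every <source> tag that sits inside a <video> tag."""

    def __init__(self):
        super().__init__()
        self.video_depth = 0   # > 0 while we are inside a <video> element
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "video":
            self.video_depth += 1
        elif tag == "source" and self.video_depth > 0:
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)

    def handle_endtag(self, tag):
        if tag == "video" and self.video_depth > 0:
            self.video_depth -= 1

sample_html = '''<video class="video-js vjs-default-skin vjs-big-play-centered" controls="" data-setup="{}" height="264" id="example_video_1" poster="" preload="none" width="640">
<source src="https://mountainoservo0002.animecdn.com/Yakunara-Mug-Cup-mo/Yakunara-Mug-Cup-mo-Episode-01.1-1080p.mp4" type="video/mp4"/>
</video>'''

finder = SourceFinder()
finder.feed(sample_html)
for src in finder.sources:
    # The last path component doubles as a filename, as in the code above
    print(src, "->", src.split("/")[-1])
```

Like the soup version, this only sees what is in the HTML that `requests` receives. It finds nothing if the page builds the player with JavaScript after loading.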

Once again, not every site is this clear-cut. For this site in particular, we're fortunate that it is this manageable. On other sites you may have to go from Elements (in inspect element) to Network, find the requests for the individual pieces of the video, and download them all to assemble the full video. That is not always easy, but the video on the site you asked about is.
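
The pieces that show up in the Network tab are often the segments of a streaming playlist. Which format a given site uses is not something this answer can tell you. The sketch below assumes an HLS-style `.m3u8` playlist, a plain-text file in which every line that does not start with `#` is a segment URL, and it only shows how to turn such a playlist into an ordered list of absolute URLs. The playlist text and the `example.com` address are invented.

```python
from urllib.parse import urljoin

def segment_urls(playlist_text, playlist_url):
    """Return absolute segment URLs from an HLS media playlist, in order."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are tags or comments, not media segments
        if line and not line.startswith("#"):
            # Segment paths are usually relative to the playlist's own URL
            urls.append(urljoin(playlist_url, line))
    return urls

playlist_url = "https://example.com/video/index.m3u8"
playlist_text = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
seg0.ts
#EXTINF:10.0,
seg1.ts
#EXT-X-ENDLIST
"""

for u in segment_urls(playlist_text, playlist_url):
    print(u)
```

Each segment could then be fetched with requests and appended to one output file. Whether plain concatenation gives a playable file depends on the segment format and on whether the stream is encrypted.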

YOUR ANSWER

Go to inspect element, click on Chromecast Player (2. Player) located at the top of the video to view the HTML attributes, and finally click on the embed, which should look like this:

/inc/embed/embed.php?file=bobs.burgers.s09e05.flv&amp;hd=1&amp;pid=437035&amp;h=25424730eed390d0bb4634fa93a2e96c&amp;t=1618011716&amp;embed=cizgi

Once you've done that, click play with inspect element open, click the video to view its attributes (or press Ctrl+F and search for <video>), and copy the src, which should be:

https://cdn.cizgifilmlerizle.com/cizgi/bobs.burgers.s09e05.mp4?st=f9OWlOq1e-2M9eUVvhZa8A&e=1618019876
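
As an aside, the embed path shown earlier is relative and still HTML-escaped (`&amp;`). If you want to request that embed page from a script, it first has to be turned into an absolute URL. Here is a small sketch, assuming the embed is served from the same host as the episode page, which is a guess on my part.

```python
from html import unescape
from urllib.parse import urljoin, urlparse, parse_qs

# Episode page from the question; assumed to be the base for the relative embed path
page_url = "https://www.watchcartoononline.com/bobs-burgers-season-9-episode-5-live-and-let-fly"
raw_src = "/inc/embed/embed.php?file=bobs.burgers.s09e05.flv&amp;hd=1&amp;pid=437035&amp;h=25424730eed390d0bb4634fa93a2e96c&amp;t=1618011716&amp;embed=cizgi"

# Undo the HTML escaping, then resolve the path against the page URL
embed_url = urljoin(page_url, unescape(raw_src))
params = parse_qs(urlparse(embed_url).query)

print(embed_url)
print("file parameter:", params["file"][0])
```

The `h` and `t` values look like a per-request hash and timestamp, so an embed URL copied today may well be rejected later.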

Now we can download it with Python.

import requests
# - - Used to copy the response stream into a file
import shutil


url = "https://cdn.cizgifilmlerizle.com/cizgi/bobs.burgers.s09e05.mp4?st=f9OWlOq1e-2M9eUVvhZa8A&e=1618019876"

response = requests.get(url, stream=True)

if response.status_code == 200:
    # - - Let urllib3 undo any gzip/deflate Content-Encoding on the raw stream
    response.raw.decode_content = True

    with open('bobs-burgers.mp4', 'wb') as f:
        # - - Take the data from response.raw and transfer it to the file
        shutil.copyfileobj(response.raw, f)
    print('downloaded file')
else:
    print('Download failed')
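
A status code of 200 alone does not guarantee that you got a video, since some servers answer with an HTML error page. A rough extra check, assuming the server sets Content-Type honestly, is sketched below. The helper `looks_like_video` is hypothetical and takes plain values so it can be tried without a network connection.

```python
def looks_like_video(status_code, headers):
    """Rough check that a response is a video file and not an error page."""
    if status_code != 200:
        return False
    # Plain dicts are case-sensitive; requests' response.headers is not
    content_type = headers.get("Content-Type", "").lower()
    return (content_type.startswith("video/")
            or content_type.startswith("application/octet-stream"))

# Invented example responses
print(looks_like_video(200, {"Content-Type": "video/mp4"}))
print(looks_like_video(200, {"Content-Type": "text/html; charset=utf-8"}))
print(looks_like_video(403, {"Content-Type": "video/mp4"}))
```

With requests this would be called as `looks_like_video(response.status_code, response.headers)` before opening the output file.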
