How to get specific data using BeautifulSoup

Question

I'm not sure how to get a specific result from this:

<div class="videoPlayer">
    <div class="border-radius-player">
        <div id="allplayers" style="position:relative;width:100%;height:100%;overflow: hidden;">
            <div id="box">
                <div id="player_content" class="todo" style="text-align: center; display: block;">
                     <div id="player" class="jwplayer jw-reset jw-skin-seven jw-state-paused jw-flag-user-inactive" tabindex="0">
                         <div class="jw-media jw-reset">
                              <video class="jw-video jw-reset" x-webkit-playsinline="" src="https:EXAMPLE-URL-HERE" preload="metadata"></video>
                         </div>

How would I get the `src` from <video class="jw-video jw-reset" x-webkit-playsinline="" src="https:EXAMPLE-URL-HERE" preload="metadata"></video>?

This is what I've tried so far:

import urllib.request
from bs4 import BeautifulSoup

url = "https://someurlhere"

# Send a User-Agent header to prevent a "Permission denied" error
a = urllib.request.Request(url, headers={'User-Agent': "Cliqz"})
b = urllib.request.urlopen(a)

soup = BeautifulSoup(b, 'html.parser')

for video_class in soup.select("div.videoPlayer"):
    print(video_class.text)

This returns part of the markup, but it does not reach the video element.

Requests only downloads the static page and cannot run JavaScript code. Can you do a simple string search in b to make sure that the elements you need exist in the HTML? — Simas Joneliunas, Jul 02 '18 at 00:40
It doesn't, it goes down to `box` but I thought that BeautifulSoup would be able to handle that. — HelloThereToad, Jul 02 '18 at 00:47
Try `for video_class in soup.select("div.videoPlayer video.jw-video.jw-reset"): print(video_class.attrs['src'])` — The fourth bird, Jul 02 '18 at 07:34

score 1 · Accepted Answer · answered Jul 02 '18 at 00:59

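Requests is a plain HTTP client; it cannot execute JavaScript. The same is true of urllib.request, which your code uses. Anything the page builds with scripts after it loads, such as the video tag here, is missing from the HTML you download, and BeautifulSoup can only parse what was downloaded.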
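One way to confirm this is to run the selector suggested in the comments against the HTML that was actually downloaded. A minimal sketch, where `static_html` and `rendered_html` are made-up stand-ins (the first for a server response that stops at `box`, as you describe, and the second for what the browser shows after the scripts have run) and `video_sources` is a hypothetical helper name:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for what the server sends: the markup stops at "box".
static_html = '<div class="videoPlayer"><div id="box"></div></div>'

# Hypothetical stand-in for the DOM after the browser has run the page's scripts.
rendered_html = (
    '<div class="videoPlayer"><div id="box">'
    '<video class="jw-video jw-reset" src="https:EXAMPLE-URL-HERE"></video>'
    '</div></div>'
)


def video_sources(html):
    """Return the src attribute of each matching <video> tag in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    videos = soup.select("div.videoPlayer video.jw-video.jw-reset")
    return [video["src"] for video in videos if video.has_attr("src")]


print(video_sources(static_html))    # nothing to find: the tag was never downloaded
print(video_sources(rendered_html))  # the selector works once the tag is present
```

If the first call comes back empty on your real response, the selector is not the problem; the element simply is not in the downloaded HTML.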

You still have three options to try, though:

1. Go over the HTML source (b) and see whether any of the scripts on the page contain the data you need. Usually the page keeps the URL (which, I assume, is what you want to scrape) in some sort of holder, such as inline JavaScript code or a JSON object, that you can scrape.
2. Look at the XHR requests the site makes and see whether any of them query an external source for the video data. If so, see whether you can imitate that request to get the data you need.
3. (Last resort) Use a PhantomJS + Selenium browser to load the page so that its scripts actually run. You can find out more about how to use Selenium in this SO post: https://stackoverflow.com/a/26440563/3986395
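The first option can be sketched roughly as follows. Everything specific here is an assumption for illustration: the sample page, the `"file"` key, and the helper name `urls_from_scripts` are hypothetical, so check the real page source to see how (and whether) it stores the URL.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical page: the video URL sits in an inline script, not in a <video> tag.
sample_html = """
<html><body>
<div class="videoPlayer"><div id="box"></div></div>
<script>
  var playerConfig = {"file": "https://example.com/video.mp4", "autostart": false};
</script>
</body></html>
"""

# Assumed pattern: a "file" key holding the URL. Adjust it to what the real page uses.
FILE_PATTERN = re.compile(r'"file"\s*:\s*"([^"]+)"')


def urls_from_scripts(html):
    """Collect every URL stored under a "file" key in the page's inline scripts."""
    soup = BeautifulSoup(html, "html.parser")
    found = []
    for script in soup.find_all("script"):
        text = script.string or ""  # external scripts (src=...) have no inline text
        found.extend(FILE_PATTERN.findall(text))
    return found


print(urls_from_scripts(sample_html))
```

With your code, you would pass the decoded response (for example `b.read().decode()`) instead of `sample_html`. If nothing turns up in the inline scripts, move on to the second and third options.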
