Why is find_all BeautifulSoup4 function returning nothing?

Question

I'm new to Beautiful Soup 4, and I can't get this simple code to pick up the contents of the tags when I search for something on YouTube. When I print containers, it just prints "[]", which I assume is an empty list. Any ideas why this is not picking anything up? Does this have to do with not grabbing the right tag on YouTube? In the search results HTML there is the following tag for one result:

<a id="video-title" class="yt-simple-endpoint style-scope ytd-video-renderer" aria-label="Kendrick Lamar - HUMBLE. by KendrickLamarVEVO 5 months ago 3 minutes, 4 seconds 322,571,817 views" href="https://www.youtube.com/watch?v=tvTRZJ-4EyI" title="Kendrick Lamar - HUMBLE.">
                Kendrick Lamar - HUMBLE.
              </a>

Python code:

import bs4

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

search = "damn"
my_url = "https://www.youtube.com/results?search_query=" + search
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

#html parsing
page_soup = soup(page_html, "html.parser")

containers = page_soup.find_all("a",{"id":"video-title"})
print(containers)

#result-count

Works fine here. Did you check that `page_html` contained what you expected? (Also, `page_soup.find(id='video-title')` would be simpler.) — Ry-, Sep 23 '17 at 07:06
There seems to be no `<a>` with `id="video-title"` in page_html. If you want the results of the page, use `page_soup.find_all('a', {'class': ' yt-uix-sessionlink spf-link '})`. — Bijoy, Sep 23 '17 at 07:22

score 2 · Answer 1 · answered Sep 23 '17 at 07:52

YouTube loads its results dynamically, so the ids and class names in the HTML you download differ from what you see in the browser. When you parse a page, make sure you read the page source as urllib receives it, not as the browser renders it. This code should solve your problem:

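from bs4 import BeautifulSoup as bs
from urllib.request import urlopen

# Download the raw (non-rendered) HTML of the search results page
page = urlopen('https://www.youtube.com/results?search_query=damn').read()
soup = bs(page, 'html.parser')

# Match links by a class that is present in the downloaded source
results = soup.find_all('a', {'class': 'yt-uix-sessionlink'})
for link in results:
    print(link.get("href"))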

The code prints the URL of every matching link on the page, so you will still need to filter the results.
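The filtering step mentioned above could look something like the sketch below. It is only an illustration: the helper name `watch_urls` and the `BASE_URL` constant are hypothetical, and it assumes that video links have the path `/watch` with a `v` query parameter, as in the URL shown in the question. It uses only the standard library and works on whatever list of `href` values the loop collected.

```python
from urllib.parse import urljoin, urlparse, parse_qs

BASE_URL = "https://www.youtube.com"  # assumed base for resolving relative hrefs


def watch_urls(hrefs):
    """Return absolute, de-duplicated video URLs from a list of href values."""
    seen = set()
    urls = []
    for href in hrefs:
        if not href:  # link.get("href") returns None if the attribute is missing
            continue
        parsed = urlparse(urljoin(BASE_URL, href))
        video_ids = parse_qs(parsed.query).get("v")
        # Keep only links that look like /watch?v=<id>
        if parsed.path != "/watch" or not video_ids:
            continue
        if video_ids[0] in seen:  # skip repeated links to the same video
            continue
        seen.add(video_ids[0])
        urls.append(BASE_URL + "/watch?v=" + video_ids[0])
    return urls
```

With the answer's code, this could be called as `watch_urls([link.get("href") for link in results])`.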

Awesome, I'll check this and report back. — douglasrcjames_old, Sep 26 '17 at 00:01

Reza · Accepted Answer · 2017-09-23T07:28:12.910

1

If you check the source code of the URL, you can't find any id="video-title". This means the page is loading its content dynamically. BeautifulSoup does not support dynamic loading by itself. Try combining it with something else, like Selenium or scrapyjs; this post may also be helpful.
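One way to confirm this before reaching for Selenium is to count how many tags in the downloaded HTML actually carry the id you are searching for. The sketch below is a rough check rather than a definitive diagnostic. It uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra packages, and the names `IdCounter` and `count_id` are made up for illustration.

```python
from html.parser import HTMLParser


class IdCounter(HTMLParser):
    """Counts start tags whose id attribute equals wanted_id."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        if dict(attrs).get("id") == self.wanted_id:
            self.count += 1


def count_id(html_text, wanted_id):
    """Return how many tags in html_text have id == wanted_id."""
    parser = IdCounter(wanted_id)
    parser.feed(html_text)
    parser.close()
    return parser.count
```

For the question's code, `count_id(page_html.decode("utf-8", errors="replace"), "video-title")` returning 0 would indicate that the element only exists after JavaScript has run in the browser, which is why `find_all` returns an empty list.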

edited Sep 23 '17 at 07:28

answered Sep 23 '17 at 07:18


I had a feeling the HTML might be dynamic. Thanks for the references, I will check them out! — douglasrcjames_old, Sep 23 '17 at 07:25
