I'm new to Beautiful Soup 4, and I can't get this simple code to pick up the contents of the tags when I search for something on YouTube. When I print containers, it just prints "[]", which I assume means the list is empty. Any ideas why this is not picking anything up? Is it because I'm not grabbing the right tag on YouTube? In the search results HTML, there is the following tag for one result:

<a id="video-title" class="yt-simple-endpoint style-scope ytd-video-renderer" aria-label="Kendrick Lamar - HUMBLE. by KendrickLamarVEVO 5 months ago 3 minutes, 4 seconds 322,571,817 views" href="https://www.youtube.com/watch?v=tvTRZJ-4EyI" title="Kendrick Lamar - HUMBLE.">
                Kendrick Lamar - HUMBLE.
              </a>

Python code:

import bs4

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

search = "damn"
my_url = "https://www.youtube.com/results?search_query=" + search
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

#html parsing
page_soup = soup(page_html, "html.parser")

containers = page_soup.find_all("a",{"id":"video-title"})
print(containers)

#result-count

2 Answers

YouTube loads its search results dynamically, so the ids and class names in the HTML that urllib receives differ from what you see in the browser. When you parse a page, make sure you inspect the page source as urllib downloads it, not as the browser renders it. This code should solve your problem:

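from bs4 import BeautifulSoup as bs
from urllib.request import urlopen

# Download the page as urllib sees it (no JavaScript is executed)
page = urlopen('https://www.youtube.com/results?search_query=damn').read()
soup = bs(page, 'html.parser')
# Select links by a class that appears in the downloaded HTML
results = soup.find_all('a', {'class': 'yt-uix-sessionlink'})
for link in results:
    print(link.get("href"))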

This prints every URL found on the page, so you will still need to filter the output to get only the video links.
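As a rough sketch of that filtering step, the helper below keeps only links that look like video pages. The function name watch_urls, the BASE_URL constant, and the rule "path is /watch and a v parameter is present" are assumptions made for illustration, not something YouTube documents; adjust them to whatever the downloaded HTML actually contains.

```python
from urllib.parse import urljoin, urlparse, parse_qs

# Assumption: relative hrefs should be resolved against the main site.
BASE_URL = "https://www.youtube.com"


def watch_urls(hrefs):
    """Return unique absolute video URLs from an iterable of href values."""
    seen = set()
    urls = []
    for href in hrefs:
        # link.get("href") returns None when the attribute is missing.
        if not href:
            continue
        absolute = urljoin(BASE_URL, href)
        parsed = urlparse(absolute)
        # Assumption: a video link has the path /watch and a "v" parameter.
        if parsed.path != "/watch" or "v" not in parse_qs(parsed.query):
            continue
        # Skip duplicates while keeping the original order.
        if absolute not in seen:
            seen.add(absolute)
            urls.append(absolute)
    return urls
```

With the results list from the code above, it could be called as watch_urls(link.get("href") for link in results).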

Ahmad Nourallah

If you check the source code of the URL, you won't find any id="video-title", which means the page is loading its content dynamically. BeautifulSoup does not support dynamic loading by itself. Try combining it with something else, like Selenium or scrapyjs; this post may also be helpful.
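One way to run that check from Python rather than by eye is sketched below. The helper names fetch_html and in_static_html are made up for this example, and the check is a plain substring test on the raw download, so it only tells you whether the marker text reached urllib at all.

```python
from urllib.request import urlopen


def fetch_html(url):
    """Download the raw HTML as urllib sees it (no JavaScript is run)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def in_static_html(html, marker):
    """Return True if the marker text appears in the downloaded HTML."""
    return marker in html
```

For example, in_static_html(fetch_html(my_url), 'id="video-title"') returning False would suggest the element is added later by JavaScript, in which case a browser-driving tool is needed before BeautifulSoup can see it.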

Reza