
I am making a basic web crawler/spider with Python. I am trying to crawl through a YouTube channel and print all the titles of the videos on it, but it never returns anything.

Here is my code so far:

import requests
from bs4 import BeautifulSoup

url = 'https://www.youtube.com/c/DanTDM/videos'

response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')

x = soup.select(".yt-simple-endpoint style-scope ytd-grid-video-renderer")

print(x)

The output is always [], an empty list (which means it didn't find anything). I need to know what I'm doing wrong.

2 Answers


The code seems correct.

Call print(response.text) and see whether YouTube is returning a blocking page.

Anti-scraping measures may be in action, such as checking your user agent.
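
A minimal sketch of that check, with a browser-like User-Agent header added; the header string below is only an example value, not something YouTube specifically requires:

import requests

url = 'https://www.youtube.com/c/DanTDM/videos'

# A browser-like User-Agent; the exact string is only an example.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/103.0.0.0 Safari/537.36'
}

response = requests.get(url, headers=headers)

# Inspect the raw HTML: if it is a consent/blocking page, the video
# grid will not be present no matter which selector you use.
print(response.status_code)
print(response.text[:2000])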

Jonathan Ciapetti
  • Yeah, it could also be that some JavaScript is running client-side that isn't being picked up. See this related question: https://stackoverflow.com/questions/26393231/using-python-requests-with-javascript-pages – bartius Jul 18 '22 at 01:24
  • No no, I see what's happening. My response was HTML, of course, and it was the 'Before you continue to YouTube' page. I would need to use Selenium and WebDriver to automate it and then maybe run the other code, or make a different web crawler using Selenium. – blackscratch22 Jul 18 '22 at 09:14
  • Yes, by "blocking page" I also meant a dialog with dynamic content. No wonder YouTube made the dialog that way ... – Jonathan Ciapetti Jul 21 '22 at 01:11

Browser Automation with Selenium

When I send a request to YouTube, I receive a 'Before you continue to YouTube' consent page instead of the channel's videos.

So...

We should use Selenium instead, as we need to click one of the buttons. I don't think we can interact with the website using the requests module.

Selenium allows you to have control over your browser. Read the documentation!
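
A rough sketch of that approach, assuming a matching chromedriver is installed; the consent-button XPath and the a#video-title selector are guesses about YouTube's markup and may need adjusting:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = 'https://www.youtube.com/c/DanTDM/videos'

driver = webdriver.Chrome()  # assumes chromedriver is on PATH
driver.get(url)

# If the 'Before you continue to YouTube' consent dialog appears, click an
# accept button. This XPath is an assumption and varies by region/language.
try:
    accept = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable(
            (By.XPATH, "//button[.//span[contains(text(), 'Accept')]]"))
    )
    accept.click()
except Exception:
    pass  # no consent dialog was shown

# Wait for the video grid to render, then print the titles.
# 'a#video-title' is a guess based on the channel page's rendered markup.
WebDriverWait(driver, 15).until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'a#video-title')))

for video in driver.find_elements(By.CSS_SELECTOR, 'a#video-title'):
    print(video.get_attribute('title') or video.text)

driver.quit()

If the consent dialog isn't dismissed, the wait for a#video-title will simply time out, which is also a quick way to confirm that the blocking page is the problem rather than the selector.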