
I am trying to collect data from a YouTube search results page. The search term is "border collie" with a filter for videos that were uploaded "Today".

52 videos appear in the search results. However, when I try to parse the page, I only get 20 entries. How do I parse all 52 videos? Any suggestions are appreciated.

P.S. I tried the approach in this post for infinite-scrolling pages, but it didn't work for YouTube.

Current code:

from time import sleep

import requests
from bs4 import BeautifulSoup as bs
from selenium import webdriver

url = 'https://www.youtube.com/results?search_query=border+collie&sp=EgIIAg%253D%253D'
driver = webdriver.Chrome()
driver.get(url)

# wait for the page to load
sleep(3)
# repeat scrolling 10 times
for i in range(10):
    # scroll down 1000 px
    driver.execute_script('window.scrollTo(0, (window.pageYOffset + 1000))')
    sleep(3)

response = requests.get(url)
soup = bs(response.text, 'html.parser', from_encoding="UTF-8")

source_list = []
duration_list = []

#Scrape source of the video
vids_source = soup.findAll('div',attrs={'class':'yt-lockup-byline'})
for i in vids_source:
    source = i.text
    source_list.append(source)

#Scrape video duration
vids_badge = soup.findAll('span',attrs={'class':'video-time'})
for i in vids_badge:
    duration = i.text
    duration_list.append(duration)
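
As an aside, once collected, duration strings such as "3:07" or "1:02:30" can be converted to seconds for analysis. A minimal helper sketch (not part of the original code, assuming "M:SS" or "H:MM:SS" formats):

```python
def duration_to_seconds(duration):
    """Convert a 'M:SS' or 'H:MM:SS' duration string to total seconds."""
    parts = [int(p) for p in duration.split(':')]
    seconds = 0
    for part in parts:
        # each colon-separated field is worth 60x the one after it
        seconds = seconds * 60 + part
    return seconds

print(duration_to_seconds('3:07'))     # 187
print(duration_to_seconds('1:02:30'))  # 3750
```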
  • Can you [edit] the question and explain what didn't work with the code you added in your question and the linked answer? Please add more details about the problems you're facing. – Mauricio Arias Olave Nov 02 '19 at 23:32
  • @MauricioAriasOlave Thanks for letting me know that the original question was not specific enough. Hope the updated version is clearer. – cya2017 Nov 02 '19 at 23:58

1 Answer


I think you are confusing requests and Selenium. The requests module downloads a page without using a browser, so it never sees the extra results that scrolling loads. For your requirement, scroll down and collect the results using Selenium alone, locating elements in the DOM with locators such as XPath.

from time import sleep

from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.youtube.com/results?search_query=border+collie&sp=EgIIAg%253D%253D')

source_list = []
duration_list = []
for i in range(10):
    # scroll down 1000 px, then wait for new results to load
    driver.execute_script('window.scrollTo(0, (window.pageYOffset + 1000))')
    sleep(3)
    # collect the video sources and durations currently in the DOM
    elements = driver.find_elements_by_xpath('//div[@class = "yt-lockup-byline"]')
    for element in elements:
        source_list.append(element.text)
    elements = driver.find_elements_by_xpath('//span[@class = "video-time"]')
    for element in elements:
        duration_list.append(element.text)

So we scroll first and collect the text of all the elements, then scroll again and collect them again, and so on. There is no need to use requests when scraping like this.
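Note that because each pass re-reads every element currently in the DOM, the lists will contain duplicates. One way to collapse them while keeping first-seen order (a plain-Python sketch, independent of Selenium):

```python
def dedupe_keep_order(items):
    """Collapse repeated entries while preserving first-seen order."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out

# e.g. durations accumulated across several scroll passes
duration_list = ['3:12', '0:45', '3:12', '10:01', '0:45']
print(dedupe_keep_order(duration_list))  # ['3:12', '0:45', '10:01']
```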

Naveen
  • I tested out the script above, but both duration_list and source_list return empty lists. I am testing out the API; hopefully I can collect all the needed data with it. – cya2017 Nov 03 '19 at 14:46