
I am trying to extract some information from the IMDB website and write it to a CSV file. When I try to find an element that is not present, the script gets stuck.

Here is my code:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
import os
import csv


profile = webdriver.ChromeOptions()
profile.add_experimental_option(
    "prefs", {'download.default_directory': '/Users/aravind/tekie/ml-project/scrapper-opensubs/subs',
              'download.prompt_for_download': False})
driver = webdriver.Chrome(
    executable_path='/Users/aravind/chromedriver')
web = 'https://www.imdb.com/search/title?genres=animation&explore=title_type,genres&title_type=movie&ref_=adv_explore_rhs'
driver.get(web)
driver.implicitly_wait(2000)
with open('./movies.csv', mode='w') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(['Movie-Title','Rating','Meta-Score','Cast','Votes','Gross'])
    for page in range(0,1):
        print('...crawling started')
        list_of_names = driver.find_elements_by_class_name('lister-item-content')
        for index in range(0,len(list_of_names)):
            if list_of_names[index].find_elements_by_class_name('lister-item-header'):
                title = list_of_names[index].find_elements_by_class_name(
                'lister-item-header')[0].find_elements_by_tag_name('a')[0].text.strip()
            else:
                title="NA"
            if list_of_names[index].find_elements_by_class_name('ratings-imdb-rating'):
                rating = list_of_names[index].find_elements_by_class_name(
                'ratings-imdb-rating')[0].text.strip()
            else:
                rating = "NA"
            if list_of_names[index].find_elements_by_class_name('ratings-metascore'):
                metaScore = list_of_names[index].find_elements_by_class_name(
                    'ratings-metascore')[0].find_elements_by_tag_name('span')[0].text.strip()
            else:
                metaScore = "NA"
            if list_of_names[index].find_elements_by_tag_name('p')[2]:
                cast = list_of_names[index].find_elements_by_tag_name('p')[2].text.strip()
            else:
                cast="NA"
            if list_of_names[index].find_elements_by_class_name('sort-num_votes-visible')[0]:
                votes = list_of_names[index].find_elements_by_class_name(
                    'sort-num_votes-visible')[0].find_elements_by_tag_name('span')[1].text.strip()
            else:   
                votes="NA"
            if list_of_names[index].find_elements_by_class_name('sort-num_votes-visible')[0]:
                gross = list_of_names[index].find_elements_by_class_name(
                    'sort-num_votes-visible')[0].find_elements_by_tag_name('span')[4].get_attribute('data-value').strip()
            else:
                gross="NA"
            print('done',index)
            writer.writerow([title,rating,metaScore,cast,votes,gross])

I even tried try/except, but it didn't work. How do I handle the case where the data is not present?

aravind_reddy

1 Answer


The reason the script "gets stuck" is the `driver.implicitly_wait(2000)` line - the webdriver waits up to 2000 seconds (roughly 33 minutes) before timing out.

This happens each time `find_elements_by_class_name` does not find anything (i.e. the element is not there).
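A minimal sketch of the fix, using the same Selenium 3-style `find_elements_by_*` API, chromedriver path, and a trimmed-down version of the search URL from the question (adjust all of these to your own setup):

from selenium import webdriver

driver = webdriver.Chrome(executable_path='/Users/aravind/chromedriver')
driver.get('https://www.imdb.com/search/title?genres=animation&title_type=movie')

# With implicitly_wait(2000), every find_elements_* call on a missing element
# blocks for up to 2000 seconds before returning an empty list. A small value
# makes the "element not present" checks return almost immediately.
driver.implicitly_wait(2)

for item in driver.find_elements_by_class_name('lister-item-content'):
    ratings = item.find_elements_by_class_name('ratings-imdb-rating')
    # find_elements_* never raises NoSuchElementException - it returns an
    # empty list - so a simple truthiness check replaces the try/except.
    rating = ratings[0].text.strip() if ratings else 'NA'
    print(rating)

driver.quit()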

Vafliik
  • so how do I get around it? – aravind_reddy Oct 02 '18 at 14:26
  • @aravind_reddy change the value to something smaller (e.g. 2) - the missing elements will then be skipped almost immediately. Please note that after that you will get some `list index out of range` errors, as you are indexing into lists of elements that may not be there (see the sketch after these comments). But that should be quite easy to debug once you get to the errors. – Vafliik Oct 02 '18 at 14:37
  • can you show how to change the implicit wait time so it doesn't wait each time an element is not found? – aravind_reddy Oct 02 '18 at 14:39
  • @aravind_reddy simply change the line `driver.implicitly_wait(2000)` to `driver.implicitly_wait(2)` :) – Vafliik Oct 02 '18 at 14:53
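A rough sketch of the index-guarding mentioned in the comments above (the `safe_text` helper is hypothetical, not part of the original script): index into the list returned by `find_elements_*` only after checking its length, so missing blocks fall back to "NA" instead of raising an IndexError.

def safe_text(item, class_name, tag_name=None, tag_index=0, default='NA'):
    """Return the stripped text of the tag_index-th <tag_name> inside the
    first element matching class_name, or `default` if anything is missing."""
    blocks = item.find_elements_by_class_name(class_name)
    if not blocks:
        return default
    if tag_name is None:
        return blocks[0].text.strip()
    tags = blocks[0].find_elements_by_tag_name(tag_name)
    return tags[tag_index].text.strip() if len(tags) > tag_index else default

# e.g. inside the loop over list_of_names:
# votes = safe_text(list_of_names[index], 'sort-num_votes-visible', 'span', 1)
# metaScore = safe_text(list_of_names[index], 'ratings-metascore', 'span')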