
I am new to Python and to coding in general, and I am currently trying to learn web scraping. I originally planned to use Beautiful Soup but switched to Selenium for no particular reason.

I am working on a web scraper using Python and Selenium to extract anime data from MyAnimeList. However, the scraper gets stuck after it opens the first link: it never scrapes the title, score, or genre of that anime and eventually fails with a TimeoutException.

This is my code:

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
wait = WebDriverWait(driver, 5)

links = []
for page in range(0, 100, 50):
    driver.get('https://myanimelist.net/topanime.php?limit=' + str(page))
    link = driver.find_elements(By.CSS_SELECTOR, 'div[class="detail"] h3 a')
    for item in link:
        links.append(item.get_attribute('href'))

titles = []
ratings = []
genres = []
for item_link in links:
    driver.get(item_link)
    title = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, 'div[class="h1-title"]')))
    rating = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, 'div[class="score-label"]')))
    genre = wait.until(EC.visibility_of_element_located((By.XPATH, 'div[class="//*[contains(text(), "Genre")]/parent::div')))
    titles.append(title.text)
    ratings.append(rating.text)
    genres.append(genre.text)

my_data = {'Titles': titles, 'Ratings': ratings, 'Genres': genres}
df = pd.DataFrame(my_data)

csv_file_path = 'C:/Users/maldo/Desktop/anime/AnimeList.csv'
df.to_csv(csv_file_path, index=False)

This is the error that appears:

Traceback (most recent call last):
  File "C:\Users\maldo\PycharmProjects\MAL-WebScraper\main.py", line 23, in <module>
    rating = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, 'div[class="score-label"]')))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\maldo\PycharmProjects\MAL-WebScraper\venv\Lib\site-packages\selenium\webdriver\support\wait.py", line 95, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: 
Stacktrace:
Backtrace:
    GetHandleVerifier [0x0045A813+48355]
    (No symbol) [0x003EC4B1]
    (No symbol) [0x002F5358]
    (No symbol) [0x003209A5]
    (No symbol) [0x00320B3B]
    (No symbol) [0x0034E232]
    (No symbol) [0x0033A784]
    (No symbol) [0x0034C922]
    (No symbol) [0x0033A536]
    (No symbol) [0x003182DC]
    (No symbol) [0x003193DD]
    GetHandleVerifier [0x006BAABD+2539405]
    GetHandleVerifier [0x006FA78F+2800735]
    GetHandleVerifier [0x006F456C+2775612]
    GetHandleVerifier [0x004E51E0+616112]
    (No symbol) [0x003F5F8C]
    (No symbol) [0x003F2328]
    (No symbol) [0x003F240B]
    (No symbol) [0x003E4FF7]
    BaseThreadInitThunk [0x74F97D59+25]
    RtlInitializeExceptionChain [0x76FEB79B+107]
    RtlClearBits [0x76FEB71F+191]

I have tried specifying other CSS locators, to no avail. I also tried specifying the chromedriver.exe location, but that actually broke my code. I was hoping to store the anime data in three columns of a CSV file: title, score, and genres.


The score element actually looks like this:

<div class="score-label score-9">9.10</div>
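Why div[class="score-label"] fails can be checked offline. Below is a minimal sketch using Python's standard-library xml.etree.ElementTree on the snippet above, purely as a stand-in (Selenium evaluates selectors in the browser, not with ElementTree): an exact comparison on the class attribute finds nothing, while testing the whitespace-separated class tokens, which is what div.score-label does, finds the score.

```python
import xml.etree.ElementTree as ET

# The score element quoted above, wrapped in a root element so it parses as XML.
snippet = '<root><div class="score-label score-9">9.10</div></root>'
root = ET.fromstring(snippet)

# Exact attribute comparison: the same test div[class="score-label"] performs in CSS.
exact = root.findall(".//div[@class='score-label']")

# Class-token membership: the same test div.score-label performs in CSS.
by_token = [el for el in root.iter("div") if "score-label" in el.get("class", "").split()]

print(len(exact))        # 0
print(by_token[0].text)  # 9.10
```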

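Because the element carries two classes, the selector div[class="score-label"], which compares the entire class attribute, never matches it, so the wait times out. To identify the element, induce WebDriverWait for visibility_of_element_located() and use either of the following locator strategies: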

  • Using CSS_SELECTOR:

    rating = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, 'div.score-label')))
    
  • Using XPATH:

    rating = wait.until(EC.visibility_of_element_located((By.XPATH, '//div[starts-with(@class, "score-label")]')))
    
  • Note: you need the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
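Putting the fix into the question's per-anime loop could look like the sketch below. It keeps the question's title locator (which did not time out), uses the class-token selector for the score, and replaces the malformed genre XPath. The genre XPath (a sidebar div whose label span contains "Genre") and the function name scrape_details are assumptions, not taken from the question or confirmed against MyAnimeList's markup, so check them in the browser's developer tools first.

```python
# Locators for the three fields (assumptions about MyAnimeList's current markup).
TITLE_CSS = "div.h1-title"
# The score div has two classes ("score-label score-9"), so match the class token.
RATING_CSS = "div.score-label"
# Hypothetical genre locator: the <div> whose label <span> mentions "Genre".
GENRE_XPATH = '//span[contains(text(), "Genre")]/parent::div'


def scrape_details(driver, links, timeout=10):
    """Visit each anime page and return one dict per page (title, score, genres)."""
    # Selenium is imported here rather than at module level so this sketch can be
    # loaded (for example by a test) on a machine without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    rows = []
    for link in links:
        driver.get(link)
        # Each wait raises TimeoutException if its element never becomes visible.
        title = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, TITLE_CSS)))
        rating = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, RATING_CSS)))
        genre = wait.until(EC.visibility_of_element_located((By.XPATH, GENRE_XPATH)))
        rows.append({"Titles": title.text, "Ratings": rating.text, "Genres": genre.text})
    return rows
```

With the driver and links list from the question, pd.DataFrame(scrape_details(driver, links)).to_csv(csv_file_path, index=False) writes the three columns. Returning one dict per anime keeps each title, score, and genre together on the same row.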
    
undetected Selenium