
I am very lost about which find method and XPath to use to get all the elements on the site that have class="wordlist-item".

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from time import sleep

insectNames = [] 
targetSite = "https://www.enchantedlearning.com/wordlist/insect.shtml"

browser = webdriver.Chrome(executable_path=r"C:\chromedriver\chromedriver.exe")
browser.get(targetSite)

bugName = browser.find_elements_by_xpath("//div[@class="wordlist-item"]")
print(bugName)
undetected Selenium
Kooboi

3 Answers


Pay very close attention to the format of any function you want to use. The following line produces a syntax error because the inner quotes around wordlist-item need to be single quotes (or escaped double quotes).

bugName = browser.find_elements_by_xpath("//div[@class="wordlist-item"]")

The reason is that you are passing the string "//div[@class="wordlist-item"]". When Python encounters the first double quote it starts a string and ends it at the second; it then starts another string at the third and ends it at the fourth. That leaves you with three separate pieces instead of one string:

"//div[@class=" wordlist-item and "]"

Besides that, print(bugName) only prints the WebElement objects themselves (showing the session and element IDs), not their text. To get the text of a single element you would use print(element.text). Since find_elements_by_xpath returns a list of elements, you have to loop through the list and print each name individually.

from selenium import webdriver

url = "https://www.enchantedlearning.com/wordlist/insect.shtml"

browser = webdriver.Chrome(executable_path=r"C:\chromedriver\chromedriver.exe")
browser.get(url)

bugName = browser.find_elements_by_xpath("//div[@class='wordlist-item']")

for bug in bugName:
    print(bug.text)
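The quoting rule described in this answer can be checked without opening a browser. The snippet below is only a minimal sketch: the variable names are made up for illustration, and it relies on nothing but Python's built-in compile() to show that the original line is rejected while the quoted variants are ordinary strings.

```python
# Three ways to write the XPath so that Python sees ONE string.
single_inside = "//div[@class='wordlist-item']"   # single quotes inside double quotes
double_inside = '//div[@class="wordlist-item"]'   # double quotes inside single quotes
escaped = "//div[@class=\"wordlist-item\"]"       # escaped double quotes

# The last two are exactly the same Python string.
same_string = double_inside == escaped

# The first differs only in which quote character the XPath engine sees.
same_after_swap = single_inside.replace("'", '"') == double_inside

# The original (broken) line, kept as source text so it can be compiled safely.
broken_source = 'xpath = "//div[@class="wordlist-item"]"'
try:
    compile(broken_source, "<example>", "exec")
    broken_compiles = True
except SyntaxError:
    broken_compiles = False

print(same_string, same_after_swap, broken_compiles)
```

Any of the three valid spellings can be passed to find_elements_by_xpath; which one to use is a matter of taste.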

As a side note, if all you want to do is scrape data from the internet I'd recommend you check out BeautifulSoup. It's a little more intuitive to use and you won't have to open up a browser every time you run your program. Selenium is better suited for when you need to interact with the page.
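To illustrate the "no browser needed" idea, here is a rough sketch that parses HTML directly. Note that it deliberately uses Python's standard-library html.parser instead of BeautifulSoup, so it runs without installing anything. The WordlistParser class and the sample markup are hypothetical and only assumed to resemble the target page; for the real site you would first download the page source yourself.

```python
from html.parser import HTMLParser


class WordlistParser(HTMLParser):
    """Collect the text of every <div> whose class list contains 'wordlist-item'."""

    def __init__(self):
        super().__init__()
        self.items = []      # finished item texts
        self._depth = 0      # > 0 while inside a matching div
        self._chunks = []    # text pieces of the item currently being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            # A div nested inside a matching div: track it so we close correctly.
            self._depth += 1
        elif "wordlist-item" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.items.append("".join(self._chunks).strip())

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


# Hypothetical sample markup (an assumption, not copied from the real page).
sample_html = """
<div class="wordlist-section">
  <div class="wordlist-item">ant</div>
  <div class="wordlist-item">bee</div>
  <div class="other">not an insect</div>
</div>
"""

parser = WordlistParser()
parser.feed(sample_html)
insect_names = parser.items
print(insect_names)
```

With BeautifulSoup the same selection is usually much shorter, which is why it is the recommendation above; the sketch only shows that static pages like this can be scraped without driving a browser.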

Andrew Stone
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from time import sleep


targetSite = "https://www.enchantedlearning.com/wordlist/insect.shtml"

browser = webdriver.Chrome(executable_path='C:/bin/chromedriver.exe')
browser.get(targetSite)
bugNames = browser.find_elements_by_xpath("//div[@class=\"wordlist-item\"]")
for bugname in bugNames:
    print(bugname.text)

Output:

admiral butterfly
ambush bug
ant
aphid
armyworm
assassin bug
atlas moth
backswimmer
bedbug
bee
beetle
blue morpho butterfly
bluet
borer
brown butterfly
buckeye butterfly
bug
bumblebee
butterfly
carpenter ant
caterpillar
chrysalis
cicada
cockroach
comma butterfly
copper butterfly
crane fly
cricket
cutworm
damselfly
darkling beetle
dragonfly
dung beetle
earwig
egg...
Abhishek Rai

To extract all the insect names using Selenium, you should induce WebDriverWait for visibility_of_all_elements_located(). You can use either of the following locator strategies:

  • Using CSS_SELECTOR:

    driver.get('https://www.enchantedlearning.com/wordlist/insect.shtml')
    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.wordlist-item")))])
    
  • Using XPATH:

    driver.get('https://www.enchantedlearning.com/wordlist/insect.shtml')
    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='wordlist-item']")))])
    
  • Console Output:

    ['admiral butterfly', 'ambush bug', 'ant', 'aphid', 'armyworm', 'assassin bug', 'atlas moth', 'backswimmer', 'bedbug', 'bee', 'beetle', 'blue morpho butterfly', 'bluet', 'borer', 'brown butterfly', 'buckeye butterfly', 'bug', 'bumblebee', 'butterfly', 'carpenter ant', 'caterpillar', 'chrysalis', 'cicada', 'cockroach', 'comma butterfly', 'copper butterfly', 'crane fly', 'cricket', 'cutworm', 'damselfly', 'darkling beetle', 'dragonfly', 'dung beetle', 'earwig', 'egg', 'fire ant', 'firefly', 'flea', 'fly', 'fritillary butterfly', 'fruit fly', 'gnat', 'gossamer-winged butterfly', 'grasshopper', 'green darner dragonfly', 'ground beetle', 'grub', 'gypsy moth', 'hairstreak butterfly', 'harlequin bug', 'honeybee', 'hornet', 'horse fly', 'house fly', 'hover fly', 'imago', 'insect', 'Japanese beetle', 'Julia butterfly', 'jumping bean', 'June bug', 'katydid', 'kissing bug', 'lacewing', 'ladybug', 'larva', 'leafcutter ant', 'leafhopper', 'lice', 'lightning bug', 'locust', 'longhorn beetle', 'louse', 'luna moth', 'maggot', 'mantid', 'mantis', 'mayfly', 'meadowhawk', 'mealworm', 'metalmark butterfly', 'metamorphosis', 'midge', 'milkweed bug', 'monarch', 'morpho', 'mosquito', 'moth', 'nymph', 'Oregon silverspot butterfly', 'owl butterfly', 'painted lady butterfly', 'paper wasp', 'planthopper', 'pond skater', 'praying mantid', 'praying mantis', 'pupa', "Queen Alexandra's birdwing butterfly", 'roach', 'robber fly', 'scarab', 'silkworm', 'silverfish', 'skipper', 'snout butterfly', 'spittlebug', 'springtail', 'stag beetle', 'stink bug', 'stonefly', 'sulphur butterfly', 'swallowtail butterfly', 'termite', 'thrip', 'tiger beetle', 'tiger moth', 'tsetse fly', 'Ulysses butterfly', 'viceroy butterfly', 'walking stick', 'wasp', 'water boatman', 'water bug', 'water strider', 'weevil', 'wood nymph butterfly', 'wood-borer', 'woolly bear caterpillar', 'yellow-white butterfly', 'yellowjacket', 'zebra longwing butterfly', 'zebra swallowtail butterfly']
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
undetected Selenium