If the code below scrapes the first company name, IBM, from a table, how would I code it to scrape all the company names from the first column of the table?

Pertinent Code:

table = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '#gridview-1070-record-2989')))

For instance, the next one I need is #gridview-1070-record-2990 and so on.

Current Result:

IBM

Desired Results:

IBM
Microsoft Corporation
Apple Corporation
Google
Tesla
etc.

Full Code:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
import pandas as pd

   
options = webdriver.ChromeOptions() 
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
ser = Service("./chromedriver.exe")
browser = driver = webdriver.Chrome(service=ser, options=options)  # pass the options configured above

driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
  "source": """
    Object.defineProperty(navigator, 'webdriver', {
      get: () => undefined
    })
  """
})
driver.execute_cdp_cmd("Network.enable", {})
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})
wait = WebDriverWait(driver, 30)
driver.get("https://stockrover.com")
wait.until(EC.visibility_of_element_located((By.XPATH, "/html/body/div[1]/div/section[2]/div/ul/li[2]"))).click()
user = driver.find_element(By.NAME, "username")
password = driver.find_element(By.NAME, "password")
user.clear()
user.send_keys("vibajajo64")
password.clear()
password.send_keys("vincer64")
driver.find_element(By.NAME, "Sign In").click()
wait = WebDriverWait(driver, 30)


table = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '#gridview-1070-record-2989')))
for tab in table:
  print(tab.text)
J R
  • You seem to have posted your user name and access key in your code. I suggest you delete your post, change your keys, remove your keys from the code and post the question again. – petezurich Feb 27 '22 at 16:50
  • Thanks, but it's just a dummy account with phony information. – J R Feb 27 '22 at 17:07

2 Answers

To extract and print the texts (e.g. IBM, Microsoft Corporation, etc.) from all of the <table> elements on the stockrover website, induce WebDriverWait for visibility_of_all_elements_located() instead of presence_of_all_elements_located(). You can use either of the following locator strategies:

  • Using CSS_SELECTOR:

    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "table[id^='gridview-1070-record']")))])
    
  • Using XPATH:

    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[starts-with(@id, 'gridview-1070-record')]")))])
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
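
If the names are needed as a plain Python list rather than printed output, the CSS_SELECTOR variant above could be wrapped in a small helper. This is only a sketch: `collect_names`, `FakeWait`, and `fake_condition` are hypothetical names, the sample rows are made up, and the stand-in objects exist solely so the snippet runs without a browser. Dropping blank texts is a design choice of the sketch, not something the answer requires.

```python
from types import SimpleNamespace


def collect_names(wait, condition, locator):
    """Wait for the located elements and return their non-empty texts."""
    elements = wait.until(condition(locator))
    return [el.text.strip() for el in elements if el.text.strip()]


# --- Stand-ins so the sketch runs without a browser (hypothetical) ---
class FakeWait:
    """Mimics WebDriverWait.until(): calls the condition with a driver."""

    def until(self, method):
        return method(None)  # a real wait would pass the driver and poll


def fake_condition(locator):
    """Mimics an expected condition: takes a locator, returns a callable."""
    rows = ["IBM", "Microsoft Corporation", "  ", "Apple Corporation"]
    return lambda driver: [SimpleNamespace(text=t) for t in rows]


# "css selector" is the string value of By.CSS_SELECTOR in Selenium.
locator = ("css selector", "table[id^='gridview-1070-record']")
names = collect_names(FakeWait(), fake_condition, locator)
print(names)
```

With a live session, the call would instead look like `collect_names(WebDriverWait(driver, 20), EC.visibility_of_all_elements_located, (By.CSS_SELECTOR, "table[id^='gridview-1070-record']"))`.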
    
undetected Selenium

You can put the pertinent code in a for loop and format the selector string with the record index, like so:

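last_record = ...  # replace ... with the number of the last record you need
table = []
for i in range(2989, last_record + 1):  # + 1 because range() excludes its end value
    # Each wait returns a list of elements, so table ends up as a list of lists.
    table.append(wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '#gridview-1070-record-{}'.format(i)))))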

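This gives you a list with one entry per record. Each entry is itself a list of matching elements, because presence_of_all_elements_located() returns a list, so loop over both levels to read each element's text.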

  • Hi, I've put your code in my initial post. I'm getting an error when I print. I tried a couple of things, but nothing has worked. Any idea? AttributeError: 'list' object has no attribute 'text' – J R Feb 27 '22 at 17:26
  • Did you try without the text attribute? Just tab, not tab.text. – Ranger 4860 Feb 27 '22 at 17:37
  • I just thought of it: it looks like you are still referencing the list. Try adding another for loop nested inside the first, then get the text attribute there. – Ranger 4860 Feb 27 '22 at 17:58
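
The AttributeError in the comments comes from calling .text on a list: presence_of_all_elements_located() returns a list, so appending its result inside the loop produces a list of lists. Below is a minimal sketch of the nested loop suggested above. The helper name `collect_texts` and the sample data are hypothetical, and `SimpleNamespace` objects stand in for Selenium WebElements so the snippet runs without a browser.

```python
from types import SimpleNamespace


def collect_texts(groups):
    """Flatten a list of lists of elements into a flat list of .text values."""
    texts = []
    for group in groups:        # outer loop: one list per record
        for element in group:   # inner loop: the elements in that list
            texts.append(element.text)
    return texts


# Stand-ins shaped like the `table` built by the loop in this answer.
table = [
    [SimpleNamespace(text="IBM")],
    [SimpleNamespace(text="Microsoft Corporation")],
]
names = collect_texts(table)
print(names)
```

The same function should work unchanged on the real `table`, since it only relies on each element having a `.text` attribute.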