19

I am trying to use Python to scrape a website that loads its HTML dynamically: embedded JavaScript files fetch the data and render it into the page. If I use BeautifulSoup alone, I cannot retrieve the data I need, because my program scrapes the page before the JavaScript has loaded it. For that reason I am integrating the selenium library into my code, to make my program wait until a certain element is found before it scrapes the website.

I had originally done this:

element = WebDriverWait(driver,100).until(EC.presence_of_element_located((By.ID, "tabla_evolucion")))

But I want to specify a class instead by doing something like:

element = WebDriverWait(driver,100).until(EC.presence_of_element_located((By.class, "ng-binding ng-scope")))  

Here is the rest of my code:

driver_path = 'C:/webDrivers/chromedriver.exe'
driver = webdriver.Chrome(executable_path=driver_path)
driver.header_overrides = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36'}
url = "myurlthatIamscraping.com" 
response = driver.get(url)
html = driver.page_source
characters = len(html)
element = WebDriverWait(driver,100).until(EC.presence_of_element_located((By.class, "ng-binding ng-scope")))

print(html)
print(characters)
time.sleep(10)
driver.quit()

It is not working for me, and I cannot find the right syntax anywhere.

undetected Selenium
Rishi Vadaga
  • Can you post your HTML source and your Python example for a quicker response? – Sureshmani Kalirajan Jul 29 '19 at 22:50
  • driver_path = 'C:/webDrivers/chromedriver.exe' driver = webdriver.Chrome(executable_path=driver_path) driver.header_overrides = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36'} url = "https://www.myurlthatIamscraping.com" response = driver.get(url) html = driver.page_source characters = len(html) element = WebDriverWait(driver,100).until(EC.presence_of_element_located((By.class, "ng-binding ng-scope"))) print(html) print(characters) time.sleep(10) driver.quit() – Rishi Vadaga Jul 29 '19 at 23:07
  • Sorry I don't know how to format the code on stack – Rishi Vadaga Jul 29 '19 at 23:14
  • If you can post the HTML source for the element you are looking for, you will probably get a better answer. Are you looking to wait for a specific element on the page? Then you can wait using any element locator: id, class, xpath, etc. – Sureshmani Kalirajan Jul 29 '19 at 23:43

4 Answers4

17

The relevant HTML would have helped us construct a more canonical answer. However, to start with, your first line of code:

element = WebDriverWait(driver,100).until(EC.presence_of_element_located(
  (By.ID, "tabla_evolucion")))

is legitimate, whereas the second line of code:

element = WebDriverWait(driver,100).until(EC.presence_of_element_located(
  (By.class, "ng-binding ng-scope")))

will raise an error:

Message: invalid selector: Compound class names not permitted

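as you can't pass multiple class names through By.CLASS_NAME. (Note that By.class is itself not valid Python, because class is a reserved keyword; the locator constant is By.CLASS_NAME.)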

You can find a detailed discussion in Invalid selector: Compound class names not permitted using find_element_by_class_name with Webdriver and Python


Solution

You need to take care of a couple of things as follows:

  • Without any visibility into your use case: using WebDriverWait with the expected condition presence_of_element_located() merely confirms that the element is present in the DOM tree. Presumably you will next either read its attributes (e.g. value, innerText) or interact with the element. So instead of presence_of_element_located() you should use either visibility_of_element_located() or element_to_be_clickable().

You can find a detailed discussion in WebDriverWait not working as expected

  • For the best result you can combine the ID and CLASS attributes and use either of the following Locator Strategies:

  • Using CSS_SELECTOR:

  element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located(
    (By.CSS_SELECTOR, ".ng-binding.ng-scope#tabla_evolucion")))
  • Using XPATH:
  element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located(
    (By.XPATH, "//*[@class='ng-binding ng-scope' and @id='tabla_evolucion']")))
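One caveat about the XPath form: `@class='ng-binding ng-scope'` compares the whole attribute string, so it stops matching if the page lists the classes in another order or adds a third class. The following is a minimal sketch, not part of the original answer, of a more tolerant XPath built with the common `contains(concat(...))` idiom. The helper name `xpath_with_classes` is hypothetical.

```python
def xpath_with_classes(class_names, element_id=None):
    """Build an XPath matching elements that carry every class in class_names.

    Hypothetical helper for illustration; it only assembles a string.
    """
    # Padding the normalized class attribute with spaces lets each class be
    # matched as a whole token, regardless of order or extra classes.
    conditions = [
        f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"
        for name in class_names
    ]
    if element_id is not None:
        conditions.append(f"@id='{element_id}'")
    return "//*[" + " and ".join(conditions) + "]"


print(xpath_with_classes(["ng-binding", "ng-scope"], "tabla_evolucion"))
```

The resulting string can be passed as the second item of the `(By.XPATH, ...)` locator tuple in the same way as the XPath above.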
dfrankow
undetected Selenium
12

It's in the docs.

Set of supported locator strategies.
CLASS_NAME = 'class name'
CSS_SELECTOR = 'css selector'
ID = 'id'
LINK_TEXT = 'link text'
NAME = 'name'
PARTIAL_LINK_TEXT = 'partial link text'
TAG_NAME = 'tag name'
XPATH = 'xpath'

Note: What you have in your code is not a class, it's two classes. That won't work with By.CLASS_NAME because it expects only a single class name. What you want instead is a CSS selector:

element = WebDriverWait(driver, 100).until(EC.presence_of_element_located((By.CSS_SELECTOR, ".ng-binding.ng-scope")))

In CSS selector syntax, a . indicates a class. See the W3C docs for more info on the CSS selector syntax.
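As a minimal sketch of that conversion, assuming the class attribute value is available as a plain string: the helper names `classes_to_css_selector` and `page_source_after_wait` are hypothetical, and the 20-second timeout is an arbitrary choice. The second function also reads `driver.page_source` only after the wait has returned, which seems to be what the question is after.

```python
def classes_to_css_selector(class_attr):
    """Turn a class attribute value such as "ng-binding ng-scope"
    into a CSS selector that requires every listed class."""
    names = class_attr.split()  # split() also collapses repeated whitespace
    if not names:
        raise ValueError("expected at least one class name")
    return "".join("." + name for name in names)


def page_source_after_wait(driver, class_attr, timeout=20):
    """Wait until an element with all the given classes is present,
    then return the rendered page source."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    selector = classes_to_css_selector(class_attr)
    WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, selector))
    )
    # Read the HTML only after the wait has returned.
    return driver.page_source
```

With an existing `driver` that has already loaded the page, `page_source_after_wait(driver, "ng-binding ng-scope")` would wait on the selector `.ng-binding.ng-scope` before returning the HTML.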

JeffC
0

I have a solution; try this: change By.class to By.CLASS_NAME.

element = WebDriverWait(driver,100).until(EC.presence_of_element_located((By.CLASS_NAME , "ng-binding ng-scope")))
  • `CLASS_NAME` does not work if there is a space in the class name; `CSS_SELECTOR` works with `.` in place of the spaces. Read the selected answer again. – Prakash Dahal Apr 05 '23 at 15:49
0

Try the following:

element = WebDriverWait(driver,100).until(EC.presence_of_element_located((By.CLASS_NAME, "ng-binding")))

or

element = WebDriverWait(driver,100).until(EC.presence_of_element_located((By.CLASS_NAME, "ng-scope")))

One thing to mention: you are trying to pass two class names, i.e. ng-binding is one class and ng-scope is another.

Simas Joneliunas
Atique