I'm using XPath to find every td that contains a div with class 'day', excluding any td whose own class is 'invalid_day'.

This is for a calendar: I'm using Selenium with XPath to select only the 'day' divs in the selected month, ignoring any 'day' div that belongs to the previous or following month.

HTML

<tbody>
  <tr>
    <td class="invalid_day">
      <div class="day">29</div>
    </td>
    <td class="invalid_day">
      <div class="day">30</div>
    </td>
    <td class="invalid_day">
      <div class="day">31</div>
    </td>
    <td>
      <div class="day">1</div>
    </td>
    <td>
      <div class="day">2</div>
    </td>
    <td>
      <div class="day">3</div>
    </td>
    <td>
      <div class="day">4</div>
    </td>
  </tr>
  <tr>
    <!-- <td> cells for days 5 - 31 removed for brevity -->
    <td class="invalid_day">
      <div class="day">1</div>
    </td>
  </tr>
</tbody>

After searching the forum, I have tried quite a few approaches. All of them return the td's with div class='day',
but none has managed to filter out the td's that have class='invalid_day'.

Code tried:

.find_elements_by_xpath('//td[./div[@class="day"]]')

Returns: 29,30,31,1,2,3,...31,1

Code tried:

.find_elements_by_xpath('//td[./div[@class="day"] and not[@class="invalid_day"]]')

Returns: empty

Also tried the css_selector method with:

.find_elements_by_css_selector('.day:not(.invalid_day)')

Returns: 29,30,31,1,2,3,...31,1

Results I am looking for: 1,2,3,...31

Thanks in advance!

Optionwiz

4 Answers

You can try this:

driver.find_elements_by_css_selector("td:not(.invalid_day)>div.day")
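
Note that Selenium 4.3 and later removed the find_elements_by_* helpers, so on a current install the same selector would go through find_elements instead. A minimal sketch, assuming `driver` is an already-open Selenium 4 WebDriver on the calendar page. The helper name valid_day_texts is hypothetical, and the strings "css selector" and "xpath" are the values behind By.CSS_SELECTOR and By.XPATH:

```python
# Locator strategy names. These are the string values of Selenium's
# By.CSS_SELECTOR and By.XPATH, spelled out so the sketch needs no imports.
CSS_SELECTOR = "css selector"
XPATH = "xpath"

# div.day elements whose parent td is not marked invalid_day
VALID_DAY_CSS = "td:not(.invalid_day) > div.day"
VALID_DAY_XPATH = '//td[not(@class="invalid_day")]/div[@class="day"]'


def valid_day_texts(driver, by=CSS_SELECTOR, selector=VALID_DAY_CSS):
    """Return the text of every current-month day cell.

    Hypothetical helper: `driver` is assumed to be a Selenium 4 WebDriver
    (anything with a find_elements(by, value) method works).
    """
    return [element.text for element in driver.find_elements(by, selector)]
```

With a live driver, valid_day_texts(driver) uses the CSS selector and valid_day_texts(driver, XPATH, VALID_DAY_XPATH) uses the XPath form. On the calendar from the question, both should return the days of the selected month only.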
Sunderam Dubey
Metalgear

The following XPath-1.0 expression should do the job:

.find_elements_by_xpath('//div[@class="day" and not(../@class="invalid_day")]')

Output is:

1,2,3

To get the <td> elements, you can simply append a /.. to the XPath or use the following expression:

.find_elements_by_xpath('//td[./div/@class="day" and not(@class="invalid_day")]')
zx485

To find all the <div> tags with class day, excluding those whose parent <td> has class invalid_day (i.e. 1,2,3,...31), you can use either of the following locator strategies:

  • xpath 1: Ignoring elements whose parent <td> has class invalid_day

    //td[not(@class='invalid_day')]//div[@class='day']
    

  • xpath 2: Ignoring elements whose parent <td> has any class attribute

    //td[not(@class)]//div[@class='day']
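
The first expression compares the whole class attribute and the second only checks that the attribute is absent, so both rely on the td carrying exactly class="invalid_day" or no class at all, as in the question's HTML. If a cell could carry several classes (an assumption, the question does not show this), a token-based test is the usual XPath 1.0 workaround. A small sketch that builds such an expression; has_class and valid_day_xpath are hypothetical helper names:

```python
def has_class(name: str) -> str:
    """Build an XPath 1.0 predicate that is true when @class contains
    `name` as a whole whitespace-separated token."""
    return f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"


def valid_day_xpath(day_class: str = "day", invalid_class: str = "invalid_day") -> str:
    """XPath for div.day elements whose parent td is not marked invalid."""
    return f"//td[not({has_class(invalid_class)})]/div[{has_class(day_class)}]"


print(valid_day_xpath())
```

A td without any class attribute still passes the not(...) test, because normalize-space() of a missing attribute is the empty string.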
    

undetected Selenium

You can do it with bs4.

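from bs4 import BeautifulSoup
import requests

# URL is a placeholder for the address of the page that hosts the calendar
response = requests.get(URL)
soup = BeautifulSoup(response.text, "lxml")

# Keep only div.day elements whose parent td is not marked invalid_day
divs = soup.select("td:not(.invalid_day) > div.day")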

To get only the text, read .text on each item in the list.
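
If installing bs4 is not an option, the invalid_day filter from the question can be approximated with the standard library's html.parser instead. A rough sketch, assuming the calendar markup looks like the sample in the question and that the div.day elements are not nested; ValidDayParser and SAMPLE are illustrative names, and in a Selenium session the HTML could come from driver.page_source:

```python
from html.parser import HTMLParser


class ValidDayParser(HTMLParser):
    """Collect the text of div.day elements whose enclosing td
    is not marked invalid_day (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.td_invalid = False  # class state of the most recent <td>
        self.in_day = False      # currently inside a wanted div.day
        self.days = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "td":
            self.td_invalid = "invalid_day" in classes
        elif tag == "div" and "day" in classes and not self.td_invalid:
            self.in_day = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_day = False

    def handle_data(self, data):
        if self.in_day and data.strip():
            self.days.append(data.strip())


# Shortened copy of the markup from the question
SAMPLE = """
<tbody><tr>
  <td class="invalid_day"><div class="day">29</div></td>
  <td class="invalid_day"><div class="day">30</div></td>
  <td class="invalid_day"><div class="day">31</div></td>
  <td><div class="day">1</div></td>
  <td><div class="day">2</div></td>
  <td><div class="day">3</div></td>
  <td><div class="day">4</div></td>
</tr><tr>
  <td class="invalid_day"><div class="day">1</div></td>
</tr></tbody>
"""

parser = ValidDayParser()
parser.feed(SAMPLE)
print(parser.days)  # ['1', '2', '3', '4']
```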

Andrej