0

I'm still new to web scraping and I have a question about WebDriver.

Code example:

<table>
    <tbody>
        <tr>
            <td> car </td>
            <td> bus </td>
        </tr>
       <tr>
            <td> car </td>
            <td> bus & train </td>
        </tr>
       <tr>
            <td> car </td>
            <td> bus & plane </td>
        </tr>
    </tbody>
</table>

<table>
    <tbody>
        <tr>
            <td> food </td>
            <td> meat</td>
        </tr>
       <tr>
            <td> drink </td>
            <td> water </td>
        </tr>
    </tbody>
</table>

In my actual page, there are multiple tables that share the same ID and class names.

Question: How can I extract all the <tr> elements that contain the word "bus"?

I can't find the correct XPath syntax to use.

Hadj

3 Answers

1

To create a list of all the <tr> elements that have a descendant <td> containing the text bus, you can use the following XPath-based locator strategy:

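# find_elements_by_xpath() was removed in Selenium 4.3; find_elements(By.XPATH, ...) works in older and newer versions
elements = driver.find_elements(By.XPATH, "//tr[.//td[contains(., 'bus')]]")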
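
To see which rows that predicate selects without starting a browser, here is a minimal sketch using only the standard library. It is not Selenium code: ElementTree's XPath subset has no contains(), so the filter is written in plain Python. The <root> wrapper, the SAMPLE string and the rows_containing helper are made up for illustration, and "&" is escaped as "&amp;" so the sample parses as XML.

```python
import xml.etree.ElementTree as ET

# Assumption: a well-formed copy of the question's tables, wrapped in a
# hypothetical <root> element so there is a single top-level node.
SAMPLE = """<root>
<table><tbody>
  <tr><td> car </td><td> bus </td></tr>
  <tr><td> car </td><td> bus &amp; train </td></tr>
  <tr><td> car </td><td> bus &amp; plane </td></tr>
</tbody></table>
<table><tbody>
  <tr><td> food </td><td> meat</td></tr>
  <tr><td> drink </td><td> water </td></tr>
</tbody></table>
</root>"""


def rows_containing(root, word):
    """Return every <tr> that has a descendant <td> whose text contains word."""
    matches = []
    for tr in root.iter("tr"):
        # "".join(td.itertext()) plays the role of "." in contains(., 'bus'):
        # the full text of the cell, including text inside nested tags.
        if any(word in "".join(td.itertext()) for td in tr.iter("td")):
            matches.append(tr)
    return matches


root = ET.fromstring(SAMPLE)
rows = rows_containing(root, "bus")
row_texts = [[td.text.strip() for td in tr.iter("td")] for tr in rows]
print(row_texts)
# [['car', 'bus'], ['car', 'bus & train'], ['car', 'bus & plane']]
```

The three rows of the first table match and the two rows of the second table do not, which is the result the XPath expression should give on the same markup.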

Ideally, you should use WebDriverWait with visibility_of_all_elements_located() so that the rows are visible before you collect them:

elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//tr[.//td[contains(., 'bus')]]")))
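
If the idea of an explicit wait is new, the sketch below models what until() does conceptually: call a condition repeatedly until it returns something truthy or the timeout runs out. This is a simplified stand-in, not Selenium's implementation. wait_until and rows_ready are hypothetical names, and rows_ready only simulates a page whose rows show up on the third poll.

```python
import time


def wait_until(condition, timeout=20.0, poll=0.5):
    """Call condition() repeatedly until it is truthy or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# Stand-in for a page whose rows appear only on the third poll.
attempts = {"count": 0}


def rows_ready():
    attempts["count"] += 1
    return ["row 1", "row 2"] if attempts["count"] >= 3 else []


found = wait_until(rows_ready, timeout=2.0, poll=0.01)
print(found, attempts["count"])
```

In the real call, the expected condition plays the part of rows_ready, and the value it returns (the list of visible elements) is what until() hands back.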

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
undetected Selenium
0

Use BeautifulSoup:

from bs4 import BeautifulSoup

# A multi-line string literal needs triple quotes
html = """<table>
    <tbody>
        <tr>
            <td> car </td>
            <td> bus </td>
        </tr>
       <tr>
            <td> car </td>
            <td> bus & train </td>
        </tr>
       <tr>
            <td> car </td>
            <td> bus & plane </td>
        </tr>
    </tbody>
</table>

<table>
    <tbody>
        <tr>
            <td> food </td>
            <td> meat</td>
        </tr>
       <tr>
            <td> drink </td>
            <td> water </td>
        </tr>
    </tbody>
</table>"""

soup = BeautifulSoup(html, "lxml")

# Collect every <td> element (find_all is the current name for findAll)
temp = soup.find_all("td")

# Keep only the cells whose text contains "bus"
output = [x for x in temp if "bus" in x.text]
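
The list above holds <td> cells, while the question asks for whole rows. As a rough sketch of the row-level version, here is a variant that swaps BeautifulSoup for the standard library's html.parser, so it runs without extra packages. RowCollector is a hypothetical helper written for this example, the HTML is a compact copy of the question's tables, and nested tables are not handled.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the cell texts of every <tr>, one list of strings per row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Compact copy of the question's tables (the raw "&" is tolerated by the parser).
html = """
<table><tbody>
  <tr><td> car </td><td> bus </td></tr>
  <tr><td> car </td><td> bus & train </td></tr>
  <tr><td> car </td><td> bus & plane </td></tr>
</tbody></table>
<table><tbody>
  <tr><td> food </td><td> meat</td></tr>
  <tr><td> drink </td><td> water </td></tr>
</tbody></table>
"""

collector = RowCollector()
collector.feed(html)
collector.close()

# Keep only the rows in which at least one cell contains "bus"
bus_rows = [row for row in collector.rows if any("bus" in cell for cell in row)]
print(bus_rows)
# [['car', 'bus'], ['car', 'bus & train'], ['car', 'bus & plane']]
```

The filter is the same "bus" in text test as before, just applied per row instead of per cell.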
0
//td[contains(text(),'bus')]

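You can use contains() with text(); this returns every <td> whose text contains "bus". To get the enclosing rows instead of the cells, append /parent::tr to the expression.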

PDHide