
I'm working on an Indeed web scraping project, and I'm stuck because the tag I need has no useful attribute to select it by. Here is the relevant piece of HTML:

<div class="jobsearch-InlineCompanyRating icl-u-xs-mt--xs icl-u-xs-mb--md css-11s8wkw eu4oa1w0">
  <div class="icl-u-xs-hide">
    <div class="css-czdse3 eu4oa1w0"></div>
  </div>
  <div class="">
    <div data-company-name="true" class="css-czdse3 eu4oa1w0">
      <a href="https://it.indeed.com/cmp/Lutech-Spa?campaignid=mobvjcmp&amp;from=mobviewjob&amp;tk=1gqpia71uk99i800&amp;fromjk=ec58806e88002b72" target="_blank">Lutech Group</a>
    </div>
  </div>
  <div class="">
    <div class="icl-u-lg-block icl-u-xs-mr--xs">
      <div class="css-1unnuiz e37uo190">
        <div role="img" aria-label="37 recensioni" class="css-ln09g1 e1wrtnu61">
          <svg xmlns="http://www.w3.org/2000/svg" focusable="false" role="img" fill="currentColor" viewBox="0 0 24 24" aria-hidden="true" class="css-1xqhio eac13zx0">
            <path d="M12 18.698l6.125 3.696a.593.593 0 00.883-.642l-1.625-6.967 5.412-4.688a.593.593 0 00-.339-1.04l-7.124-.604-2.786-6.573a.593.593 0 00-1.092 0L8.668 8.453l-7.124.605a.593.593 0 00-.339 1.039l5.412 4.688-1.625 6.967c-.12.51.435.913.884.642L12 18.698z"></path>
          </svg>
          <svg xmlns="http://www.w3.org/2000/svg" focusable="false" role="img" fill="currentColor" viewBox="0 0 24 24" aria-hidden="true" class="css-1xqhio eac13zx0">
            <path d="M12 18.698l6.125 3.696a.593.593 0 00.883-.642l-1.625-6.967 5.412-4.688a.593.593 0 00-.339-1.04l-7.124-.604-2.786-6.573a.593.593 0 00-1.092 0L8.668 8.453l-7.124.605a.593.593 0 00-.339 1.039l5.412 4.688-1.625 6.967c-.12.51.435.913.884.642L12 18.698z"></path>
          </svg>
          <svg xmlns="http://www.w3.org/2000/svg" focusable="false" role="img" fill="currentColor" viewBox="0 0 24 24" aria-hidden="true" class="css-1xqhio eac13zx0">
            <path d="M12 18.698l6.125 3.696a.593.593 0 00.883-.642l-1.625-6.967 5.412-4.688a.593.593 0 00-.339-1.04l-7.124-.604-2.786-6.573a.593.593 0 00-1.092 0L8.668 8.453l-7.124.605a.593.593 0 00-.339 1.039l5.412 4.688-1.625 6.967c-.12.51.435.913.884.642L12 18.698z"></path>
          </svg>
          <div aria-hidden="true" class="css-1h39hw1 e1wrtnu60">
            <svg xmlns="http://www.w3.org/2000/svg" focusable="false" role="img" fill="currentColor" viewBox="0 0 24 24" class="css-1xqhio eac13zx0">
              <path fill-rule="evenodd" d="M12 16.249l4.157 2.51-1.103-4.73 3.675-3.184-4.834-.41L12 5.965l-1.895 4.47-4.834.41 3.675 3.184-1.103 4.73L12 16.248zm-6.124 6.145a.593.593 0 01-.884-.642l1.625-6.967-5.412-4.688a.593.593 0 01.339-1.04l7.124-.604 2.786-6.573a.593.593 0 011.092 0l2.786 6.573 7.124.605a.593.593 0 01.338 1.039l-5.411 4.688 1.625 6.967a.593.593 0 01-.883.642L12 18.698l-6.124 3.696z" clip-rule="evenodd"></path>
            </svg>
            <svg xmlns="http://www.w3.org/2000/svg" focusable="false" role="img" fill="currentColor" viewBox="0 0 24 24">
              <defs>
                <linearGradient id="ifl-StarRating-fill-5" x1="0" x2="100%" y1="0" y2="0">
                  <stop offset="50%" stop-color="currentColor"></stop>
                  <stop offset="50%" stop-color="transparent"></stop>
                </linearGradient>
              </defs>
              <path fill="url(#ifl-StarRating-fill-5)" d="M12 18.698l6.125 3.696a.593.593 0 00.883-.642l-1.625-6.967 5.412-4.688a.593.593 0 00-.339-1.04l-7.124-.604-2.786-6.573a.593.593 0 00-1.092 0L8.668 8.453l-7.124.605a.593.593 0 00-.339 1.039l5.412 4.688-1.625 6.967c-.12.51.435.913.884.642L12 18.698z"></path>
            </svg>
          </div>
          <svg xmlns="http://www.w3.org/2000/svg" focusable="false" role="img" fill="currentColor" viewBox="0 0 24 24" aria-hidden="true" class="css-1xqhio eac13zx0">
            <path fill-rule="evenodd" d="M12 16.249l4.157 2.51-1.103-4.73 3.675-3.184-4.834-.41L12 5.965l-1.895 4.47-4.834.41 3.675 3.184-1.103 4.73L12 16.248zm-6.124 6.145a.593.593 0 01-.884-.642l1.625-6.967-5.412-4.688a.593.593 0 01.339-1.04l7.124-.604 2.786-6.573a.593.593 0 011.092 0l2.786 6.573 7.124.605a.593.593 0 01.338 1.039l-5.411 4.688 1.625 6.967a.593.593 0 01-.883.642L12 18.698l-6.124 3.696z" clip-rule="evenodd"></path>
          </svg>
        </div>
        <span class="css-xvmbeo e1wnkr790">
          <a href="https://it.indeed.com/cmp/Lutech-Spa/reviews?campaignid=mobvjcmp&amp;cmpratingc=mobviewjob&amp;from=mobviewjob&amp;tk=1gqpia71uk99i800&amp;fromjk=ec58806e88002b72&amp;jt=Data+analyst" target="_blank" class="css-picdch emf9s7v0">37 recensioni</a>
        </span>
      </div>
    </div>
  </div>
</div>
<div class>
  <div>Bergamo, Lombardia</div>
</div>
<div class></div>

As you can see, near the bottom of the HTML there is a div whose class attribute has no value. I need to extract the text of the div inside it (in this case, Bergamo, Lombardia).

I've tried the following lines of code:

Loc = C_N.find_element(By.XPATH, '//div[2]/div')  # not working
Loc = C_N.find_element(By.XPATH, '//div[@class = ""]')  # returns all the text, not only what I need
Loc = C_N.find_element(By.XPATH, '//*[@id="jobsearch-ViewjobPaneWrapper"]/div/div/div/div[2]/div/div/div[1]/div/div[1]/div[1]/div[1]/div[2]/div/div/div/div[2]/div')  # works only sometimes, and is not good practice

I'm not sure how to use other solutions like By.ID, By.CLASS_NAME, By.TAG_NAME or even By.CSS_SELECTOR ...

PS: For the full HTML, open indeed.com and search for any job.

undetected Selenium

2 Answers


To extract the text Bergamo, Lombardia, ideally wait for the element to become visible by using WebDriverWait with visibility_of_element_located(). You can use either of the following locator strategies:

  • Using CSS_SELECTOR and text attribute:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.jobsearch-InlineCompanyRating +div > div"))).text)
    
  • Using XPATH and get_attribute("innerHTML"):

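    # A single "/" before following-sibling keeps the axis on the rating div itself;
    # with "//" it also applies to its descendants, and the first match is the company-name div.
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[contains(@class, 'jobsearch-InlineCompanyRating')]/following-sibling::div[@class][1]/div"))).get_attribute("innerHTML"))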
    
  • Note: you have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
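
If you want to check offline which element an adjacent-sibling locator such as `div.jobsearch-InlineCompanyRating + div > div` is meant to pick, here is a minimal sketch using only the standard library. It is not Selenium code: it walks a trimmed, well-formed copy of the snippet from the question and returns the first child div of the rating block's next sibling. The `SNIPPET` string and the `text_after_rating` helper are illustrative assumptions, and the real page is larger and may be structured differently.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the question's snippet (assumption: the real
# page is HTML and much larger; class="" stands in for the bare `class`).
SNIPPET = """
<root>
  <div class="jobsearch-InlineCompanyRating icl-u-xs-mt--xs">
    <div class="icl-u-xs-hide"><div class="css-czdse3"></div></div>
    <div class=""><div data-company-name="true"><a href="#">Lutech Group</a></div></div>
    <div class=""><div class="icl-u-lg-block">37 recensioni</div></div>
  </div>
  <div class=""><div>Bergamo, Lombardia</div></div>
  <div class=""></div>
</root>
"""


def text_after_rating(root):
    """Return the text of the first child div of the rating block's next sibling."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for div in root.iter("div"):
        if "jobsearch-InlineCompanyRating" not in div.get("class", "").split():
            continue
        parent = parent_of.get(div)
        if parent is None:
            continue  # the rating div is the root, so it has no siblings
        siblings = list(parent)
        idx = siblings.index(div)
        if idx + 1 < len(siblings):
            inner = siblings[idx + 1].find("div")  # direct child div only
            if inner is not None:
                return "".join(inner.itertext()).strip()
    return None


print(text_after_rating(ET.fromstring(SNIPPET)))
```

The company name and the review count sit inside the rating block, so a locator that starts from the rating block's own next sibling skips them and lands on the location div.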

undetected Selenium

Use the following XPath expression:

(//div[contains(@class, 'jobsearch-InlineCompanyRating')]//following-sibling::div[@class]/div)[3]

Full code, which should print the text Bergamo, Lombardia:

print(driver.find_element(By.XPATH, "(//div[contains(@class, 'jobsearch-InlineCompanyRating')]//following-sibling::div[@class]/div)[3]").get_attribute("innerHTML"))
Shawn