I'm trying to scrape the LinkedIn profiles of people who hold a specific job. To do this, I want to find the People button and click it so that only the relevant people are shown.

The path is as follows:

From the signed-out LinkedIn home page -> I sign in and land on the LinkedIn home page -> I type "hr" in the search bar and hit Enter.

On the results page for "hr", on the left side of the page, there is a navigation list titled "On this page". One of its options is "People", and that is what I want to target.

The link to the page is: https://www.linkedin.com/search/results/all/?keywords=hr&origin=GLOBAL_SEARCH_HEADER&sid=Xj2

The HTML of the button for 'People' in the navigation list is:

<li>
   <button aria-current="false" class="search-navigation-panel_button" data-target-section-id="PTFmMNSPSz2LQRzwynhRBQ==" role="link" type="button"> People </button>
</li>

I have tried to find this button with By.LINK_TEXT using the text People, but that did not work. I have also tried By.XPATH with "//button[@data-target-section-id='RIK0XK7NRnS21bVSiNaicw==']", but that does not find it either.

How can I make Selenium match on this custom attribute so I can find the button through data-target-section-id="PTFmMNSPSz2LQRzwynhRBQ=="?

Another issue is that I can target all the relevant people on the page and loop through them, but I cannot extract the link to each profile. The loop only picks up the link of the first person and never updates the variable again.

For example, if the first person is Ian, and the second is Brian, it gives me the link for Ian's profile even if 'users' is Brian.

Debugging the loop, I can see the correct list of people in all_users, but the code only gets the href of the first person in the list and never updates.

Here is the code for that:

all_users = driver.find_elements(By.XPATH, "//*[contains(@class, 'entity-result__title-line entity-result__title-line--2-lines')]")

for users in all_users:
    print(users)
    get_links = users.find_element(By.XPATH, "//*[contains(@href, 'miniProfileUrn')]")
    print(get_links.get_attribute('href'))
  • I see this button is inside of a list. Is it visible without clicking a dropdown or mousing over something first? If not, you may need to mouse over it first or click the dropdown first. – Brandon Johnson Feb 18 '23 at 19:53
  • the list is visible on the page without having to click on a dropdown or mouse over anything. – Jesper Ezra Feb 18 '23 at 20:29
  • I'm trying to find this HTML on various pages on linkedin.com but I can't seem to find the page you're talking about. Please link the actual page and clearly indicate which link you are trying to click. I have the solution to the second problem but don't want to start an answer until I have all the info I need. – JeffC Feb 19 '23 at 06:06
  • @JeffC I updated the question with the link of the actual pages and hopefully a better explanation of what I want to click. – Jesper Ezra Feb 19 '23 at 18:12

4 Answers

1

If you want to locate several elements with the same attribute, replace find_element with find_elements. find_elements returns every element that matches the locator, not just the first one.

Review the Selenium "Locating Elements" documentation and try the other locator strategies it lists.

Something else to try:

button_element = driver.find_element(By.XPATH, "//button[@data-target-section-id='RIK0XK7NRnS21bVSiNaicw==']")
button_element.click()
1

I have also tried to do By.XPATH "//button[@data-target-section-id='RIK0XK7NRnS21bVSiNaicw==']" but it also does not find it.

The data-target-section-id in your XPath is not the same as the one on the button (PTFmMNSPSz2LQRzwynhRBQ==). Also check that this value is not dynamic before targeting it.

Your XPath is otherwise fine; just correct the data-target-section-id value:

driver.find_element(By.XPATH, "//button[@data-target-section-id='PTFmMNSPSz2LQRzwynhRBQ==']").click()

Where "driver" is your WebDriver instance.
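
One way to act on the "check that this is not dynamic" advice is to read the attribute on several fresh page loads and compare the values. The sketch below is only an illustration: is_stable and read_value are hypothetical names, and the Selenium calls in the comment assume a signed-in driver and a SEARCH_URL variable that are not part of the original post.

```python
def is_stable(read_value, loads=3):
    """Return True if read_value() yields the same value on every call.

    read_value is a caller-supplied (hypothetical) function that reloads
    the page and returns the attribute value it finds there.
    """
    seen = {read_value() for _ in range(loads)}
    return len(seen) == 1


# A possible read_value for this question. It assumes a signed-in `driver`
# and a SEARCH_URL string; "xpath" is the string value of By.XPATH.
#
# def read_value():
#     driver.get(SEARCH_URL)
#     button = driver.find_element(
#         "xpath", "//button[normalize-space()='People'][@data-target-section-id]")
#     return button.get_attribute("data-target-section-id")
```

If is_stable(read_value) comes back False, the value changes between loads and should not be hard-coded in a locator.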

Juan Melnechuk
  • what do you mean it is not the same? Could you further explain what you mean by this? – Jesper Ezra Feb 18 '23 at 20:28
  • Your xPath is targeting data-target-section-id='RIK0XK7NRnS21bVSiNaicw==', but in your HTML the button has data-target-section-id="PTFmMNSPSz2LQRzwynhRBQ==" – Juan Melnechuk Feb 18 '23 at 20:30
  • Oh, I see what you meant now. I copied it directly from the page and it still does not find it, though. I suspect it's dynamic, as you mentioned, but even if I add a wait and change the XPath to wait for it, as someone else suggested, it does not find it. – Jesper Ezra Feb 18 '23 at 21:21
  • Use [xPath Finder](https://chrome.google.com/webstore/detail/xpath-finder/ihnknokegkbpmofmafnkoadfjkhlogph) and grab the xPath regardless of the button values. – Juan Melnechuk Feb 18 '23 at 21:24
  • xPath Finder did not help find the xPath either – Jesper Ezra Feb 18 '23 at 23:19
  • Please provide the HTML code. We cannot give you any solution in any other way. Another solution is to take a non-dynamic parent element and look for buttons inside that element. – Juan Melnechuk Feb 18 '23 at 23:32
0

Given the HTML:

<li>
   <button aria-current="false" class="search-navigation-panel_button" data-target-section-id="PTFmMNSPSz2LQRzwynhRBQ==" role="link" type="button"> People </button>
</li>

The data-target-section-id attribute values, such as PTFmMNSPSz2LQRzwynhRBQ==, are dynamically generated and are bound to change sooner or later. They may change the next time you load the application afresh, or even at the next application startup, so they can't be used in locators.


Solution

Because the desired element is dynamic, induce WebDriverWait for element_to_be_clickable() before clicking it. You can use either of the following locator strategies:

  • Using CSS_SELECTOR:

    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.search-navigation-panel_button[data-target-section-id]"))).click()
    
  • Using XPATH:

    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[@class='search-navigation-panel_button' and @data-target-section-id][contains(., 'People')]"))).click()
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
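
According to the comments on this answer, the real class attribute holds two class names split across a line break (search-navigation-panel__button and search-navigation-panel__button--active, both with a double underscore). An exact @class='...' comparison cannot match that value. A minimal sketch of locators that match a single class name regardless of extra classes or whitespace follows; has_class and the constant names are hypothetical helpers, not part of Selenium.

```python
def has_class(token):
    """XPath predicate: true when @class contains `token` as a whole,
    whitespace-separated class name (normalize-space collapses spaces,
    tabs and newlines)."""
    return f"contains(concat(' ', normalize-space(@class), ' '), ' {token} ')"


# Class name as reported in the comments on this answer (double underscore).
BUTTON_CLASS = "search-navigation-panel__button"

# CSS class selectors already match individual class names, so the extra
# "--active" class and the line break do not matter.
PEOPLE_BUTTON_CSS = f"button.{BUTTON_CLASS}[data-target-section-id]"

# XPath equivalent; normalize-space() also trims the padding around " People ".
PEOPLE_BUTTON_XPATH = (
    f"//button[{has_class(BUTTON_CLASS)}]"
    "[@data-target-section-id][normalize-space()='People']"
)
```

Either string can be passed to the WebDriverWait calls above in place of the original locator. Note that the CSS selector matches every button in the navigation panel, so the XPath form is the one that singles out People.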
undetected Selenium
  • It times out. I tried increasing the time it waits to a full minute but it continues to timeout with both CSS_SELECTOR and XPATH options – Jesper Ezra Feb 18 '23 at 21:19
  • Does the HTML in my answer match the actual HTML at your end? – undetected Selenium Feb 18 '23 at 21:20
  • Looking at it more closely, it looks like the class didn't match correctly. However, I do not know how to make it match because of the format. The class is as follows: class="search-navigation-panel__button search-navigation-panel__button--active" How can I account for the 2nd line in the class? – Jesper Ezra Feb 18 '23 at 23:18
  • the comment does not correctly reflect the gap. It's class="search-navigation-panel__button \n search-navigation-panel__button--active" – Jesper Ezra Feb 18 '23 at 23:22
0

It looks like the reason your People button locator isn't working is because the data-target-section-id is dynamic. Mine is showing as hopW8RkwTN2R9dPgL6Fm/w==. We can get around that by using an XPath to locate the element based on the text contained, "People", e.g.

//button[text()='People']

It turns out that this matches two elements on the page, because many of the left nav links are repeated as rounded buttons at the top of the page, so we can further refine our locator to

//button[text()='People'][@data-target-section-id]

Having said that, that link only scrolls the page so you don't really need to click that.

From there, you want to get the links to each person listed under the People heading. We first need the DIV that contains the People section. It's kinda messy because the IDs on those elements are also dynamic so we need to find the H2 that contains "People" and then work our way back up the DOM to the DIV that contains only that section. We can get that using the XPath below

//div[@class='search-results-container']/div[.//h2[text()='People']]

From there, we want all of the A tags that uniquely link to a person... and there's a lot of A tags in that section but most are not ones we want so we need to do more filtering. I found that the below XPath locates each unique URL in that section.

//a[contains(@href,'miniProfileUrn')][contains(@class,'scale-down')]

Combining the two XPaths, we get

//div[@class='search-results-container']/div[.//h2[text()='People']]//a[contains(@href,'miniProfileUrn')][contains(@class,'scale-down')]

which locates all unique URLs belonging to a person in the People section of the page.
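
The combination can also be kept as two named pieces in code, which makes it easier to adjust one half when LinkedIn changes its markup. This is just a sketch; the constant names are hypothetical and the XPath strings are copied from above.

```python
# Both fragments are copied from the text of this answer.
PEOPLE_SECTION = "//div[@class='search-results-container']/div[.//h2[text()='People']]"
PROFILE_LINK = "//a[contains(@href,'miniProfileUrn')][contains(@class,'scale-down')]"

# Appending the second fragment makes its leading "//" mean
# "any descendant of the People section" instead of "anywhere in the page".
PEOPLE_PROFILE_LINKS = PEOPLE_SECTION + PROFILE_LINK
```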

Using this, your code would look like

all_users = driver.find_elements(By.XPATH, "//div[@class='search-results-container']/div[.//h2[text()='People']]//a[contains(@href,'miniProfileUrn')][contains(@class,'scale-down')]")

for user in all_users:
    print(user.get_attribute('href'))

NOTE: Your code returned the first href repeatedly because an XPath that starts with "//" searches the whole document, even when find_element is called on an existing element. Add a "." at the start of the XPath so the search starts from the referenced element.

get_links = users.find_element(By.XPATH, ".//*[contains(@href, 'miniProfileUrn')]")
                                          ^ add period here

I've eliminated that step in my code so you won't need it there.
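
If you would rather keep the per-card loop from the question, the corrected version could look like the sketch below. collect_profile_links is a hypothetical helper name; it expects the list that driver.find_elements(...) returns in the question.

```python
RELATIVE_PROFILE_LINK = ".//*[contains(@href, 'miniProfileUrn')]"


def collect_profile_links(cards):
    """Return one href per result card, in page order.

    The leading "." in the XPath keeps each search inside its own card.
    """
    links = []
    for card in cards:
        # "xpath" is the string value of selenium's By.XPATH
        link = card.find_element("xpath", RELATIVE_PROFILE_LINK)
        links.append(link.get_attribute("href"))
    return links
```

Calling collect_profile_links(all_users) with the all_users list from the question should then yield a different href for each person. A card without a matching link makes find_element raise NoSuchElementException, so you may want to catch that if some results have no profile URL.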

JeffC