
Anyone with a LinkedIn account might be able to help me with this problem: I am trying to select some data from LinkedIn profile pages. Selecting the name works fine with this XPath:

name = driver.find_element_by_xpath('//section[contains(concat(" ",normalize-space(@class)," ")," pv-top-card-v3 ")][contains(concat(" ",normalize-space(@class)," ")," artdeco-card ")][contains(concat(" ",normalize-space(@class)," ")," ember-view ")]//div/following-sibling::*[1]/self::div//div/following-sibling::*[1]/self::div//div[count(preceding-sibling::div)=0]//ul//li[count(preceding-sibling::li)=0][contains(concat(" ",normalize-space(@class)," ")," break-words ")]')

Same for location and current job.

But then it gets tricky. I am trying to select the last education entry, such as the last college. The query works fine in the Chrome developer console, but Selenium cannot find the element and fails with a "no such element" error. Even in the browser window opened by the Selenium chromedriver, I can still find the element with the same query.

My query for that is:

school = driver.find_element_by_xpath('//section[@id="education-section"]//ul//li[count(preceding-sibling::li)=0]//div//div//div//a//div/following-sibling::*[1]/self::div//div//h3[contains(concat(" ",normalize-space(@class)," ")," pv-entity__school-name ")]')

I googled around and the only thing I found was about iframes. As far as I can see, the element is not wrapped in an iframe. However, there is a JavaScript snippet at the end of the page that might have something to do with it; I don't really understand what it does:

function(){var a=n.MessageChannel;"undefined"===typeof a&&"undefined"!==typeof window&&window.postMessage&&window.addEventListener&&!F("Presto")&&(a=function(){var a=window.document.createElement("IFRAME");a.style.display="none";a.src="";window.document.documentElement.appendChild(a);var b=a.contentWindow,a=b.document;a.open();a.write("");a.close();var c="callImmediate"+Math.random(),d="file:"==b.location.protocol?"*":b.location.protocol+"//"+b.location.host,a=(0,_.y)(function(a){if(("*"==d||a.origin==
d)&&a.data==c)this.port1.onmessage()},this);b.addEventListener("message",a,!1);this.port1={};this.port2={postMessage:function(){b.postMessage(c,d)}}});if("undefined"!==typeof a&&!F("Trident")&&!F("MSIE")){var b=new a,c={},d=c;b.port1.onmessage=function(){if(_.l(c.next)){c=c.next;var a=c.za;c.za=null;a()}};return function(a){d.next={za:a};d=d.next;b.port2.postMessage(0)}}return"undefined"!==typeof window.document&&"onreadystatechange"in window.document.createElement("SCRIPT")?function(a){var b=window.document.createElement("SCRIPT");
b.onreadystatechange=function(){b.onreadystatechange=null;b.parentNode.removeChild(b);b=null;a();a=null};

I really don't know if this has anything to do with it, but it might have. I am seriously out of ideas.

AnanasXpress
  • Have you seen https://stackoverflow.com/questions/54392465/linkedin-webscrape-w-selenium? – Glazbee Nov 14 '19 at 10:27
  • No, I had not seen it yet, but I guess it fixes my issue. Thank you very much. I still don't understand how, since my XPath is correct: I can still find the elements through the Chrome console with it. The problem might be that LinkedIn does not render the full page when it is loaded. So I tried scrolling down first, but my XPath still won't work. Thank you anyway, this works for now :) – AnanasXpress Nov 14 '19 at 11:48
  • Did you try using implicit waits? – Metareven Nov 14 '19 at 12:12

1 Answer


So, I found the solution to this one; may it help other people trying to mine data from LinkedIn. LinkedIn only partially loads profile pages, so the element is not visible at first. I used two steps to get the page to load completely: first I zoomed out, then I scrolled down.

The scroll down comes from this answer: https://stackoverflow.com/a/27760083/11192772

The zoom comes from this answer: https://stackoverflow.com/a/31482681/11192772

So I added this after the page is loaded:

from time import sleep

SCROLL_PAUSE_TIME = 1

# Get the initial scroll height
last_height = driver.execute_script("return document.body.scrollHeight")
# Zoom out so that more of the page is in view at once
driver.execute_script("document.body.style.zoom='10%'")
while True:
    # Scroll down to the middle of the page first
    driver.execute_script("window.scrollTo(0, (document.body.scrollHeight/2));")
    # Wait for the page to load
    sleep(SCROLL_PAUSE_TIME)
    # Then scroll down to the bottom
    driver.execute_script("window.scrollTo(0, (document.body.scrollHeight));")
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

It does not work if you scroll too far in one go, because then the middle part of the page is missing. So I just added an extra step to the linked solution: scroll only to the halfway point of the page first.
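The same half-then-full scrolling idea can be wrapped in a small helper. This is only a sketch, not a tested LinkedIn recipe: the function name `scroll_in_steps`, the `max_rounds` safety cap, the injectable `sleep_fn`, and the second pause after scrolling to the bottom are my own assumptions and not part of the snippet above. It uses nothing from `driver` except `execute_script`.

```python
import time


def scroll_in_steps(driver, pause=1.0, max_rounds=20, sleep_fn=time.sleep):
    """Scroll to the middle, then to the bottom, until the page height is stable.

    Hypothetical helper: `max_rounds` is an assumed cap so the loop cannot run
    forever on a page that keeps growing. Returns the number of rounds done.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        # Scroll to the halfway point first so the middle part gets loaded
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight / 2);")
        sleep_fn(pause)
        # Then scroll to the bottom and give the page time to load again
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        sleep_fn(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds
        last_height = new_height
    return max_rounds
```

With the `driver` from the question, this would be called as `scroll_in_steps(driver)` after the profile page has been opened and before the `find_element_by_xpath` lookup for the school name. The zoom step is left out here on purpose and can still be run beforehand.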

AnanasXpress