I have Python code that scrapes various data. For example, it scrapes the Website link from this HTML:

<a data-ix="show-popup-on-click" target="_blank" rel="nofollow" href="https://mylink.org/" class="button full w-button" style="transition: all 0.4s ease 0s;">Website</a>

It was working properly, but now it fails with the error:

NoSuchElementException: Message: {"errorMessage":"Unable to find element with link text 'Website'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"95","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:40581","User-Agent":"Python http auth"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"link text\", \"sessionId\": \"a7a441f0-0f6a-11e8-ad3a-6121f74a30f4\", \"value\": \"Website\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/a7a441f0-0f6a-11e8-ad3a-6121f74a30f4/element"}} Screenshot: available via screen

This is my code:

import requests
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.PhantomJS()
driver.set_window_size(1120, 550)
driver.get(link)
driver.implicitly_wait(10)

website = driver.find_element_by_link_text("Website").get_attribute("href")

What am I doing wrong?

UPDATE:

<div class="column-space w-col w-col-4">
   <a data-ix="show-popup-on-click" target="_blank" 
      rel="nofollow" href="https://example.com/" 
      class="button full w-button" 
      style="transition: all 0.4s ease 0s;">Website</a>

   <div class="space big"></div>
   <a target="_blank" rel="nofollow" 
      href="https://example.com/storage/b/2/0/2/WhitepaperLive.pdf" 
      class="button-2 w-button">Whitepaper</a>
   <div class="space big"></div>
   <a class="button-2 w-condition-invisible w-button">Program</a>
   <div class="space big w-condition-invisible"></div>
   <div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Token:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">UTC</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Price:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">1 LUC=0,05 USD</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Buy with:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">USD, EUR</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Platform:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">MyPlatform</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix w-condition-invisible">
         <div class="div-block-2">KYC:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">No</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">KYC:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">Yes</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Location:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">Malta</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Can't join:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">USA</div>
         </div>
      </div>
      <div class="space big"></div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Start:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">January 25, 2018</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">End:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">February 5, 2018</div>
         </div>
      </div>
      <div class="space big"></div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Start2:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">February 12, 2018</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">End2:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">March 5, 2018</div>
         </div>
      </div>
      <div>
         <div class="div-block-33">
            <div class="space big"></div>
            <div>
               <a target="_blank" rel="nofollow" 
               class="button green full w-condition-invisible w-button">JOIN WHITELIST NOW »</a>
               <div class="div-block-34">
                  <a target="_blank" rel="nofollow" href="http://we-do-not-have-slack.com" 
                     class="link-block-2 w-inline-block">
                     <img src="https://global-uploads.webflow.com/903_slack-symbol.png" alt="ICO Slack link">
                  </a>
                  <a target="_blank" rel="nofollow" href="https://twitter.com/live" class="link-block-2 w-inline-block">
                     <img src="https://global-uploads.webflow.com/f4000142b091_twitter%20(1).png" width="16" alt="ICO Twitter link">
                  </a>
                  <a target="_blank" rel="nofollow" href="https://t.me/live" class="link-block-2 w-inline-block">
                     <img src="https://global-uploads.webflow.com/790001798dfe_telegram.png" alt="ICO Telegram link">
                  </a>
                  <a target="_blank" rel="nofollow" href="http://we-do-not-have-GitHub.com" class="link-block-2 w-inline-block">
                     <img src="https://global-uploads.webflow.com/59cf77c1fb0edc0001b4b26a_github-logo.png" alt="ICO GitHun link">
                  </a>
                  <a target="_blank" rel="nofollow" href="https://www.facebook.com/Play2Live-504880049864038/" class="link-block-2 w-inline-block">
                     <img src="https://global-uploads.webflow.com/59cf77c1fb0edc0001b4b117/59d510290116ac0001964c8e_facebook.png" alt="Facebook link">
                  </a>
                  <a target="_blank" rel="nofollow" href="https://talk.org/index.php?topic=2381679.0" class="link-block-2 w-inline-block">
                     <img src="https://global-uploads.webflow.com/0011f8c3c_talk.jpg" alt="Talk link">
                  </a>
               </div>
            </div>
         </div>
      </div>
   </div>
</div>
Markus

2 Answers

This error occurs when Selenium can't find the element in the HTML DOM.

My guess is that you set the implicit wait too late, so Selenium tries to get the element before the page has loaded and the element is present in the DOM.

driver.get(link)
driver.implicitly_wait(10)

The documentation sets up the implicit wait before getting any pages:

driver = webdriver.PhantomJS()
driver.implicitly_wait(10)
driver.get(link)

With the implicit wait in place, Selenium keeps polling the DOM for up to 10 seconds when it looks for the anchor element, instead of failing immediately.

DocLink: http://selenium-python.readthedocs.io/waits.html#implicit-waits

Also, if none of the elements you are scraping are loaded or created via JavaScript, you don't need Selenium for simple text extraction. You could fetch the page with the standard library's urllib.request and then parse it with BeautifulSoup.
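As a rough sketch of that Selenium-free route, the snippet below parses static HTML and returns the href of a link by its text. To keep it dependency-free it uses the standard library's html.parser as a stand-in for BeautifulSoup (with BeautifulSoup the parsing part would be shorter). The names LinkCollector, find_href and fetch are made up for this example, and the sample markup is a shortened copy of the anchor from the question. It only helps if the link is present in the raw HTML rather than injected by JavaScript.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a> element in a document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.links.append(("".join(self._text).strip(), self._href))
            self._in_link = False


def find_href(html, link_text):
    """Return the href of the first link whose text equals link_text, else None."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    for text, href in parser.links:
        if text == link_text:
            return href
    return None


def fetch(url):
    """Download a page as text (hypothetical helper, not called below)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Shortened copy of the anchor from the question, used instead of a live request.
sample = (
    '<a rel="nofollow" href="https://example.com/" '
    'class="button full w-button">Website</a>'
)
print(find_href(sample, "Website"))
```

Because this reads the markup directly, it sees the text exactly as written in the HTML ("Website"), regardless of any CSS applied when the page is rendered.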

UPDATE:

As Ian said in the comments, the position of the implicit wait doesn't matter in this case.

The problem was the locator strategy.

website = driver.find_element_by_link_text('Website').get_attribute('href')

In this case it couldn't find the element, which is a link styled as a button with uppercase lettering (WEBSITE). find_element_by_link_text seems to match not the link text in the HTML DOM ("Website") but the text as rendered after CSS styling is applied (WEBSITE).

Another locator strategy, such as a CSS selector or XPath, seems to me to deliver more reliable results:

driver.find_element_by_xpath("//a[contains(text(),'Website')]").get_attribute("href")

Some more information on those can be found here: Selenium Locating Elements
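If the capitalization might differ between pages, one option is to make the XPath match case-insensitively. The sketch below is only an illustration: link_xpath and get_link_href are hypothetical helper names, the search text is assumed to contain only ASCII letters and no single quotes, and driver is assumed to be an already-opened Selenium WebDriver. It uses the generic find_element(by, value) call, where "xpath" is the string value of By.XPATH, so it does not depend on the find_element_by_* helpers (which were removed in Selenium 4).

```python
def link_xpath(text):
    """Build an XPath for <a> elements whose DOM text contains `text`, ignoring case.

    XPath 1.0 has no lower-case function, so translate() maps A-Z to a-z.
    Assumes `text` contains no single quotes.
    """
    upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    lower = "abcdefghijklmnopqrstuvwxyz"
    return (
        "//a[contains(translate(normalize-space(.), "
        f"'{upper}', '{lower}'), '{text.lower()}')]"
    )


def get_link_href(driver, text):
    """Return the href of the first matching link (driver: an open WebDriver)."""
    element = driver.find_element("xpath", link_xpath(text))
    return element.get_attribute("href")


# The expression is the same whichever capitalization is passed in.
print(link_xpath("Website"))
```

With a live session this would be called as get_link_href(driver, "Website"). XPath is evaluated against the DOM, so it is not affected by how the text is displayed.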

Nidus
  • It doesn't matter when `implicitly_wait()` is called so long as it happens before the `find_*` that must wait. – Ian Lesperance Feb 11 '18 at 21:43
  • Like Ian said, my guess is wrong as the implicit wait statement positioning doesn't matter in this case. It looks like we need more information to help. – Nidus Feb 12 '18 at 03:11
  • I am extracting "Website" from this link https://topicolist.com/ico/adhive. Could you please check it? – Markus Feb 12 '18 at 19:41
  • Okay, thanks for posting the link. I did some tests and got it working. It looks like it's because of your locator strategy. `driver.find_element_by_link_text("WEBSITE")` seems to work, where "Website" doesn't. It looks like find_element_by_link_text doesn't match against the text in the HTML DOM, but against the rendered text after CSS styling. – Nidus Feb 12 '18 at 20:04

There is no problem with the code. On inspecting the Website link on the web page I can see the text as "Website", but if I use the same text to find the element by link text, as below, I get a NoSuchElementException:

website = driver.find_element_by_link_text("Website").get_attribute("href")
print(website)

I tried adding waits and also used partial_link_text, but no luck.

Then I fetched all elements with the tag name "a" and printed their text with the code below:

elements = driver.find_elements_by_tag_name("a")
for element in elements:
    print(element.text)

That showed me the text is not "Website" but "WEBSITE". I am not sure why it behaves like this.

After changing all characters of "Website" to uppercase, I am able to identify the element and fetch the href from it:
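A likely explanation, assuming the site's stylesheet applies something like text-transform: uppercase to the button, is that element.text returns the text as it is displayed, while the textContent property holds the text as written in the markup. The sketch below lists both side by side so the difference can be checked. compare_link_texts and print_mismatches are hypothetical helper names, and driver is assumed to be an already-opened Selenium WebDriver ("tag name" is the string value of By.TAG_NAME).

```python
def compare_link_texts(driver):
    """Return (rendered_text, dom_text, href) for every <a> on the current page."""
    rows = []
    for element in driver.find_elements("tag name", "a"):
        # Text as displayed, after CSS such as text-transform has been applied.
        rendered = element.text
        # Text as written in the markup, untouched by CSS.
        dom_text = (element.get_attribute("textContent") or "").strip()
        rows.append((rendered, dom_text, element.get_attribute("href")))
    return rows


def print_mismatches(driver):
    """Print the links whose displayed text differs from their DOM text."""
    for rendered, dom_text, href in compare_link_texts(driver):
        if rendered != dom_text:
            print(f"rendered={rendered!r} dom={dom_text!r} href={href!r}")


# Usage, with a live session already on the page:
# print_mismatches(driver)
```

Hidden links will also show up as mismatches, because element.text is empty for elements that are not displayed.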

driver.get("https://topicolist.com/ico/adhive")
website = driver.find_element_by_link_text("WEBSITE").get_attribute("href")
print(website)

Hope this solves your problem.

Pradeep hebbar