I want to find the broken links on a web page using Selenium + Python. I tried the code below, but it raises the following error:

requests.exceptions.MissingSchema: Invalid URL 'None': No schema supplied. Perhaps you meant http://None?

Code trials:

for link in links:
    r = requests.head(link.get_attribute('href'))
    print(link.get_attribute('href'), r.status_code)

Full code:

def test_lsearch(self):
    driver=self.driver
    driver.get("http://www.google.com")
    driver.set_page_load_timeout(10)
    driver.find_element_by_name("q").send_keys("selenium")

    driver.set_page_load_timeout(10)
    el=driver.find_element_by_name("btnK")
    el.click()
    time.sleep(5)

    links=driver.find_elements_by_css_selector("a")
    for link in links:
        r=requests.head(link.get_attribute('href'))
        print(link.get_attribute('href'),r.status_code)
Ruan
Talib

2 Answers

This error message...

    raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL 'None': No schema supplied. Perhaps you meant http://None?

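...implies that requests was handed a URL with no scheme. The URL it reports is the string 'None', which means that link.get_attribute('href') returned None for at least one of the collected elements, that is, an <a> tag without an href attribute.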

The error is raised in the requests library's models.py as follows:

    # Support for unicode domain names and paths.
    scheme, auth, host, port, path, query, fragment = parse_url(url)
    if not scheme:
        raise MissingSchema("Invalid URL {0!r}: No schema supplied. "
                            "Perhaps you meant http://{0}?".format(url))

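To see why the string 'None' fails this check, here is a minimal sketch that uses the standard library's urllib.parse in place of the parse_url helper that requests itself calls (a substitution made purely for illustration). The helper name has_scheme is hypothetical.

```python
from urllib.parse import urlsplit

def has_scheme(url):
    """Return True if str(url) parses with a non-empty scheme (http, https, ...)."""
    # requests converts the URL to a string before parsing, so None becomes 'None'.
    return bool(urlsplit(str(url)).scheme)

print(has_scheme(None))                    # False: 'None' has no scheme
print(has_scheme("https://example.com/"))  # True
```

A check of this kind, applied before calling requests.head, would skip the values that trigger MissingSchema.
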
Solution

You are presumably trying to check for broken links among the search results that Google returns for the keyword selenium. To do that, you can use the following solution:

  • Code Block:

    import requests
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.keys import Keys

    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_argument('disable-infobars')
    driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
    driver.get('https://google.co.in/')

    # Type the query and submit it with RETURN instead of clicking the button
    search = driver.find_element_by_name('q')
    search.send_keys("selenium")
    search.send_keys(Keys.RETURN)

    # Wait for the result links; this XPath only matches anchors that wrap a result title
    links = WebDriverWait(driver, 10).until(EC.visibility_of_any_elements_located((By.XPATH, "//div[@class='rc']//h3//ancestor::a[1]")))
    print("Number of links : %s" % len(links))
    for link in links:
        # Read the href once and reuse it for the request and the report
        href = link.get_attribute('href')
        r = requests.head(href)
        print(href, r.status_code)
    
  • Console Output:

    Number of links : 9
    https://www.seleniumhq.org/ 200
    https://www.seleniumhq.org/download/ 200
    https://www.seleniumhq.org/docs/01_introducing_selenium.jsp 200
    https://www.guru99.com/selenium-tutorial.html 200
    https://en.wikipedia.org/wiki/Selenium_(software) 200
    https://github.com/SeleniumHQ 200
    https://www.edureka.co/blog/what-is-selenium/ 200
    https://seleniumhq.github.io/selenium/docs/api/py/ 200
    https://seleniumhq.github.io/docs/ 200
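
If the locator cannot be narrowed down this way, the hrefs can be filtered before they reach requests. The following is a minimal sketch under stated assumptions, not part of the solution above: check_links and broken are hypothetical helper names, fetch_status is an injected callable (for example lambda url: requests.head(url).status_code), and treating any status of 400 or above as "broken" is a choice made for illustration. A dictionary stands in for the real HTTP call so the sketch runs offline.

```python
def check_links(hrefs, fetch_status):
    """Return (href, status) pairs, skipping anchors that have no usable href."""
    results = []
    for href in hrefs:
        # get_attribute('href') yields None when the attribute is absent;
        # non-HTTP values such as javascript: links are skipped as well.
        if not href or not href.startswith(("http://", "https://")):
            continue
        results.append((href, fetch_status(href)))
    return results

def broken(results):
    """Keep only the entries whose status code is 400 or above."""
    return [(href, status) for href, status in results if status >= 400]

# Stand-in for a real HTTP call (assumption: these URLs and codes are made up).
fake_statuses = {"https://example.com/": 200, "https://example.com/missing": 404}
hrefs = [None, "javascript:void(0)", "https://example.com/", "https://example.com/missing"]

results = check_links(hrefs, fake_statuses.get)
print(broken(results))  # [('https://example.com/missing', 404)]
```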
    

Update

As per your follow-up question, it is hard to give a canonical answer, from Selenium's perspective, for why the XPath worked but the tag-name lookup did not. You may want to dig deeper into the related discussions on this topic.
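
One plausible factor can be illustrated offline: a lookup by tag name returns every <a> element on the page, including anchors that carry no href, whereas the XPath only matches result links. The sketch below uses the standard library's html.parser instead of Selenium, and the markup in page is invented for the example; the class name AnchorHrefs is hypothetical.

```python
from html.parser import HTMLParser

class AnchorHrefs(HTMLParser):
    """Collect the href of every <a> tag, using None when the attribute is missing."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; .get returns None if href is absent
            self.hrefs.append(dict(attrs).get("href"))

# Made-up markup: two anchors without href around one real link.
page = '<a name="top"></a><a href="https://example.com/">result</a><a onclick="go()">menu</a>'

parser = AnchorHrefs()
parser.feed(page)
print(parser.hrefs)  # [None, 'https://example.com/', None]
```

Passing either None entry to requests.head would reproduce the MissingSchema error from the question.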

undetected Selenium
  • When I use your code by finding the element by TAG_NAME it shows me the same error but for XPATH it works. Why is this so? – Talib Jan 24 '19 at 13:38
  • @Talib Checkout my answer update and let me know if any questions – undetected Selenium Jan 24 '19 at 14:02
  • Can you help me in this question: https://stackoverflow.com/questions/54347439/capture-screenshot-of-failed-test-cases/54347657?noredirect=1#comment95512505_54347657 – Talib Jan 24 '19 at 14:15

Try this. I'm pretty sure there are better ways to accomplish this, and it may or may not solve your problem, but in the short time I had, I came up with this approach and it seems to work for me:

import itertools
import requests
from selenium.webdriver import Chrome
from selenium.webdriver.common.keys import Keys

driver = Chrome()
driver.get('https://www.google.com/')

# Search 'selenium'
search = driver.find_element_by_css_selector('input[aria-label="Search"]')
search.send_keys('selenium')
search.send_keys(Keys.ENTER)

# Results container
container = driver.find_element_by_id('rso')
results = container.find_elements_by_css_selector('.bkWMgd')
# Drop the second result block
del results[1]

# Collect the hrefs of each result block (one list per block)
_links = []
for result in results:
    _links.append([r.get_attribute('href') for r in result.find_elements_by_css_selector('.r>a')])

driver.quit()
# Flatten the list of lists into a single list of URLs
links = list(itertools.chain.from_iterable(_links))

# Request each URL and print its HTTP status code
for link in links:
    r = requests.get(link)
    print(link, r.status_code)

Output:

https://www.seleniumhq.org/ 200
https://www.seleniumhq.org/projects/webdriver/ 200
https://www.webmd.com/a-to-z-guides/supplement-guide-selenium 200
https://www.healthline.com/nutrition/selenium-benefits 200
https://github.com/SeleniumHQ/selenium 200
https://en.wikipedia.org/wiki/Selenium_(software) 200
https://www.medicalnewstoday.com/articles/287842.php 200
https://ods.od.nih.gov/factsheets/Selenium-Consumer/ 200
https://selenium-python.readthedocs.io/ 200
https://selenium-python.readthedocs.io/installation.html 200
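
The flattening step above could also drop empty hrefs and repeated URLs before any request is made. This is a small sketch of that idea rather than part of the approach as posted; unique_hrefs is a hypothetical helper and the nested list is made-up sample data.

```python
import itertools

def unique_hrefs(nested):
    """Flatten per-result href lists, drop empty values, keep first-seen order."""
    flat = itertools.chain.from_iterable(nested)
    # dict.fromkeys preserves insertion order (Python 3.7+), so duplicates
    # are removed without reordering the remaining URLs.
    return list(dict.fromkeys(h for h in flat if h))

nested = [["https://example.com/a", None], [], ["https://example.com/b", "https://example.com/a"]]
print(unique_hrefs(nested))  # ['https://example.com/a', 'https://example.com/b']
```
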
Satish