requests.exceptions.MissingSchema: Invalid URL 'None': No schema supplied while trying to find broken links through Selenium and Python

Question

I want to find the broken links on my web page using Selenium and Python. I tried the code below, but it raises the following error:

requests.exceptions.MissingSchema: Invalid URL 'None': No schema supplied. Perhaps you meant http://None?

Code trials:

for link in links:
    r = requests.head(link.get_attribute('href'))
    print(link.get_attribute('href'), r.status_code)

Full code:

def test_lsearch(self):
    driver=self.driver
    driver.get("http://www.google.com")
    driver.set_page_load_timeout(10)
    driver.find_element_by_name("q").send_keys("selenium")

    driver.set_page_load_timeout(10)
    el=driver.find_element_by_name("btnK")
    el.click()
    time.sleep(5)

    links=driver.find_elements_by_css_selector("a")
    for link in links:
        r=requests.head(link.get_attribute('href'))
        print(link.get_attribute('href'),r.status_code)

no one will write your code from image to reproduce your problem , add your code as part of question — Dev, Jan 23 '19 at 12:01

undetected Selenium · Answer 1 · 2019-01-24T14:01:36.220

This error message...

    raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL 'None': No schema supplied. Perhaps you meant http://None?

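...implies that requests was handed the value None instead of a URL. get_attribute('href') returns None for an a element that has no href attribute, requests converts that value to the string 'None', and that string has no scheme (such as http:// or https://) to parse.

The error is raised in the models.py module of requests as follows: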

    # Support for unicode domain names and paths.
    scheme, auth, host, port, path, query, fragment = parse_url(url)
    if not scheme:
        raise MissingSchema("Invalid URL {0!r}: No schema supplied. "
                            "Perhaps you meant http://{0}?".format(url))
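The same check can be mimicked with the standard library alone. The sketch below is an analogue, not the requests source: it uses urllib.parse.urlsplit in place of parse_url, and the helper name has_scheme is hypothetical. It shows that a missing href, once converted to the text 'None' seen in the error message, carries no scheme.

```python
from urllib.parse import urlsplit


def has_scheme(url):
    """Return True if the value, converted to text, starts with a URL scheme."""
    # A missing href arrives as None; converting it to text gives 'None',
    # which matches the string quoted in the error message.
    return bool(urlsplit(str(url)).scheme)


print(has_scheme(None))                           # False
print(has_scheme("https://www.seleniumhq.org/"))  # True
```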

Solution

Presumably you want to check for broken links once the results for the keyword selenium have loaded in Google Search. To do that, you can use the following solution:

Code Block:

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys 

options = webdriver.ChromeOptions() 
options.add_argument("start-maximized")
options.add_argument('disable-infobars')
driver=webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get('https://google.co.in/')
search = driver.find_element_by_name('q')
search.send_keys("selenium")
search.send_keys(Keys.RETURN)
links = WebDriverWait(driver, 10).until(EC.visibility_of_any_elements_located((By.XPATH, "//div[@class='rc']//h3//ancestor::a[1]")))
print("Number of links : %s" %len(links))
for link in links:
    r = requests.head(link.get_attribute('href'))
    print(link.get_attribute('href'), r.status_code)

Console Output:

Number of links : 9
https://www.seleniumhq.org/ 200
https://www.seleniumhq.org/download/ 200
https://www.seleniumhq.org/docs/01_introducing_selenium.jsp 200
https://www.guru99.com/selenium-tutorial.html 200
https://en.wikipedia.org/wiki/Selenium_(software) 200
https://github.com/SeleniumHQ 200
https://www.edureka.co/blog/what-is-selenium/ 200
https://seleniumhq.github.io/selenium/docs/api/py/ 200
https://seleniumhq.github.io/docs/ 200
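When every a element on the page is collected, as in the question, some anchors may have no href, and get_attribute('href') then returns None. A minimal sketch of filtering such values before any request is made follows. The helper name checkable_hrefs and the sample list are hypothetical, and restricting the check to http and https links is an assumption rather than part of the original code.

```python
from urllib.parse import urlsplit


def checkable_hrefs(hrefs):
    """Keep only http(s) URLs, dropping None, empty strings and other schemes."""
    kept = []
    for href in hrefs:
        if not href:
            # None or '' means the anchor had no usable href
            continue
        if urlsplit(href).scheme in ("http", "https"):
            kept.append(href)
    return kept


# Stand-in for [link.get_attribute('href') for link in links]
sample = [None, "https://www.seleniumhq.org/", "javascript:void(0)", ""]
print(checkable_hrefs(sample))
```

The filtered list can then be passed to requests.head one URL at a time, as in the loop above.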

Update

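As per your counter question, it is hard to give a canonical answer, from Selenium's perspective, as to why the XPath worked but the tag name did not. One likely factor is that a tag-name locator matches every a element on the page, including anchors with no href attribute, while the XPath above matches only the links of the search results.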

When I use your code by finding the element by TAG_NAME it shows me the same error but for XPATH it works. Why is this so? — Talib, Jan 24 '19 at 13:38
@Talib Checkout my answer update and let me know if any questions — undetected Selenium, Jan 24 '19 at 14:02
Can you help me in this question: https://stackoverflow.com/questions/54347439/capture-screenshot-of-failed-test-cases/54347657?noredirect=1#comment95512505_54347657 — Talib, Jan 24 '19 at 14:15

score 0 · Answer 2 · answered Jan 23 '19 at 12:58

Try this. I am pretty sure there are better ways to accomplish this, and it may or may not solve your problem, but in the short time I had I came up with this approach, and it seems to work for me:

import itertools
import requests
from selenium.webdriver import Chrome
from selenium.webdriver.common.keys import Keys

driver = Chrome()
driver.get('https://www.google.com/')

# Search 'selenium'
search = driver.find_element_by_css_selector('input[aria-label="Search"]')
search.send_keys('selenium')
search.send_keys(Keys.ENTER)

# Results div
container = driver.find_element_by_id('rso')
results = container.find_elements_by_css_selector('.bkWMgd')
del results[1]

# links
_links = []
for result in results:
    _links.append([r.get_attribute('href') for r in result.find_elements_by_css_selector('.r>a')])

driver.quit()
links = list(itertools.chain.from_iterable(_links))

for link in links:
    r = requests.get(link)
    print(link, r.status_code)

Output:

https://www.seleniumhq.org/ 200
https://www.seleniumhq.org/projects/webdriver/ 200
https://www.webmd.com/a-to-z-guides/supplement-guide-selenium 200
https://www.healthline.com/nutrition/selenium-benefits 200
https://github.com/SeleniumHQ/selenium 200
https://en.wikipedia.org/wiki/Selenium_(software) 200
https://www.medicalnewstoday.com/articles/287842.php 200
https://ods.od.nih.gov/factsheets/Selenium-Consumer/ 200
https://selenium-python.readthedocs.io/ 200
https://selenium-python.readthedocs.io/installation.html 200
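Neither listing says which status codes should count as broken. A minimal sketch, assuming a link is treated as broken when its status code is 400 or higher: the threshold, the helper name broken_links and the sample data are all assumptions made for illustration.

```python
def broken_links(results):
    """Return the (url, status) pairs whose status code is 400 or higher."""
    return [(url, status) for url, status in results if status >= 400]


# Hypothetical (url, status_code) pairs, in the shape printed by the loops above
checked = [
    ("https://www.seleniumhq.org/", 200),
    ("https://example.com/missing", 404),
]
print(broken_links(checked))
```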
