Regex for absolute url

Question

I have been searching for quite a while for a regex, compatible with Python's re module, that finds all URLs in an HTML document. The only one I could find is only able to check whether a URL is valid or invalid (with the match method). I want to do something as simple as:

import requests
html_response = requests.get('http://example.com').text
urls = url_pattern.findall(html_response)

I suppose the regex I need (if it exists) would have to be complex enough to handle a bunch of special cases of URLs, so it probably cannot be a one-liner.

Don't use regex to parse html. Use [BeautifulSoup](http://www.crummy.com/software/BeautifulSoup/bs4/doc/) instead (or the [html parser in the standard lib](https://docs.python.org/3/library/html.parser.html) ) — Chad S., Oct 09 '15 at 21:14

score 4 · Accepted Answer · edited May 23 '17 at 12:14

Use BeautifulSoup instead. It's simple to use and lets you parse HTML pages properly, rather than pattern-matching on the raw text.

See this answer: "How to extract URLs from an HTML page in Python".
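
As a rough illustration of the parser-based approach, here is a minimal sketch that uses the standard library's `html.parser` (the alternative mentioned in the comment above) so it runs without installing anything. The names `LinkCollector` and `extract_urls` are made up for this example, and the choice to look only at `href` and `src` attributes and to keep only `http`/`https` results is an assumption about what "all URLs" should mean here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects absolute http(s) URLs from href/src attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page URL
                absolute = urljoin(self.base_url, value)
                # Drop mailto:, javascript:, and other non-web schemes
                if urlparse(absolute).scheme in ("http", "https"):
                    self.urls.append(absolute)


def extract_urls(html, base_url):
    """Return absolute URLs found in the given HTML, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls


sample = (
    '<a href="/about">About</a> '
    '<img src="http://example.com/logo.png"> '
    '<a href="mailto:me@example.com">Mail</a>'
)
print(extract_urls(sample, "http://example.com"))
```

With the snippet from the question, this would be called as `extract_urls(html_response, 'http://example.com')`. URLs that appear only in plain text, inline scripts, or CSS are not picked up by this sketch.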

edited May 23 '17 at 12:14 by Community

answered Oct 09 '15 at 21:15 by Anurag Verma
