
I have been searching for quite a while for a regex, compatible with Python's re module, that finds all URLs in an HTML document. The only one I found could merely check whether a URL is valid or invalid (with the match method). I want to do something simple like this:

    import requests

    html_response = requests.get('http://example.com').text
    # url_pattern is the compiled regex I am looking for
    urls = url_pattern.findall(html_response)

I suppose the regex I need (if one exists) would have to be complex enough to handle a bunch of special cases of URLs, so it could not be a simple one-liner.

Yuras
Don't use regex to parse HTML. Use [BeautifulSoup](http://www.crummy.com/software/BeautifulSoup/bs4/doc/) instead (or the [HTML parser in the standard lib](https://docs.python.org/3/library/html.parser.html)). – Chad S. Oct 09 '15 at 21:14
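
For the standard-library route mentioned in this comment, a minimal sketch could look like the following. The names `LinkCollector` and `extract_urls` are made up for illustration, and the choice to look only at `href` and `src` attributes is an assumption; other attributes can carry URLs too.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects URLs found in href/src attributes (hypothetical helper)."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page URL
                self.urls.append(urljoin(self.base_url, value))


def extract_urls(html, base_url=""):
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.urls


sample = '<a href="/about">About</a> <img src="http://example.com/logo.png">'
print(extract_urls(sample, "http://example.com"))
```

With requests, the text of the response would be passed in place of `sample`, together with the page URL as `base_url`.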

1 Answer


Use BeautifulSoup instead. It's simple to use and lets you parse HTML pages without a regex.

See this answer: How to extract URLs from an HTML page in Python
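
As a rough sketch of what that looks like with BeautifulSoup (assuming the `bs4` package is installed; `find_links` and `sample_html` are illustrative names, not from the linked answer), collecting the `href` of every anchor tag might go like this:

```python
from bs4 import BeautifulSoup


def find_links(html):
    # "html.parser" is the built-in backend, so no extra parser is needed
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips <a> tags that have no href attribute
    return [a["href"] for a in soup.find_all("a", href=True)]


sample_html = (
    '<p><a href="http://example.com/a">A</a> '
    '<a name="x">no href</a> '
    '<a href="/b">B</a></p>'
)
print(find_links(sample_html))
```

For a live page, the string from `requests.get('http://example.com').text` would be passed to `find_links` instead of `sample_html`. Relative links such as `/b` come back as written; `urllib.parse.urljoin` can resolve them against the page URL if absolute URLs are needed.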

Anurag Verma