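import requests as rs
from bs4 import BeautifulSoup as bs
import re

site = 'https://www.iciciprulife.com/'
req = rs.get(site)
soup = bs(req.text, 'html.parser')

# Ask which URL scheme to look for ("http" or "https")
link = input("Enter which URL scheme you want (http or https): ").strip().lower()

if link in ("http", "https"):
    # Find every <a> tag whose href starts with the chosen scheme
    for i in soup.find_all('a', attrs={'href': re.compile("^" + link + "://")}):
        print(i.get('href'))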

In the above code, I don't want to rely on the 'a' tags or their 'href' attributes. Instead, I want to search for URLs with a regular expression across the entire webpage.

Sagar Jain

2 Answers


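soup.text converts the parsed page into a single string of its text content (tag names and attribute values such as href are not included). This string can contain non-ASCII characters, so you may want to convert or remove them first.

Then you can search the whole string with a regular expression.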

To remove non-ASCII characters from a string, see:

How to remove nonAscii characters in python
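A minimal sketch of the two steps above, using only the standard library. The page_text literal is a hypothetical stand-in for soup.text so the example runs offline, and encoding with errors="ignore" is just one way to drop non-ASCII characters, not necessarily the method from the linked question.

```python
import re

# Hypothetical stand-in for soup.text: the page's text as one string,
# containing non-ASCII characters (a rupee sign and an en dash).
page_text = "Plans from \u20b9500 \u2013 see https://www.example.com/plans or http://example.org/about"

# Step 1: drop every non-ASCII character (encode, ignoring errors, then decode back).
ascii_text = page_text.encode("ascii", errors="ignore").decode("ascii")

# Step 2: search the whole string for http/https URLs.
# The match stops at whitespace, double quotes, and angle brackets.
urls = re.findall(r'https?://[^\s<>"]+', ascii_text)
print(urls)
```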

pullidea-dev
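Run the regular expression over the raw HTML in req.text, which includes attribute values as well as the visible text:

urls = re.findall(r'https?://[^\s<>"]+', req.text)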
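A slightly extended sketch of the same idea, assuming you also want the question's http/https choice and no duplicate results. find_urls is a hypothetical helper, and the html string is an offline stand-in for req.text.

```python
import re

# Same pattern as the one-liner: http:// or https://, then everything up to
# whitespace, a double quote, or an angle bracket.
URL_RE = re.compile(r'https?://[^\s<>"]+')


def find_urls(html, scheme=None):
    """Hypothetical helper: return unique URLs found anywhere in html.

    If scheme is "http" or "https", keep only URLs using that scheme.
    """
    urls = URL_RE.findall(html)
    if scheme is not None:
        urls = [u for u in urls if u.startswith(scheme + "://")]
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(urls))


# Offline stand-in for req.text (in the question it comes from rs.get(site)).
html = '''<a href="https://www.example.com/a">A</a>
<script>var api = "http://api.example.com/v1";</script>
<p>Mirror: https://www.example.com/a</p>'''

print(find_urls(html))
print(find_urls(html, "http"))
```

The character class stops only at whitespace, double quotes, and angle brackets, so a single-quoted attribute such as href='http://example.com' would leave a trailing ' on the match; add ' to the class if the page uses single quotes.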
yf879