How to find URLs without using href in the code below

Question

import requests as rs
from bs4 import BeautifulSoup as bs
import re

site = 'https://www.iciciprulife.com/'
req = rs.get(site)
soup = bs(req.text, 'html.parser')
link = input("Enter which URLs you want (http or https): ")

if link in ("http", "https"):
    # Only <a> tags whose href starts with the chosen scheme
    for i in soup.find_all('a', attrs={'href': re.compile("^" + link + "://")}):
        print(i.get('href'))

In the above code I don't want to use 'href' or 'a'; instead, I want to search for URLs with a regular expression over the entire web page.

You should say **why** you don't want to use href. Using your own regex to parse HTML is generally considered a bad idea... — tomjn, Jun 09 '21 at 09:25
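To illustrate the comment's point, here is a minimal sketch that still lets a real HTML parser handle the markup but does not hardcode 'a' or 'href': it checks every attribute of every tag and keeps values that start with http:// or https://. It uses the standard library's html.parser in place of BeautifulSoup only so it runs without third-party packages; `AttributeURLCollector`, `URL_RE`, and `sample_html` are hypothetical names, and the sample page stands in for `req.text`.

```python
import re
from html.parser import HTMLParser

# Matches attribute values that start with http:// or https://
URL_RE = re.compile(r"^https?://")


class AttributeURLCollector(HTMLParser):
    """Collect URL-looking values from every attribute of every tag."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        for _name, value in attrs:
            if value and URL_RE.match(value):
                self.urls.append(value)


# Hypothetical sample page standing in for req.text
sample_html = """
<a href="http://example.com/a">A</a>
<img src="https://example.com/logo.png">
<link rel="canonical" href="https://example.com/">
<a href="/relative/path">relative</a>
"""

collector = AttributeURLCollector()
collector.feed(sample_html)
collector.close()
print(collector.urls)
```

Relative links such as /relative/path are skipped because they do not start with a scheme; `urllib.parse.urljoin` could resolve them against the site URL if they are needed.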

score 0 · Answer 1 · answered Jun 09 '21 at 09:35

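str(soup) turns the soup back into a string of HTML. (soup.text is not suitable here: it returns only the visible text, so URLs that appear only in attributes such as href are lost.) This string may contain non-ASCII characters, which you may want to convert or remove first; in Python 3 this is usually optional, because re works on Unicode strings directly.

Then you can search the whole string with a regular expression.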
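A minimal sketch of that search, assuming the page is already available as a string (str(soup) or req.text keep the markup, whereas soup.text drops tag attributes such as href). It also honours the question's http/https choice by building the pattern from the chosen scheme; `find_urls` and `page` are hypothetical names, and the sample page is made up.

```python
import re


def find_urls(page, scheme):
    """Return every URL in `page` that starts with `scheme` + '://'.

    `scheme` is expected to be "http" or "https", the two choices the
    question's input() prompt offers.
    """
    # re.escape treats the scheme literally; the character class ends a
    # match at whitespace, '<', '>', or either kind of quote
    pattern = re.escape(scheme) + r"://[^\s<>\"']+"
    return re.findall(pattern, page)


# Hypothetical page content standing in for str(soup) or req.text
page = (
    '<a href="http://example.com/a">A</a> '
    '<script>var api = "https://api.example.com/v1";</script> '
    "Visit https://example.com/docs for more."
)

print(find_urls(page, "http"))   # only the http:// link
print(find_urls(page, "https"))  # both https:// URLs, including the one in <script>
```

Because "http://" requires the colon right after "http", the http search does not pick up https URLs.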

To remove non-ASCII characters from a string, see:

How to remove nonAscii characters in python
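If you do decide to strip non-ASCII characters, two common ways are a regex substitution and the ascii codec with errors="ignore". This is a minimal sketch with hypothetical function names; note that it silently shortens internationalized URLs, as the sample shows.

```python
import re


def strip_non_ascii(text):
    """Drop every character outside the 7-bit ASCII range (U+0000 to U+007F)."""
    return re.sub(r"[^\x00-\x7F]+", "", text)


def strip_non_ascii_codec(text):
    """Same result using the ascii codec and discarding unencodable characters."""
    return text.encode("ascii", errors="ignore").decode("ascii")


# Hypothetical sample containing an accented word, an en dash, and a non-ASCII URL
sample = "Caf\u00e9 \u2013 https://example.com/men\u00fc"
print(strip_non_ascii(sample))
print(strip_non_ascii_codec(sample))
```

Both calls turn "menü" into "men", which changes the URL, so for URL extraction it is usually better to search the Unicode string as-is.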

score 0 · Answer 2 · yf879 · 2021-06-09T11:14:23.210

urls = re.findall(r'https?://[^\s<>"]+', req.text)
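A quick way to see what this pattern does is to run it on a small, made-up page. The character class `[^\s<>"]` ends a match at whitespace, `<`, `>` or a double quote, but not at a single quote, so a single-quoted attribute keeps its closing `'`. The sketch below compares it with a variant that also excludes `'` and then removes duplicates while keeping the first-seen order; `html`, `answer_pattern`, and `variant_pattern` are hypothetical names.

```python
import re

# Hypothetical page content standing in for req.text
html = (
    "<a href='http://example.com/single'>single-quoted</a>\n"
    '<a href="https://example.com/double">double-quoted</a>\n'
    '<img src="https://example.com/double">\n'
)

# The pattern from the answer: stops at whitespace, <, > and double quotes
answer_pattern = r'https?://[^\s<>"]+'
# Variant that also stops at single quotes
variant_pattern = r'https?://[^\s<>"\']+'

print(re.findall(answer_pattern, html))   # first URL ends with a stray '
print(re.findall(variant_pattern, html))

# Drop duplicates but keep first-seen order (dicts preserve insertion order)
unique_urls = list(dict.fromkeys(re.findall(variant_pattern, html)))
print(unique_urls)
```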

edited Jun 09 '21 at 11:14

answered Jun 09 '21 at 11:04

