How to get a URL from HTML code with re.findall in Python

Question

This is my HTML:

<html>
<body>
<img src="https://example.com/staks.jpg">
<a href="https://example.com">Link1</a>
<a href="https://example.com/page2">Link2</a>
</body>
</html>

I want to get the URL of Link1 as a variable in Python:

import requests
import re

r = requests.get("https://myUrlExample.com")
s = r.text
pattern = re.compile("""href="https://""") # I don't know what pattern to use here
matches = re.findall(pattern, s)
if len(matches) > 0:
  print(matches[0])

`https:\/\/.*.com` could work: it matches everything from `https:` through `.com`. — 3dSpatialUser, Jan 21 '22 at 10:58
Or maybe `href="https:\/\/.*.com`? That would work in your example. — 3dSpatialUser, Jan 21 '22 at 11:01
Use a dedicated tool like [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/). For using Regex to parse HTML read https://stackoverflow.com/a/1732454/4046632 — buran, Jan 21 '22 at 11:05
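
To illustrate the parser-based approach suggested above, here is a minimal sketch. It uses the standard library's `html.parser` rather than BeautifulSoup, only so that it runs without installing anything. The class name `LinkCollector` and the variable names are hypothetical, and the HTML is the sample from the question, inlined instead of fetched.

```python
from html.parser import HTMLParser

# Sample page from the question, inlined so no network call is needed.
html = """<html>
<body>
<img src="https://example.com/staks.jpg">
<a href="https://example.com">Link1</a>
<a href="https://example.com/page2">Link2</a>
</body>
</html>"""


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects (href, text) pairs for <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


collector = LinkCollector()
collector.feed(html)
collector.close()

# Look the URL up by its link text instead of relying on match order.
by_text = {text: href for href, text in collector.links}
link1_url = by_text.get("Link1")
print(link1_url)
```

Looking the link up by its text means the `<img src=...>` line and the order of the anchors do not affect the result.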

score 0 · Accepted Answer · answered Feb 17 '22 at 10:33

Actually, I found this using GitHub Copilot:

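s = r.text
# The group captures only the part after "https://", up to the closing quote.
pattern = r'href="https://(.*?)"'
url = re.findall(pattern, s)  # list of every match, in document order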
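
As a self-contained variant of the snippet above, here is a sketch run against the sample HTML from the question, inlined rather than fetched with `requests`. It assumes the goal is the full URL, so the capture group includes the `https://` scheme. The names `html` and `link1_url` are made up for illustration.

```python
import re

# Sample page from the question, inlined so the sketch runs without a network call.
html = """<html>
<body>
<img src="https://example.com/staks.jpg">
<a href="https://example.com">Link1</a>
<a href="https://example.com/page2">Link2</a>
</body>
</html>"""

# Capture the whole URL, scheme included; [^"]* stops at the closing quote.
pattern = re.compile(r'href="(https://[^"]*)"')
matches = pattern.findall(html)

# The <img> tag uses src, not href, so the first match is Link1.
link1_url = matches[0] if matches else None
print(link1_url)
```

This relies on Link1 being the first `href` in the page. If the markup changes, matching on the link text is the safer route.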

Kushagra
