This is my HTML:

<html>
<body>
<img src="https://example.com/staks.jpg">
<a href="https://example.com">Link1</a>
<a href="https://example.com/page2">Link2</a>
</body>
</html>

I want to get the URL of Link1 as a variable in Python:
import requests
import re

r = requests.get("https://myUrlExample.com")
s = r.text
pattern = re.compile("""href="https://""") # I don't know what pattern I should put here
matches = re.findall(pattern, s)
if len(matches) > 0:
  print(matches[0])
Kushagra
  • `https:\/\/.*.com` could work: it matches everything from `https:` up to `.com` – 3dSpatialUser Jan 21 '22 at 10:58
  • `href=https:\/\/.*.com` maybe? That would work in your example – 3dSpatialUser Jan 21 '22 at 11:01
  • Use a dedicated tool like [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/). For using Regex to parse HTML read https://stackoverflow.com/a/1732454/4046632 – buran Jan 21 '22 at 11:05
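
The parser-based approach suggested in the last comment could look roughly like the sketch below. It uses the standard library's `html.parser` rather than BeautifulSoup, only to avoid a third-party dependency. The class name `LinkCollector` is hypothetical, and the HTML from the question is inlined as a string (an assumption, so that no network request is needed).

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, text) pairs for every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Sample HTML from the question, inlined for the sketch.
html = """<html>
<body>
<img src="https://example.com/staks.jpg">
<a href="https://example.com">Link1</a>
<a href="https://example.com/page2">Link2</a>
</body>
</html>"""

collector = LinkCollector()
collector.feed(html)
collector.close()

# Pick the link by its visible text instead of relying on its position.
link1_url = next((href for href, text in collector.links if text == "Link1"), None)
print(link1_url)
```

Selecting by link text means the `<img src=...>` line and any reordering of the anchors do not affect the result.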

1 Answer

Actually, I found this using GitHub Copilot:

s = r.text
pattern = r'href="https://(.*?)"'
url = re.findall(pattern, s)
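
Note that the capture group in this pattern starts after `https://`, so `re.findall` returns strings like `example.com` without the scheme, and it returns a list rather than a single value. A minimal sketch that captures the full URL and stores the first match in a variable might look like this. The HTML from the question is inlined as a string (an assumption, so that the example runs without a network request), and `link1_url` is an illustrative name.

```python
import re

# Sample HTML from the question, inlined so the sketch runs without a network call.
html = """<html>
<body>
<img src="https://example.com/staks.jpg">
<a href="https://example.com">Link1</a>
<a href="https://example.com/page2">Link2</a>
</body>
</html>"""

# Capture the whole URL (scheme included) of every href attribute.
pattern = re.compile(r'href="(https://[^"]*)"')
urls = pattern.findall(html)

# First href in document order, which is Link1 in this sample.
link1_url = urls[0] if urls else None
print(link1_url)
```

This relies on Link1 being the first `href` on the page. If the order can change, matching on the link text is safer.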
Kushagra