-1

I am trying to retrieve a hyperlink from a string. I am converting multiple docx files to a dataframe with a column for hyperlinks. I did obtain a column with hyperlinks in it, but it also contains other text. Each row has text that looks as follows:

text = '<li> a lot of text a lot of text a lot of text a lot of text <a href="https://www.google.com">google.com</a>)</li>'

Here, text is a string. How can I easily extract the hyperlink from it?

Tobias
  • You can use a [regular expression](https://stackoverflow.com/questions/3809401/what-is-a-good-regular-expression-to-match-a-url) – ForceBru Jul 07 '21 at 10:19

1 Answer

1

You can retrieve the URL using a regular expression.

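import re
# Raw string avoids an invalid-escape warning; excluding ", < and > stops the
# match at the end of the href value instead of running into the HTML after it.
re.search(r'(?P<url>https?://[^\s"<>]+)', yourString).group("url")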

If the string contains more than one URL, use re.findall to get all of them as a list:

re.findall(r'(https?://[^\s"<>]+)', yourString)
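
Since the text in the question is an HTML fragment, another option (not the regex approach above) is to let an HTML parser read the href attribute directly. The following is only a minimal sketch using the standard library's html.parser. The names LinkCollector and extract_links are made up for illustration, and it assumes the doubled quotes in the question are an escaping artifact, so that the real string contains href="https://www.google.com".

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_fragment):
    """Return all href values found in <a> tags of the given fragment."""
    parser = LinkCollector()
    parser.feed(html_fragment)
    parser.close()
    return parser.links


# Assumed form of one row from the question (single pair of quotes around the URL)
text = '<li> a lot of text <a href="https://www.google.com">google.com</a>)</li>'
print(extract_links(text))  # ['https://www.google.com']
```

For a dataframe column, something like df["col"].apply(extract_links) should give a list of links per row, where "col" stands in for your actual column name.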
Jagdeesh