1

How can I parse URLs from any given plain text (not limited to href attributes in tags)?

Any code examples in Python will be appreciated.

jack
  • See near duplicate: http://stackoverflow.com/questions/520031/whats-the-cleanest-way-to-extract-urls-from-a-string-using-python – mjv Apr 29 '10 at 06:53

2 Answers

2

You could use a regular expression to extract the URLs from the string.

See this previously asked question: What’s the cleanest way to extract URLs from a string using Python?
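As a rough illustration of the regular-expression approach, here is a minimal sketch. The pattern and the names SIMPLE_URL and find_urls_simple are assumptions made up for this example, not taken from the linked question; the pattern only looks for http:// or https:// followed by non-whitespace characters.

```python
import re

# Assumed, deliberately simple pattern: an http(s) scheme followed by
# any run of non-whitespace characters.
SIMPLE_URL = re.compile(r"https?://\S+")


def find_urls_simple(text):
    """Return every http(s) URL-like token found in text (hypothetical helper)."""
    return SIMPLE_URL.findall(text)


sample = "Docs at https://docs.python.org/3/ and http://example.com/a?b=1 today"
print(find_urls_simple(sample))
```

Because \S+ takes everything up to the next whitespace, punctuation that directly follows a URL (a trailing comma or period) ends up in the match, so a stricter pattern is usually needed for real prose.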

Community
Brock Woolf
1

See Jan Goyvaerts' blog.

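So a Python code example could look like this (the character classes list only uppercase letters, so the pattern needs re.IGNORECASE to match lowercase URLs):

import re
result = re.findall(r"\b(?:(?:https?|ftp|file)://|www\.|ftp\.)[-A-Z0-9+&@#/%=~_|$?!:,.]*[A-Z0-9+&@#/%=~_|$]", subject, re.IGNORECASE)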
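For anyone who wants to try the pattern above on real text, here is a self-contained sketch. The names URL_PATTERN and extract_urls and the sample string are made up for illustration; the pattern is compiled with re.IGNORECASE because its character classes list only uppercase letters.

```python
import re

# Same pattern as above, split over two string literals for readability.
# re.IGNORECASE lets the A-Z classes match lowercase letters as well.
URL_PATTERN = re.compile(
    r"\b(?:(?:https?|ftp|file)://|www\.|ftp\.)"
    r"[-A-Z0-9+&@#/%=~_|$?!:,.]*[A-Z0-9+&@#/%=~_|$]",
    re.IGNORECASE,
)


def extract_urls(text):
    """Return all substrings of text matched by URL_PATTERN (hypothetical helper)."""
    return URL_PATTERN.findall(text)


sample = "See http://example.com/page?id=1, or www.python.org. Mail me later."
print(extract_urls(sample))
```

The final character class excludes punctuation such as commas and periods, so the comma after the first URL and the period after the second are left out of the matches.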
Jasha
Tim Pietzcker