You can search for "words" containing : and then pass them to urlparse (renamed to urllib.parse in Python 3.0 and newer) to check if they are valid URLs.
Example:
possible_urls = re.findall(r'\S+:\S+', text)
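The checking step could look roughly like the sketch below. The helper name looks_like_url and the rule "must have both a scheme and a network location" are assumptions made for this example. urlparse itself rarely rejects input, so you have to decide what counts as valid.

```python
import re
from urllib.parse import urlparse  # Python 3; the module was called urlparse in Python 2


def looks_like_url(candidate):
    """Hypothetical helper: accept a candidate only if it has a scheme and a host."""
    try:
        parts = urlparse(candidate)
    except ValueError:
        # urlparse raises ValueError for some malformed inputs,
        # for example unbalanced IPv6 brackets.
        return False
    return bool(parts.scheme and parts.netloc)


text = "Docs are at http://example.com/docs, the meeting is at 10:30 tomorrow."
possible_urls = re.findall(r'\S+:\S+', text)  # also picks up "10:30"
urls = [u for u in possible_urls if looks_like_url(u)]
```

With this rule "10:30" is dropped because it has no network location, while the real URL is kept (still with the comma that followed it in the sentence).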
If you want to restrict yourself only to URLs starting with http:// or https:// (or anything else you want to allow) you can also do that with regular expressions, for example:
possible_urls = re.findall(r'https?://\S+', text)
You may also want to use some heuristics to determine where a URL starts and stops, because people often put punctuation directly after a URL. The result is still a valid URL, just not the one they meant, for example:
Have you seen the new look for http://example.com/? It's a total ripoff of http://example.org/!
Here the punctuation after the URL is not intended to be part of the URL. You can see from the automatically added links in the above text that StackOverflow implements such heuristics.
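One simple heuristic of this kind is to strip trailing characters that are legal in a URL but rarely meant to end one. The sketch below is only an illustration: the helper name strip_trailing_punctuation, the set of characters, and the parenthesis rule are assumptions of this example, not how Stack Overflow actually does it.

```python
import re

# Characters that are legal in a URL but rarely intended when they end one.
TRAILING_PUNCTUATION = '.,;:!?\'"'


def strip_trailing_punctuation(url):
    """Hypothetical helper: drop punctuation that most likely belongs to the sentence."""
    url = url.rstrip(TRAILING_PUNCTUATION)
    # Drop a closing parenthesis only when it has no opening partner inside the URL.
    while url.endswith(')') and url.count(')') > url.count('('):
        url = url[:-1].rstrip(TRAILING_PUNCTUATION)
    return url


text = ("Have you seen the new look for http://example.com/? "
        "It's a total ripoff of http://example.org/!")
urls = [strip_trailing_punctuation(u) for u in re.findall(r'https?://\S+', text)]
print(urls)  # ['http://example.com/', 'http://example.org/']
```

The parenthesis check keeps URLs such as Wikipedia links that legitimately end in ")" while still removing a ")" that closes a parenthetical remark in the surrounding sentence. A heuristic like this will occasionally cut off a character that really was part of the URL, so tune the character set to your input.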