
Possible Duplicate:
What is the best regular expression to check if a string is a valid URL

I want to extract URLs such as http://www.google.com or http://mail.yahoo.com.uk from a string. What's the best approach to achieve this?

Don Lun
  • 1
    Is this a substring search or a validation question? – Paul Sasik Mar 30 '11 at 21:22
  • 1
    You do realize that almost anything is a valid URL? The syntax is very flexible. http://tools.ietf.org/html/rfc3986. The scheme and path components are required, though the path may be empty. So `ftp:` is a legal URL. – S.Lott Mar 30 '11 at 21:44

1 Answer

>>> text = """I want to find url this "http://www.google.com" or "http://mail.yahoo.com.uk" from a string.

I tried different exprs but no one correct. Could anyone help me? Thanks
"""
>>> import re
>>> re.search( '(http://www\\.google\\.com)', text )
<_sre.SRE_Match object at 0x02183060>
>>> _.groups()
('http://www.google.com',)
>>> re.search( '(http://mail\\.yahoo\\.com\\.uk)', text )
<_sre.SRE_Match object at 0x021830A0>
>>> _.groups()
('http://mail.yahoo.com.uk',)
>>> re.findall( '(http://[^"\' ]+)', text )
['http://www.google.com', 'http://mail.yahoo.com.uk']

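Note that the last example is extremely simplified and should not be used in practice: it only recognizes the http:// scheme and simply stops at the first quote or space. If you need something robust, search for an established regular expression for URLs.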
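For a slightly more forgiving substring search, something along these lines might work. This is only a sketch, not a validator: the pattern, the `find_urls` helper, and the set of trailing characters it trims are all assumptions made for illustration, and it deliberately handles only http and https.

```python
import re
from urllib.parse import urlparse

# Illustrative pattern (an assumption, not a standard): "http://" or
# "https://", then any run of characters that are not whitespace,
# quotes, or angle brackets.
URL_PATTERN = re.compile(r"""https?://[^\s"'<>]+""")

def find_urls(text):
    """Return http(s) URL candidates found in text, in order of appearance."""
    urls = []
    for candidate in URL_PATTERN.findall(text):
        # Trailing punctuation usually belongs to the sentence, not the URL.
        candidate = candidate.rstrip(".,;:!?)")
        # Keep only candidates in which urlparse finds a host part.
        if urlparse(candidate).netloc:
            urls.append(candidate)
    return urls

text = 'I want to find "http://www.google.com" or http://mail.yahoo.com.uk.'
print(find_urls(text))
# ['http://www.google.com', 'http://mail.yahoo.com.uk']
```

Trimming a trailing `)` will damage the rare URL that legitimately ends in a parenthesis, so adjust the stripped characters to suit your input.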

poke