How to split a web address

Question

I'm using Python to parse web pages and I want to split a full web address into two parts. Say I have the address http://www.stackoverflow.com/questions/ask. I need the protocol and domain (e.g. http://www.stackoverflow.com) and the path (e.g. /questions/ask). I figured this could be solved with a regex, but I'm not very handy with those. Any suggestions?

Duplicate. See http://stackoverflow.com/questions/258746/slicing-url-with-python and http://stackoverflow.com/questions/163009/urllib2-file-name — S.Lott, Nov 13 '08 at 10:57

score 13 · Answer 1 · edited Apr 14 '18 at 00:54

Dan is right: urlparse is your friend:

>>> from urlparse import urlparse
>>>
>>> parts = urlparse("http://www.stackoverflow.com/questions/ask")
>>> parts.scheme + "://" + parts.netloc
'http://www.stackoverflow.com'
>>> parts.path
'/questions/ask'

Note: In Python 3, the import is from urllib.parse import urlparse
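
For Python 3, the same split can be wrapped in a small helper. This is only a minimal sketch: the function name split_url is made up for illustration and is not part of the standard library, and it assumes the input is an absolute URL with a scheme.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return (scheme://netloc, path) for an absolute URL."""
    parts = urlsplit(url)
    # Rebuild the "protocol and domain" part from its two components.
    base = parts.scheme + "://" + parts.netloc
    return base, parts.path


base, path = split_url("http://www.stackoverflow.com/questions/ask")
print(base)  # http://www.stackoverflow.com
print(path)  # /questions/ask
```

urlsplit is used here instead of urlparse because the two results only differ in how path parameters are handled, which does not matter for this split.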

edited Apr 14 '18 at 00:54 by Paulo Almeida

answered Nov 13 '08 at 03:37 by Ned Batchelder

Gotta love that batteries-included philosophy. I thought of regex at first because I didn't know that battery was included. Thanks. – Sam Corder Nov 13 '08 at 18:22

score 7 · Answer 2 · edited Nov 03 '14 at 20:39

Use the Python urlparse module:

https://docs.python.org/library/urlparse.html

For a well-defined and well-traveled problem like this, don't bother with writing your own code, let alone your own regular expressions. They cause too much trouble ;-).
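
As a rough illustration of the kind of trouble a hand-written pattern can run into, the sketch below (Python 3, using a made-up URL that is not from the question) shows the standard library parser separating the port, query string and fragment as well. A simple "everything after the host" regex would lump the last two into the path.

```python
from urllib.parse import urlsplit

# Hypothetical URL, chosen to exercise more parts than the one in the question.
parts = urlsplit("https://www.stackoverflow.com:8080/questions/ask?tab=new#top")

print(parts.scheme)    # https
print(parts.netloc)    # www.stackoverflow.com:8080
print(parts.hostname)  # www.stackoverflow.com
print(parts.port)      # 8080
print(parts.path)      # /questions/ask
print(parts.query)     # tab=new
print(parts.fragment)  # top
```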

edited Nov 03 '14 at 20:39 by twasbrillig

answered Nov 13 '08 at 03:13 by Dan Fego

score -1 · Answer 3 · answered Nov 13 '08 at 03:12

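import re
url = "http://stackoverflow.com/questions/ask"
# Group 1 is the scheme plus host; group 2 is the rest (the path).
# re.match returns None if the URL does not start with http:// or https://.
base, path = re.match(r"(https?://[^/]*)(.*)", url).groups()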

answered Nov 13 '08 at 03:12 by Cybis
