There are a number of questions about how to parse a URL in Python; this question is about the best or most Pythonic way to do it.

In my parsing I need four parts: the network location, the first component of the path, the rest of the path, and the filename together with the query string.

http://www.somesite.com/base/first/second/third/fourth/foo.html?abc=123

should parse into:

netloc = 'www.somesite.com'
baseURL = 'base'
path = '/first/second/third/fourth/'
file = 'foo.html?abc=123'

The code below produces the correct result, but is there a better way to do this in Python?

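from urlparse import urlparse  # Python 2; in Python 3: from urllib.parse import urlparse

url = "http://www.somesite.com/base/first/second/third/fourth/foo.html?abc=123"

file = url.rpartition('/')[2]
netloc = urlparse(url)[1]
# Index 2 of the parse result is the path component
pathParts = urlparse(url)[2].split('/')
baseURL = pathParts[1]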

partCount = len(pathParts) - 1

path = "/"
for i in range(2, partCount):
    path += pathParts[i] + "/"


print 'baseURL= ' + baseURL
print 'path= ' + path
print 'file= ' + file
print 'netloc= ' + netloc
Ben Blank
Shawn Swaner
  • Exact Duplicate: http://stackoverflow.com/questions/258746/slicing-url-with-python – S.Lott May 19 '09 at 10:47
  • Not quite the same as 258746, this question had a slightly different goal and the main focus of asking was about the best (Pythonic) way to accomplish the task. – Shawn Swaner May 20 '09 at 08:17

2 Answers

Since your requirements on what parts you want are different from what urlparse gives you, that's as good as it's going to get. You could, however, replace this:

partCount = len(pathParts) - 1

path = "/"
for i in range(2, partCount):
    path += pathParts[i] + "/"

With this:

# The empty strings at both ends supply the leading and trailing '/'
path = '/'.join([''] + pathParts[2:-1] + [''])
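To check that a join-based one-liner really matches the loop, both can be run on the sample path. This is only a sketch in Python 3; the helper names path_by_loop and path_by_join are made up for illustration and do not appear in the question.

```python
def path_by_loop(path_parts):
    # Same logic as the loop in the question: skip the empty first
    # element and the base component, and stop before the filename.
    path = "/"
    for i in range(2, len(path_parts) - 1):
        path += path_parts[i] + "/"
    return path


def path_by_join(path_parts):
    # Empty strings at both ends make join() emit the leading
    # and trailing "/".
    return "/".join([""] + path_parts[2:-1] + [""])


parts = "/base/first/second/third/fourth/foo.html".split("/")
print(path_by_loop(parts))
print(path_by_join(parts))
print("/".join(parts[2:-1]))  # plain join: no leading or trailing slash
```

Joining the bare slice alone drops the leading and trailing slashes, so the padding matters if the output has to match the loop exactly.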
Paolo Bergantino

I'd be inclined to start out with urlparse. Also, you can use rsplit, and the maxsplit parameter of split and rsplit to simplify things a bit:

_, netloc, path, _, q, _ = urlparse(url)
_, base, path = path.split('/', 2) # 1st component will always be empty
path, file = path.rsplit('/', 1)
if q: file += '?' + q
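For reference, the same idea can be wrapped in a function that returns all four parts in the shape the question asks for, with the leading and trailing slashes on the path. This is a rough sketch for Python 3, where urlparse lives in urllib.parse; the function name split_url is hypothetical, and it assumes the URL path has at least a base component and a filename.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return (netloc, base, path, file) for a URL shaped like the example."""
    parts = urlsplit(url)
    # The path starts with "/", so the first element of the split is empty.
    _, base, rest = parts.path.split("/", 2)
    # Everything after the last "/" is the filename.
    directory, _, filename = rest.rpartition("/")
    path = "/" + directory + "/" if directory else "/"
    # Re-attach the query string to the filename, as the question wants.
    if parts.query:
        filename += "?" + parts.query
    return parts.netloc, base, path, filename


url = "http://www.somesite.com/base/first/second/third/fourth/foo.html?abc=123"
netloc, base, path, file = split_url(url)
print(netloc, base, path, file)
```

A path with fewer than two components (for example just "/") would make the tuple unpacking raise ValueError, so real input may need a guard.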
Laurence Gonsalves