Splitting a URL into a list in Python

Question

I am currently working on a project that involves splitting a url. I have used the urlparse module to break up the url, so now I am working with just the path segment.

The problem is that when I try to split() the string based on the delimiter "/" to separate the directories, I end up with empty strings in my list.

For example, when I do the following:

import urlparse
url = "http://example/url/being/used/to/show/problem"
parsed = urlparse.urlparse(url)
path = parsed[2] #this is the path element

pathlist = path.split("/")

I get the list:

['', 'url', 'being', 'used', 'to', 'show', 'problem']

I do not want these empty strings. I realize that I can remove them by making a new list without them, but that seems sloppy. Is there a better way to remove the empty strings and slashes?
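For readers on Python 3, where the `urlparse` module was folded into `urllib.parse`, the same steps can be sketched as below. This is only a reproduction of the behaviour described above, not a fix; the `.path` attribute is used as a readable equivalent of `parsed[2]`.

```python
from urllib.parse import urlparse

url = "http://example/url/being/used/to/show/problem"
parsed = urlparse(url)
path = parsed.path  # same value as parsed[2]

# The path starts with "/", so the first element of the split is "".
pathlist = path.split("/")
print(pathlist)
# ['', 'url', 'being', 'used', 'to', 'show', 'problem']
```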

S.Lott · Answer 1 · 2011-07-12T19:02:12.667

5

I do not want these empty strings. I realize that I can remove them by making a new list without them, but that seems sloppy. Is there a better way to remove the empty strings and slashes?

What? There's only one empty string and it's always first, by definition.

pathlist = path.split("/")[1:]

This is pretty common.

A trailing slash can mean an "empty" filename, in which case a default name may be implied (index.html, for example).

It may be meaningful.

"http://example/url/being/used/to/show/problem"

The filename is "problem"

"http://example/url/being/used/to/show/problem/"

The directory is "problem" and a default filename is implied by the empty string.
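A minimal sketch of that distinction, assuming the path comes from an absolute URL path (so it begins with "/"). The helper name `split_path` is made up for illustration; it drops only the leading empty string and leaves a trailing one in place so the "directory" case stays visible.

```python
def split_path(path):
    """Drop only the leading empty string; keep a trailing one if present."""
    parts = path.split("/")
    # An absolute path starts with "/", so parts[0] is "" in that case.
    if parts and parts[0] == "":
        return parts[1:]
    return parts


file_parts = split_path("/url/being/used/to/show/problem")
dir_parts = split_path("/url/being/used/to/show/problem/")

print(file_parts)  # ['url', 'being', 'used', 'to', 'show', 'problem']
print(dir_parts)   # ['url', 'being', 'used', 'to', 'show', 'problem', '']
```

A trailing `''` in the result then signals that the URL ended with a slash, which the caller can act on or ignore.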

edited Jul 12 '11 at 19:02

answered Jul 12 '11 at 18:50

S.Lott


If the url has a slash at the end, there is another empty string. – chindes Jul 12 '11 at 18:54
Then maybe use a list comp? `path_list = [(p) for p in path.split('/') if len(p)]` – craigs Jul 12 '11 at 19:38
@craigs: It's not arbitrary. It's the first position only. The last position may be meaningful. Simply suppressing path elements is wrong. – S.Lott Jul 12 '11 at 19:46
@S.Lott: I _completely_ agree with your original response and **do** understand the significance of trailing slashes for most web servers, but I was responding to @chindes's later response, which indicated that for their particular situation they wanted to suppress _all_ empty strings in the split. So… would the only safe way to decide whether or not to suppress the trailing '/' be to actually issue a HEAD request and check for a redirect? P.S. ['I almost wet myself'](http://slott-softwarearchitect.blogspot.com/2011/07/i-almost-wet-myself.html) when I got a response from S.Lott. – craigs Jul 12 '11 at 21:16
@craigs: "they wanted to suppress all empty strings in the split" is a Really Bad Idea. It's an Attractive Nuisance. – S.Lott Jul 14 '11 at 10:50

Artsiom Rudzenka · Accepted Answer · 2011-07-12T19:06:35.813

3

I am not familiar with urllib and its output for the path, but I think one way to build the new list is a list comprehension:

[x for x in path.split("/") if x]

Or, if only the leading '/' needs to go:

path.lstrip('/').split("/")

If the trailing '/' should go too:

path.strip('/').split("/")

And finally, if the string in path always starts with a single '/', then the easiest way is:

path[1:].split('/')
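The four variants above do not behave the same on every input. A small comparison, using made-up sample paths and a hypothetical helper `compare`, may make the differences easier to see:

```python
def compare(path):
    """Apply the four approaches from this answer to one path."""
    return {
        "filter": [x for x in path.split("/") if x],
        "lstrip": path.lstrip("/").split("/"),
        "strip": path.strip("/").split("/"),
        "slice": path[1:].split("/"),
    }


for sample in ["/a/b", "/a/b/", "/a//b", "/"]:
    print(sample, compare(sample))

# "/a/b"  : all four give ['a', 'b']
# "/a/b/" : filter and strip give ['a', 'b']; lstrip and slice give ['a', 'b', '']
# "/a//b" : filter gives ['a', 'b']; the other three give ['a', '', 'b']
# "/"     : filter gives []; the other three give ['']
```

Only the list comprehension removes every empty string; the strip-based and slice-based forms still return `['']` for the root path "/".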

edited Jul 12 '11 at 19:06

answered Jul 12 '11 at 18:49

Artsiom Rudzenka


score 2 · Answer 3 · answered Jul 12 '11 at 18:56


pathlist = path.strip('/').split("/")


Jochen Ritzel


score 1 · Answer 4 · answered Jul 12 '11 at 18:50


Remove the empty items?

pathlist.remove('')
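Two details of `list.remove` are worth keeping in mind here: it deletes only the first matching element, and it raises `ValueError` when there is nothing to delete. A short sketch with made-up sample paths:

```python
# A path with both a leading and a trailing slash yields two empty strings.
pathlist = "/url/being/used/".split("/")
pathlist.remove("")  # removes only the first ''
print(pathlist)      # ['url', 'being', 'used', '']

# With no empty string in the list, remove() raises ValueError.
no_empty = "url/being".split("/")
try:
    no_empty.remove("")
    raised = False
except ValueError:
    raised = True
print(raised)        # True
```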


Ilia Choly


score 1 · Answer 5 · answered Jul 12 '11 at 19:42


I added this as a comment to a comment, so just in case: couldn't you use a list comprehension to exclude the empty elements returned from the split? For example:

path_list = [p for p in path.split('/') if p]


craigs

