
I am currently working on a project that involves splitting a url. I have used the urlparse module to break up the url, so now I am working with just the path segment.

The problem is that when I try to split() the string based on the delimiter "/" to separate the directories, I end up with empty strings in my list.

For example, when I do the following:

```python
import urlparse  # Python 2; in Python 3 this module is urllib.parse
url = "http://example/url/being/used/to/show/problem"
parsed = urlparse.urlparse(url)
path = parsed[2]  # this is the path element

pathlist = path.split("/")
```

I get the list:

```
['', 'url', 'being', 'used', 'to', 'show', 'problem']
```

I do not want these empty strings. I realize that I can remove them by making a new list without them, but that seems sloppy. Is there a better way to remove the empty strings and slashes?
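The code above is Python 2. For readers on Python 3, where the `urlparse` module became `urllib.parse`, the same example can be reproduced with this sketch; `parsed.path` is the named-attribute equivalent of `parsed[2]`.

```python
from urllib.parse import urlparse  # Python 3 home of the old urlparse module

url = "http://example/url/being/used/to/show/problem"
parsed = urlparse(url)
path = parsed.path  # same value as parsed[2]

pathlist = path.split("/")
print(pathlist)  # ['', 'url', 'being', 'used', 'to', 'show', 'problem']
```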

chindes

5 Answers


> I do not want these empty strings. I realize that I can remove them by making a new list without them, but that seems sloppy. Is there a better way to remove the empty strings and slashes?

What? There's only one empty string and it's always first, by definition.

```python
pathlist = path.split("/")[1:]
```

This is a pretty common idiom.
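A minimal sketch of this idiom wrapped in a helper (the name `path_segments` is hypothetical), assuming Python 3's `urllib.parse`. It drops only the leading empty string, so a trailing slash still shows up as a final `''`; a URL with no path at all yields an empty list.

```python
from urllib.parse import urlparse

def path_segments(url):
    """Split a URL path on '/', dropping only the leading empty string."""
    path = urlparse(url).path
    return path.split("/")[1:]

print(path_segments("http://example/url/being/used/to/show/problem"))
# ['url', 'being', 'used', 'to', 'show', 'problem']
print(path_segments("http://example/url/being/used/to/show/problem/"))
# ['url', 'being', 'used', 'to', 'show', 'problem', '']
```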


A trailing slash can mean an "empty" filename, in which case a default name may be implied (index.html, for example).

So the trailing empty string may be meaningful.

"http://example/url/being/used/to/show/problem"

The filename is "problem".

"http://example/url/being/used/to/show/problem/"

The directory is "problem" and a default filename is implied by the empty string.
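To make the distinction concrete, here is a sketch that keeps the trailing empty string and treats it as "filename implied". The helper name `split_dirs_and_file` and the default `index.html` are assumptions for illustration; which default a real server uses depends on its configuration.

```python
from urllib.parse import urlparse

DEFAULT_NAME = "index.html"  # hypothetical default; real servers configure this per site

def split_dirs_and_file(url, default=DEFAULT_NAME):
    """Return (directories, filename) for a URL path.

    An empty last segment (trailing slash) means the filename is implied,
    so the default name is substituted for it.
    """
    segments = urlparse(url).path.split("/")[1:]  # drop only the leading ''
    if not segments:  # no path at all, e.g. "http://example"
        return [], default
    *dirs, filename = segments
    return dirs, filename or default

print(split_dirs_and_file("http://example/url/being/used/to/show/problem"))
# (['url', 'being', 'used', 'to', 'show'], 'problem')
print(split_dirs_and_file("http://example/url/being/used/to/show/problem/"))
# (['url', 'being', 'used', 'to', 'show', 'problem'], 'index.html')
```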

S.Lott
  • If the url has a slash at the end, there is another empty string. – chindes Jul 12 '11 at 18:54
  • Then maybe use a list comp? `path_list = [(p) for p in path.split('/') if len(p)]` – craigs Jul 12 '11 at 19:38
  • @craigs: It's not arbitrary. It's the first position only. The last position may be meaningful. Simply suppressing path elements is wrong. – S.Lott Jul 12 '11 at 19:46
  • @S.Lott: I _completely_ agree with your original response and **do** understand the significance of trailing slashes for most web servers; but I was responding to @chindes later response that indicated for their particular situation they wanted to suppress _all_ empty strings in the split. So…would the only safe way to decide whether or not to suppress the trailing '/' be to actually issue a HEAD request and check for a redirect? p.s. ['I almost wet myself'](http://slott-softwarearchitect.blogspot.com/2011/07/i-almost-wet-myself.html) when I got a response from S.Lott. – craigs Jul 12 '11 at 21:16
  • @craigs: "they wanted to suppress all empty strings in the split" is a Really Bad Idea. It's an Attractive Nuisance. – S.Lott Jul 14 '11 at 10:50

I am not familiar with urllib and its output for path, but one way to form a new list is with a list comprehension:

```python
[x for x in path.split("/") if x]
```

Or, if only the leading '/' is the problem:

```python
path.lstrip('/').split("/")
```

If there is a trailing '/' too:

```python
path.strip('/').split("/")
```

Finally, if path always starts with a single '/', the easiest way is:

```python
path[1:].split('/')
```
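The four variants differ on edge cases. This sketch runs them on one made-up path with a leading slash, a doubled slash in the middle, and a trailing slash.

```python
path = "/url/being//used/to/show/problem/"  # leading, doubled, and trailing slashes

approaches = {
    "comprehension": [x for x in path.split("/") if x],  # drops every empty segment
    "lstrip": path.lstrip("/").split("/"),  # strips all leading slashes only
    "strip": path.strip("/").split("/"),    # strips slashes at both ends only
    "slice": path[1:].split("/"),           # removes exactly one leading character
}
for name, result in approaches.items():
    print(name, result)
```

Only the comprehension removes the interior empty segment from `//`. `lstrip` and the slice agree here, but they diverge when the path starts with `//`: `lstrip` removes every leading slash, while the slice removes exactly one character.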
Artsiom Rudzenka
```python
pathlist = path.strip('/').split("/")
```
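One caveat with this one-liner: for the root path `/` (or an empty path), stripping leaves an empty string, and `"".split("/")` returns `['']`, not `[]`. The sketch below shows the behavior and one possible guard; both function names are hypothetical.

```python
def strip_split(path):
    # The one-liner above, wrapped in a function for illustration
    return path.strip("/").split("/")

def strip_split_safe(path):
    # Same idea, but return [] when nothing is left after stripping
    stripped = path.strip("/")
    return stripped.split("/") if stripped else []

print(strip_split("/url/being/used/"))  # ['url', 'being', 'used']
print(strip_split("/"))                 # [''], not []
print(strip_split_safe("/"))            # []
```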
Jochen Ritzel

Remove the empty items?

```python
pathlist.remove('')  # removes only the first '' and raises ValueError if there is none
```
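A short sketch of why a single `remove('')` may not be enough: `list.remove` deletes only the first matching item, so a trailing-slash path keeps its final `''`, and the call raises `ValueError` when no empty string is present.

```python
pathlist = "/url/being/used/to/show/problem/".split("/")
pathlist.remove("")  # removes only the FIRST empty string
print(pathlist)      # ['url', 'being', 'used', 'to', 'show', 'problem', '']

raised = False
try:
    ["url", "being"].remove("")  # nothing to remove
except ValueError:
    raised = True
print(raised)  # True
```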
Ilia Choly

I added this as a comment to a comment, so just in case: couldn't you use a list comprehension to exclude the empty elements returned from the split? For example:

```python
path_list = [p for p in path.split('/') if p]
```
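The built-in `filter(None, iterable)` keeps only truthy items, so it behaves like this comprehension. The sketch below uses a made-up path; note that, as S.Lott points out, it also discards the trailing empty string, which may carry meaning.

```python
path = "/url/being/used/to/show/problem/"
# filter(None, ...) drops every falsy item, i.e. every empty string here
path_list = list(filter(None, path.split("/")))
print(path_list)  # ['url', 'being', 'used', 'to', 'show', 'problem']
```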
craigs