3

It seems that os.sep returns "/" as a separator. I wonder if I can use that to build a URL, e.g. to get a URL like https://some.domain.com/catalogs:

   protocol + request.get_host() + os.sep + get_file_name() 
  • Can this cause any problem? Or
  • Is there anything in urllib/2 to join uris?
trex
  • `os.sep` is OS specific, on Windows it will return `'\'`. – Ashwini Chaudhary Jul 23 '14 at 10:31
  • 1
    Why not just use `"/"`? The URL standard says to use a slash, so use a slash. os.sep is for dealing with differences among operating systems, but URLs don't have differences like this. – Ned Batchelder Jul 23 '14 at 11:17
  • 1
    Related http://stackoverflow.com/questions/1793261/how-to-join-components-of-a-path-when-you-are-constructing-a-url-in-python – Kos Jul 23 '14 at 12:29

3 Answers

4

os.sep will return \ on Windows. Whether that's what you want depends on the protocol you're using, but broadly speaking I think os.sep isn't appropriate for URLs that aren't using file:// (and even then it's questionable).

You might find urlparse useful: https://docs.python.org/2/library/urlparse.html

Tom Dalton
  • 3
    Uniform resource identifier always uses '/', (Specified in the RFC's), and more or less everything network related uses URI, so in practice, _never_ use `os.sep`, unless you are working with filepaths on your system. – brunsgaard Jul 23 '14 at 10:45
3

From documentation:

The character used by the operating system to separate pathname components. This is '/' for POSIX and '\' for Windows. Note that knowing this is not sufficient to be able to parse or concatenate pathnames — use os.path.split() and os.path.join() — but it is occasionally useful. Also available via os.path.

So no, it is not safe to use.
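To make the difference concrete, here is a small Python 3 sketch. It relies on the fact that `posixpath` and `ntpath` (the modules behind `os.path` on POSIX and Windows) can both be imported on any platform. The host and file name are placeholders taken from the question, and `build_url` is a hypothetical helper, not part of any library.

```python
import ntpath      # os.path implementation used on Windows
import posixpath   # os.path implementation used on POSIX systems

host = "some.domain.com"   # placeholder host from the question
name = "catalogs"          # placeholder for get_file_name()


def build_url(sep):
    """Hypothetical helper: concatenate a URL using the given separator."""
    return "https://" + host + sep + name


posix_url = build_url(posixpath.sep)    # what os.sep gives on Linux/macOS
windows_url = build_url(ntpath.sep)     # what os.sep gives on Windows

print(posix_url)     # https://some.domain.com/catalogs
print(windows_url)   # https://some.domain.com\catalogs
```

The second result is not the URL that was intended, which is why a literal "/" is the safer choice.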

For URI parsing, splitting, joining, etc., you should use the urllib.parse library (called urlparse in Python 2).
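As a rough illustration, `urljoin` from `urllib.parse` (Python 3) can do the joining. The base URLs below are placeholders. Note that `urljoin` follows relative-reference resolution rules, so a base without a trailing slash has its last segment replaced rather than extended.

```python
from urllib.parse import urljoin

# Placeholder base URL, taken from the question's example.
base = "https://some.domain.com/"

url = urljoin(base, "catalogs")
print(url)  # https://some.domain.com/catalogs

# The last segment of the base is replaced when there is no trailing slash.
replaced = urljoin("https://some.domain.com/a/b", "c")
print(replaced)  # https://some.domain.com/a/c

# With a trailing slash, the new segment is appended instead.
appended = urljoin("https://some.domain.com/a/b/", "c")
print(appended)  # https://some.domain.com/a/b/c
```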

brunsgaard
3

os.sep gives you the separator for your current system's file system paths. Your system paths and URI paths aren't the same.

RFC 3986 gives:

A path consists of a sequence of path segments separated by a slash ("/") character.

If you have a URI like http://foo.bar.baz/a/b/c/d, you should use urlsplit to split it into components and extract the path part. Then you can safely use .split('/') to get the individual segments of this path, or use '/'.join to construct a path from the segments (if you know that each segment is a valid segment according to the grammar).

Within the path, the grammar doesn't permit / to be anything other than a segment separator (check the RFC to be doubly sure). This doesn't hold for the whole URL though: / can mean different things in other URL components.

The opposite of urlsplit is urlunsplit which can do what you want once you have the path assembled.
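A minimal Python 3 sketch of that round trip, using the example URI from above. The extra segment "e" is only an illustration.

```python
from urllib.parse import urlsplit, urlunsplit

parts = urlsplit("http://foo.bar.baz/a/b/c/d")

# The path starts with "/", so the first element of the split is "".
segments = parts.path.split("/")
print(segments)  # ['', 'a', 'b', 'c', 'd']

# Add an illustrative extra segment and reassemble the path.
segments.append("e")
new_path = "/".join(segments)

# urlunsplit takes (scheme, netloc, path, query, fragment).
rebuilt = urlunsplit(
    (parts.scheme, parts.netloc, new_path, parts.query, parts.fragment)
)
print(rebuilt)  # http://foo.bar.baz/a/b/c/d/e
```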

To be safe, you should percent-encode the individual path segments before joining them with /, for example with urllib.quote('/test', '') (urllib.parse.quote in Python 3). Mind the second parameter, safe: it defaults to '/', so / isn't escaped unless you pass an empty string.
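A short Python 3 sketch of encoding segments before joining. The segment values and the host are made up for illustration.

```python
from urllib.parse import quote, urlunsplit

# With the default safe="/", a slash inside a segment is left alone.
default_quoted = quote("/test")
print(default_quoted)  # /test

# With safe="", the slash is percent-encoded.
strict_quoted = quote("/test", safe="")
print(strict_quoted)   # %2Ftest

# Illustrative raw segments; one contains a slash, one a space.
raw_segments = ["catalogs", "a/b", "spring sale"]
encoded = [quote(segment, safe="") for segment in raw_segments]

path = "/" + "/".join(encoded)
url = urlunsplit(("https", "some.domain.com", path, "", ""))
print(url)  # https://some.domain.com/catalogs/a%2Fb/spring%20sale
```

Encoding each segment separately keeps a "/" that belongs to the data from being read as a path separator.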

Community
Kos