Convert a filename to a file:// URL

Question

In WeasyPrint’s public API I accept filenames (among other types) for the HTML inputs. Any filename that works with the built-in open() should work, but I need to convert it to a URL in the file:// scheme that will later be passed to urllib.urlopen().

(Everything is in URL form internally. I need to have a "base URL" for documents in order to resolve relative URL references with urlparse.urljoin().)
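To illustrate why a complete base URL matters here, a minimal sketch in Python 3, where `urlparse.urljoin` lives in `urllib.parse`. The base URL and the relative references are made up for the example.

```python
from urllib.parse import urljoin

# Hypothetical base URL of a document loaded from disk.
base_url = 'file:///home/user/site/index.html'

# Relative references are resolved against the directory of the base URL.
stylesheet_url = urljoin(base_url, 'css/style.css')
image_url = urljoin(base_url, '../images/logo.png')

print(stylesheet_url)
print(image_url)
```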

urllib.pathname2url is a start:

Convert the pathname path from the local syntax for a path to the form used in the path component of a URL. This does not produce a complete URL. The return value will already be quoted using the quote() function.

The emphasis is mine, but I do need a complete URL. So far this seems to work:

def path2url(path):
    """Return file:// URL from a filename."""
    path = os.path.abspath(path)
    if isinstance(path, unicode):
        path = path.encode('utf8')
    return 'file:' + urlparse.pathname2url(path)

UTF-8 seems to be recommended by RFC 3987 (IRI). But in this case (the URL is meant for urllib, eventually) maybe I should use sys.getfilesystemencoding()?

However, based on the literature I should prepend not just file: but file:// ... except when I should not: On Windows the results from nturl2path.pathname2url() already start with three slashes.

So the question is: is there a better way to do this and make it cross-platform?

Couldn't you just check for something like `url[0:3] == '///'`, and if false add the two extra slashes? — Some programmer dude, Jul 27 '12 at 12:19
Joachim, maybe that would work. I just don’t know what rules to follow to avoid surprising corner-cases. — Simon Sapin, Jul 27 '12 at 12:21
Hey, your example code uses `urlparse.pathname2url`, which doesn't exist. Did you mean `urllib.pathname2url`? — Marius Gedminas, Dec 10 '15 at 12:44

score 93 · Accepted Answer · answered Aug 09 '15 at 15:43

For completeness, in Python 3.4+, you should do:

import pathlib

pathlib.Path(absolute_path_string).as_uri()

answered Aug 09 '15 at 15:43

ToBeReplaced

This module is also on PyPI (for other Python versions) https://pypi.python.org/pypi/pathlib/ – Simon Sapin Aug 12 '15 at 23:24
[pathlib2](https://pypi.org/project/pathlib2/) should now be used for other Python versions – Florent Roques Sep 08 '20 at 14:52
`as_uri()` doesn't work on relative filenames (there are use cases for converting only a partial filename to a (partial) URL). – Berry Tsakala Jul 04 '21 at 10:59
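One possible way to accept relative filenames with the pathlib approach is to make the path absolute before calling `as_uri()`. This is only a sketch: it reuses the `path2url` name from the question, and it resolves against the current working directory, which may not suit every use case.

```python
import os
import pathlib


def path2url(path):
    """Return a file:// URL for a filename, relative or absolute."""
    # as_uri() raises ValueError for relative paths,
    # so make the path absolute first.
    absolute = os.path.abspath(path)
    return pathlib.Path(absolute).as_uri()


# 'foo bar.html' is a made-up relative filename; the space gets percent-encoded.
url = path2url('foo bar.html')
print(url)
```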

score 33 · Answer 2 · answered Jan 12 '13 at 21:38

I'm not sure the docs are rigorous enough to guarantee it, but I think this works in practice:

import urlparse, urllib

def path2url(path):
    return urlparse.urljoin(
      'file:', urllib.pathname2url(path))

answered Jan 12 '13 at 21:38

Dave Abrahams

Tested on Linux, Windows, and OS X and it works fine on all three. – javawizard Jun 11 '13 at 19:06
And in py3k this becomes `import urllib.parse as urlparse` and `import urllib.request as urllib` – danodonovan Feb 07 '14 at 11:28
you should probably call `os.path.abspath(path)` here. – jfs Dec 19 '14 at 22:14
If you use the [six](https://pythonhosted.org/six/) library to assure Python 2 and 3 portability: `return six.moves.urllib_parse.urljoin( "file://", six.moves.urllib.request.pathname2url(path))` – George V. Reilly Jan 04 '15 at 23:30
This produces URLs that look like `file:///C:/foo%20bar/spam/eggs`. Shouldn't it be `file:///C%3A/foo%20bar/spam/eggs`, with the colon turned into `%3A`? – stib Mar 20 '15 at 14:40
@stib: apparently not, at least not on Windows, where the drive letter colon is to be kept intact. See https://blogs.msdn.microsoft.com/ie/2006/12/06/file-uris-in-windows/ and https://en.wikipedia.org/wiki/File_URI_scheme#Windows_2 – Martijn Pieters Nov 17 '17 at 18:13

score 5 · Answer 3 · edited Oct 30 '16 at 02:27

Credit to comment from @danodonovan above.

For Python3, the following code will work:

from urllib.parse import urljoin
from urllib.request import pathname2url

def path2url(path):
    return urljoin('file:', pathname2url(path))
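Following the suggestion in the comments on the earlier answer, a variant that first makes the path absolute might look like the sketch below. The `os.path.abspath` call is an addition to the code above, and the filename is made up for illustration.

```python
import os
from urllib.parse import urljoin
from urllib.request import pathname2url


def path2url(path):
    """Return a file:// URL for a filename, relative or absolute."""
    # Resolve relative names against the current working directory,
    # then let pathname2url() handle quoting and platform path syntax.
    absolute = os.path.abspath(path)
    return urljoin('file:', pathname2url(absolute))


url = path2url('foo bar.html')
print(url)
```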

edited Oct 30 '16 at 02:27

Antoine

answered Jun 08 '15 at 06:14

kevinarpe

score 1 · Answer 4 · edited Jul 27 '12 at 12:59

Does the following work for you?

from urlparse import urlparse, urlunparse

urlunparse(urlparse('yourURL')._replace(scheme='file'))

edited Jul 27 '12 at 12:59

Simon Sapin

answered Jul 27 '12 at 12:48

Jon Clements

The idea is interesting, but I don’t know if this is enough. In particular, `\` in Windows filenames is supposed to become `/`. Also on Windows, the `C` in `C:\foo\bar.html` is parsed as a scheme and then replaced. The expected output would be `file:///C:/foo/bar.html`. – Simon Sapin Jul 27 '12 at 12:56
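The drive-letter problem described in this comment can be reproduced on any platform, since `urlparse` only looks at the string. A small sketch, using the Python 3 module name `urllib.parse` and the example path from the comment:

```python
from urllib.parse import urlparse, urlunparse

windows_path = r'C:\foo\bar.html'
parsed = urlparse(windows_path)

# The drive letter is taken for a URL scheme (lower-cased by the parser).
drive_as_scheme = parsed.scheme

# Replacing the scheme therefore discards the drive letter, and the
# backslashes are left untouched instead of becoming forward slashes.
rebuilt = urlunparse(parsed._replace(scheme='file'))

print(drive_as_scheme)
print(rebuilt)
```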
