Two issues here.
Path splitting
You'd normally use os.path.split to work with paths:
>>> import os.path
>>> p=r'C:\Users\xyz\filename.txt'
>>> head, tail = os.path.split(p)
>>> head
'C:\\Users\\xyz'
>>> tail
'filename.txt'
Caveat: os.path works with the path format of the operating system it's used on. If you know you specifically want to work with Windows paths (even when your program is run on Linux or OS X), then instead of os.path you'd work with the ntpath module. See the note:
Note Since different operating systems have different path name conventions, there are several versions of this module in the standard library. The os.path module is always the path module suitable for the operating system Python is running on, and therefore usable for local paths. However, you can also import and use the individual modules if you want to manipulate a path that is always in one of the different formats. They all have the same interface:
- posixpath for UNIX-style paths
- ntpath for Windows paths
- macpath for old-style MacOS paths
- os2emxpath for OS/2 EMX paths
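To make the caveat concrete, here is a small sketch of splitting the same Windows path with ntpath and with posixpath. It should behave the same on any operating system, since neither module depends on the platform Python runs on. The sample path is the one used above.

```python
import ntpath
import posixpath

p = r'C:\Users\xyz\filename.txt'

# ntpath always applies Windows rules, even on Linux or OS X.
head, tail = ntpath.split(p)
print(head)  # C:\Users\xyz
print(tail)  # filename.txt

# posixpath sees no '/' separator, so the whole string is one file name.
print(posixpath.split(p))  # ('', 'C:\\Users\\xyz\\filename.txt')
```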
Format support
You have 2 formats to support:
1. file://C:\Users\xyz\filename.txt
2. C:\Users\xyz\filename.txt
2 is a normal Windows path, and 1 is... Frankly, I have no idea what that is. It kind of looks like a file URI, but uses Windows-style delimiters (backslashes). This is strange. When I open a PDF in Chrome on Windows the URI looks different:
file:///C:/Users/kos/Downloads/something.pdf
and I'll assume that's the format you're interested in. If not, then I can't vouch for what you're dealing with, and you can make an educated guess on how to interpret it (drop the file:// prefix and treat it as a Windows path?).
You can split a URI into meaningful parts using the urlparse module (urllib.parse in Python 3), and once you've extracted the path part of the URI, you can just .split('/') it (URI grammar is simple enough to allow that). Here's what happens if you use this module on a file:// URI:
>>> import urlparse  # Python 3: from urllib import parse as urlparse
>>> r = urlparse.urlparse(r'file:///C:/Users/xyz/filename.txt')
>>> r
ParseResult(scheme='file', netloc='', path='/C:/Users/xyz/filename.txt', params='', query='', fragment='')
>>> r.path
'/C:/Users/xyz/filename.txt'
>>> r.path.lstrip('/').split('/')
['C:', 'Users', 'xyz', 'filename.txt']
Please read this URI scheme description for a better idea of what this format looks like and why there are three slashes after file:.
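Putting both parts together, here is a rough Python 3 sketch that accepts either input format and returns the path components. The function name path_components is made up for illustration. Its handling of the odd file://C:\... form is just the educated guess mentioned above (drop the prefix and treat the rest as a Windows path), not something any standard prescribes.

```python
import ntpath
from urllib.parse import urlparse, unquote


def path_components(s):
    """Split a file:/// URI or a Windows path into its components.

    Hypothetical helper, written only to illustrate the approach.
    """
    if s.startswith('file:///'):
        # Proper file URI: take the path part, undo percent-encoding,
        # drop the leading slash and split on the URI delimiter.
        path = unquote(urlparse(s).path)
        return path.lstrip('/').split('/')

    if s.startswith('file://'):
        # Assumption: a malformed URI with backslashes; strip the prefix
        # and fall through to the Windows-path handling below.
        s = s[len('file://'):]

    # Plain Windows path: separate the drive, then split the rest.
    drive, rest = ntpath.splitdrive(s)
    parts = [part for part in rest.replace('/', '\\').split('\\') if part]
    return ([drive] if drive else []) + parts


print(path_components('file:///C:/Users/xyz/filename.txt'))
print(path_components(r'C:\Users\xyz\filename.txt'))
print(path_components(r'file://C:\Users\xyz\filename.txt'))
```

All three calls should give the same list, with the drive first and the file name last. The unquote call is there because real file URIs may percent-encode characters such as spaces.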