243

I have a string variable which represents a DOS path, e.g.:

var = "d:\stuff\morestuff\furtherdown\THEFILE.txt"

I want to split this string into:

[ "d", "stuff", "morestuff", "furtherdown", "THEFILE.txt" ]

I have tried using split() and replace() but they either only process the first backslash or they insert hex numbers into the string.

I need to convert this string variable into a raw string somehow so that I can parse it.

What's the best way to do this?

I should also add that the contents of var, i.e. the path that I'm trying to parse, is actually the return value of a command line query. It's not path data that I generate myself. It's stored in a file, and the command line tool is not going to escape the backslashes.

BeeBand
  • 8
    As you review these answers, remember that `os.path.split` is not working for you because you aren't escaping that string properly. – Jed Smith Jul 02 '10 at 19:30
  • You need to escape the string or use a raw string: `r"d:\stuff\morestuff\furtherdown\THEFILE.txt"` to prevent things like `\s` being misinterpreted. – smci Apr 02 '20 at 18:30

23 Answers

446

I would do

import os
path = os.path.normpath(path)
path.split(os.sep)

First normalize the path string into the proper form for the OS. After that, os.sep should be safe to use as the delimiter for the string split method.
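As a rough sketch of the idea (not part of the original answer): the snippet below uses ntpath, the Windows implementation of os.path that can be imported on any platform, so it behaves the same on Linux and macOS. The sample path, with one stray forward slash, is made up for illustration.

```python
import ntpath  # Windows flavour of os.path, importable on any platform

# Hypothetical input: a DOS path containing one stray forward slash.
raw = r"d:\stuff\morestuff/furtherdown\THEFILE.txt"

# normpath rewrites '/' to '\' and collapses redundant separators.
normalized = ntpath.normpath(raw)

# After normalization, the separator is safe to use as a delimiter.
components = normalized.split(ntpath.sep)
print(components)
```

On a Windows interpreter, os.path is ntpath, so os.path.normpath and os.sep would give the same result there.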

Seanny123
Tompa
  • 44
    As a one-liner, `os.path.normpath(a_path).split(os.path.sep)` – Daniel Farrell Mar 31 '16 at 17:15
  • 3
    This doesn't seem to work for path = root. In that case, the result of path.split is ['','']. In fact in general, this split() solution gives a leftmost directory with empty-string name (which could be replaced by the appropriate slash). The core problem is that a single slash (forward or backward depending on the OS) is the *name* of the root directory, whereas elsewhere in the path it is a *separator*. – gwideman Jul 06 '16 at 02:51
  • 2
    Will it work better with an lstrip then? `os.path.normpath(path).lstrip(os.path.sep).split(os.path.sep)` – Vidar Jul 10 '17 at 22:52
  • Is it just me or does normpath not actually work like this? python 2.7.9, linux: `>>> os.path.normpath(r'\1\2/3') ... '\\1\\2/3'` – flaviut Jul 17 '17 at 19:06
  • If you wanted to golf the one-liner, use `from os.path import normpath, sep` and `normpath(my_path).split(sep)` – Nathan Smith Sep 07 '17 at 14:43
  • 1
    @user60561 That's because on Linux, backslash is an allowed character in filenames, whereas on Windows a forward slash isn't. That's why on Windows, `normpath` will recognize forward slash as a separator. On Linux, `normpath` will simply assume that you have a directory called `\1\2` and a file or directory inside it called `3`. – Vojislav Stojkovic Apr 30 '18 at 22:03
  • 1
    This actually answers the question, thank you! The accepted answer does not consider cases where one has to fiddle with the folder-structure. Since I had to search a bit, I want to note that (for `path.split(os.sep)`) `os.path.join(*(folder_list))` gets you back the path. – BadAtLaTeX Sep 14 '18 at 11:48
223

I've been bitten loads of times by people writing their own path fiddling functions and getting it wrong. Spaces, slashes, backslashes, colons -- the possibilities for confusion are not endless, but mistakes are easily made anyway. So I'm a stickler for the use of os.path, and recommend it on that basis.

(However, the path to virtue is not the one most easily taken, and many people when finding this are tempted to take a slippery path straight to damnation. They won't realise until one day everything falls to pieces, and they -- or, more likely, somebody else -- has to work out why everything has gone wrong, and it turns out somebody made a filename that mixes slashes and backslashes -- and some person suggests that the answer is "not to do that". Don't be any of these people. Except for the one who mixed up slashes and backslashes -- you could be them if you like.)

You can get the drive and path+file like this:

drive, path_and_file = os.path.splitdrive(path)

Get the path and the file:

path, file = os.path.split(path_and_file)

Getting the individual folder names is not especially convenient, but it is the sort of honest middling discomfort that heightens the pleasure of later finding something that actually works well:

folders = []
while 1:
    path, folder = os.path.split(path)

    if folder != "":
        folders.append(folder)
    elif path != "":
        folders.append(path)

        break

folders.reverse()

(This pops a "\" at the start of folders if the path was originally absolute. You could lose a bit of code if you didn't want that.)
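The loop above does not terminate for a relative path such as foo/bar, and it stops early when the path ends in a separator. One possible variant that covers both cases is sketched below. The name split_all and the pathmod parameter are hypothetical, introduced only so the same code can be exercised with posixpath or ntpath regardless of the host OS.

```python
import ntpath
import posixpath


def split_all(path, pathmod=posixpath):
    """Split path into components by applying pathmod.split repeatedly."""
    folders = []
    while True:
        head, tail = pathmod.split(path)
        if head == path:
            # No progress: path is a root such as '/' or 'd:\\', or it is empty.
            if head:
                folders.append(head)
            break
        if tail:
            # An empty tail only occurs for a trailing separator; skip it.
            folders.append(tail)
        path = head
    folders.reverse()
    return folders


print(split_all("/path/to/my/folder/"))
print(split_all("foo/bar"))
print(split_all(r"d:\stuff\morestuff\furtherdown\THEFILE.txt", ntpath))
```

Comparing head with the input is what guarantees termination: os.path.split returns its argument unchanged as the head only when nothing more can be split off.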

forever
  • @brone - I prefer to use this solution, than having to worry about escaping the backslash. thanks! – BeeBand Jul 13 '10 at 14:43
  • I'll echo your sentiment - os.path should be used any time you're not just writing a one-off. – Wayne Werner Jul 13 '10 at 17:40
  • @brone - I thought that if I selected your answer you would get the bounty?? Sorry, it looks like SO autoselected the answer for the bounty - the points were meant to go to you. – BeeBand Jul 14 '10 at 11:47
  • 1
    I'd be happy to be proved wrong but it seems to me suggested solution does not work if a path such as this "C:\usr\rs0\my0\in111102.log" is used (unless the initial input is a raw string) ? – shearichard Nov 03 '11 at 00:55
  • 1
    It looks like this will not properly split a path if it only contains a directory in OSX such as "/path/to/my/folder/", in order to achieve that you'd want to add these two lines to the beginning: ``if path.endswith("/"):`` and ``path = path[:-1]``. – Kevin London Feb 11 '13 at 19:34
  • @KevinLondon, even with your fix it won't work with things like '/a/b/c//'. – Kan Li Jul 25 '15 at 01:10
  • 1
    I prefer solution by @Tompa – jaycode Nov 26 '15 at 06:13
  • 1
    I concur with [jaycode](https://stackoverflow.com/users/278191/jaycode): [Tompa](https://stackoverflow.com/users/2107536/tompa)'s [solution](https://stackoverflow.com/a/16595356/2809027) is _the_ canonical approach and should have been the accepted answer. This overly complex, inefficient, and error-prone alternative fails to pass muster on production code. There's _no_ reasonable reason to attempt (...and fail, of course) to iteratively parse apart pathnames when simple string splitting succeeds with only a **single line of code.** – Cecil Curry Dec 11 '15 at 04:18
  • The solution by @Tompa does not work for full paths without a drive. It does not give the leading root directory but returns a '.' for it as mentioned by gwideman in his comment on that solution. – Glenn Mackintosh Aug 09 '20 at 05:47
  • This solution doesn't work for relative paths such as `foo/bar`; it loops endlessly. It also doesn't work for paths ending in a trailing slash such as `/foo/bar/`; it doesn't split anything. – marcelm Sep 13 '21 at 12:56
132

In Python >=3.4 this has become much simpler. You can now use pathlib.Path.parts to get all the parts of a path.

Example:

>>> from pathlib import Path
>>> Path('C:/path/to/file.txt').parts
('C:\\', 'path', 'to', 'file.txt')
>>> Path(r'C:\path\to\file.txt').parts
('C:\\', 'path', 'to', 'file.txt')

On a Windows install of Python 3 this will assume that you are working with Windows paths, and on *nix it will assume that you are working with posix paths. This is usually what you want, but if it isn't you can use the classes pathlib.PurePosixPath or pathlib.PureWindowsPath as needed:

>>> from pathlib import PurePosixPath, PureWindowsPath
>>> PurePosixPath('/path/to/file.txt').parts
('/', 'path', 'to', 'file.txt')
>>> PureWindowsPath(r'C:\path\to\file.txt').parts
('C:\\', 'path', 'to', 'file.txt')
>>> PureWindowsPath(r'\\host\share\path\to\file.txt').parts
('\\\\host\\share\\', 'path', 'to', 'file.txt')

Edit: There is also a backport to Python 2 available: pathlib2
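To get exactly the list asked for in the question (drive letter without the colon), something along these lines might work. The helper name dos_components is made up for this sketch; it relies only on PureWindowsPath.parts and PureWindowsPath.drive.

```python
from pathlib import PureWindowsPath


def dos_components(raw):
    """Return the components of a DOS path, with the drive reduced to its letter."""
    p = PureWindowsPath(raw)
    parts = list(p.parts)
    if p.drive:
        # parts[0] holds the drive (plus root, if any), e.g. 'd:\\'.
        parts[0] = p.drive.rstrip(":")
    return parts


print(dos_components(r"d:\stuff\morestuff\furtherdown\THEFILE.txt"))
```

Because PureWindowsPath does no filesystem access, this runs the same way on any operating system.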

freidrichen
91

You can simply use the most Pythonic approach (IMHO):

import os

your_path = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"
path_list = your_path.split(os.sep)
print(path_list)

Which will give you (on Windows, where os.sep is '\\'):

['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']

The key here is to use os.sep instead of '\\' or '/', as this makes it system-independent.

To remove colon from the drive letter (although I don't see any reason why you would want to do that), you can write:

path_list[0] = path_list[0][0]
Maciek D.
  • 27
    This works `some times`. Other times (on windows at least) you will find paths that look like `folder\folder2\folder3/file.txt`. Its better to first normalize (os.path.normpath) the path and then split that. – vikki Jul 27 '14 at 17:58
  • 8
    **This answer was _almost_ there.** As [vikki](https://stackoverflow.com/users/790439/vikki) suggests, the failure to normalize pathnames before string splitting spells doom on commonplace edge-cases (e.g., `/foo//bar`). See [Tompa](https://stackoverflow.com/users/2107536/tompa)'s [answer](https://stackoverflow.com/a/16595356/2809027) for a more robust solution. – Cecil Curry Dec 11 '15 at 05:14
12

For a somewhat more concise solution, consider the following:

import os

def split_path(p):
    # Peel off the last component, then recurse on whatever is left.
    a, b = os.path.split(p)
    return (split_path(a) if len(a) and len(b) else []) + [b]
user1556435
11

The problem here starts with how you're creating the string in the first place.

a = "d:\stuff\morestuff\furtherdown\THEFILE.txt"

Done this way, Python is trying to special case these: \s, \m, \f, and \T. In your case, \f is being treated as a formfeed (0x0C) while the other backslashes are handled correctly. What you need to do is one of these:

b = "d:\\stuff\\morestuff\\furtherdown\\THEFILE.txt"      # doubled backslashes
c = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"         # raw string, no doubling necessary

Then once you split either of these, you'll get the result you want.
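A small demonstration of the difference, using a shortened made-up fragment of the path so that \f is the only escape sequence involved:

```python
broken = "morestuff\furtherdown"     # "\f" is read as a single form-feed character
doubled = "morestuff\\furtherdown"   # escaped backslash
raw = r"morestuff\furtherdown"       # raw literal, backslash kept as typed

print(repr(broken))        # 'morestuff\x0curtherdown'
print(doubled == raw)      # True
print(broken.split("\\"))  # ['morestuff\x0curtherdown']
print(raw.split("\\"))     # ['morestuff', 'furtherdown']
```

The escaping only matters for literals written in source code. A string read from a file or a pipe already contains real backslashes and can be split directly.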

Craig Trader
  • @W. Craig Trader - thanks, but this path is not one that I generate myself - it comes back to me from another program and I have to store this data in a variable. I am not sure how to convert data stored in a variable into "raw text". – BeeBand Jul 13 '10 at 14:29
  • There isn't such thing as a "raw text"... it's just how you represent it in the source. Either prepend r"" to the string, or pass it through .replace('\\', '/') – Marco Mariani Jul 13 '10 at 14:37
  • @BeeBand, how are you getting the data back from the other program? Are you reading it from a file, a pipe, a socket? If so, then you don't need to do anything fancy; the only reason for doubling backslashes or using raw strings is to place string constants into Python code. On the other hand, if the other program is generating doubled-backslashes, then you'd want to clean that up before splitting your path. – Craig Trader Jul 13 '10 at 15:04
  • @W. Craig Trader - i'm reading it from a file, that gets written by another program. I couldn't get `split()` or `replace()` to work for some reason - I kept getting hex values. You're right though, I think I was barking up the wrong tree with the raw string idea - I think I was just using `split()` incorrectly. Because I tried some of these solutions using `split()` and they work for me now. – BeeBand Jul 13 '10 at 20:13
5

I can't actually contribute a real answer to this one (as I came here hoping to find one myself), but to me the number of differing approaches and all the caveats mentioned is the surest indicator that Python's os.path module desperately needs this as a built-in function.

antred
4

The stuff about mypath.split("\\") would be better expressed as mypath.split(os.sep). sep is the path separator for your particular platform (e.g., \ for Windows, / for Unix, etc.), and the Python build knows which one to use. If you use sep, then your code will be platform agnostic.
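For illustration, the two platform-specific implementations of os.path can be inspected side by side. This sketch also prints pathsep, which is a different constant (the separator between entries of a search path such as PATH) and is easy to confuse with sep.

```python
import ntpath      # what os.path is on Windows
import posixpath   # what os.path is on Unix-like systems

for name, mod in (("Windows", ntpath), ("POSIX", posixpath)):
    # sep separates directories; pathsep separates entries in PATH-style lists.
    print(name, repr(mod.sep), repr(mod.pathsep))
```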

Chris
  • 1
    Or `os.path.split`. You want to be careful with `os.pathsep`, because it's `:` on my version of Python in OS X (and `os.path.split` properly handles `/`). – Jed Smith Jul 02 '10 at 19:26
  • 4
    You mean [`os.sep`](http://docs.python.org/library/os#os.sep), not [`os.pathsep`](http://docs.python.org/library/os#os.pathsep). Follow the wisdom in the `os.sep` docs: _Note that knowing this is not sufficient to be able to parse or concatenate pathnames — use os.path.split() and os.path.join()._ – Jon-Eric Aug 16 '12 at 19:57
3

The functional way, with a generator.

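import os

def split(path):
    # Drop the drive (if any), then peel components off the right-hand end.
    drive, head = os.path.splitdrive(path)
    # Stop at the root separator, or at '' for a relative path.
    while head and head != os.sep:
        head, tail = os.path.split(head)
        yield tail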

In action (note that the components are yielded in reverse order):

>>> print([x for x in split(os.path.normpath('/path/to/filename'))])
['filename', 'to', 'path']
Benny
3

You can recursively os.path.split the string

import os
def parts(path):
    p,f = os.path.split(path)
    return parts(p) + [f] if f else [p]

Testing this against some path strings, and reassembling the path with os.path.join

>>> for path in [
...         r'd:\stuff\morestuff\furtherdown\THEFILE.txt',
...         '/path/to/file.txt',
...         'relative/path/to/file.txt',
...         r'C:\path\to\file.txt',
...         r'\\host\share\path\to\file.txt',
...     ]:
...     print(parts(path), os.path.join(*parts(path)))
... 
['d:\\', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt'] d:\stuff\morestuff\furtherdown\THEFILE.txt
['/', 'path', 'to', 'file.txt'] /path\to\file.txt
['', 'relative', 'path', 'to', 'file.txt'] relative\path\to\file.txt
['C:\\', 'path', 'to', 'file.txt'] C:\path\to\file.txt
['\\\\', 'host', 'share', 'path', 'to', 'file.txt'] \\host\share\path\to\file.txt

The first element of the list may need to be treated differently depending on how you want to deal with drive letters, UNC paths and absolute and relative paths. Changing the last [p] to [os.path.splitdrive(p)] forces the issue by splitting the drive letter and directory root out into a tuple.

import os
def parts(path):
    p,f = os.path.split(path)
    return parts(p) + [f] if f else [os.path.splitdrive(p)]

[('d:', '\\'), 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
[('', '/'), 'path', 'to', 'file.txt']
[('', ''), 'relative', 'path', 'to', 'file.txt']
[('C:', '\\'), 'path', 'to', 'file.txt']
[('', '\\\\'), 'host', 'share', 'path', 'to', 'file.txt']

Edit: I have realised that this answer is very similar to that given above by user1556435. I'm leaving my answer up as the handling of the drive component of the path is different.

Community
Mike Robins
1

It works for me:

>>> a=r"d:\stuff\morestuff\furtherdown\THEFILE.txt"
>>> a.split("\\")
['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']

Sure you might need to also strip out the colon from the first component, but keeping it makes it possible to re-assemble the path.

The r modifier marks the string literal as "raw"; notice how embedded backslashes are not doubled.

unwind
  • @unwind - the `r` in front of your string, what does that refer to? – BeeBand Jul 02 '10 at 15:45
  • 2
    r means raw string - it auto-escapes `\ ` characters. It's useful to use whenever you're doing paths. – Wayne Werner Jul 02 '10 at 15:46
  • @Wayne, if the string is passed in as a variable to a function how do I ensure that it is treated as a raw string? – BeeBand Jul 02 '10 at 15:48
  • 1
    @BeeBand: you don't need to care; the r"" is just something that matters during compilation/parsing of the code, it's not something that becomes a property of the string once parsed. It just means "here's a string literal, but don't interpret any backslashes as having any other meaning than being backslashes". – unwind Jul 02 '10 at 16:05
  • 3
    I think it might be helpful to mention that you might as well make it more portable by using a.split(os.sep) instead of hard-coding the separator? – Tim McJilton Jul 02 '10 at 17:42
  • 5
    I have to downvote you for missing a chance to explain `os.path.split` and `os.pathsep`, considering both of those are far more portable than what you have written. It might not matter to OP now, but it will when he's writing something that needs to move platforms. – Jed Smith Jul 02 '10 at 19:27
1

A really easy and simple way to do it:

var.replace('\\', '/').split('/')

  • It may do the work in most cases, but it is not cross-platform and can be **dangerous**. For example, in Linux, the `\\` character is legal in file and directory names. Your code will split such files/directories, which is an undesired result – SomethingSomething Jan 01 '23 at 11:52
1

I use the following because it relies on os.path.basename, so it doesn't add any slashes to the returned list. It also works with any platform's slashes, i.e. Windows's \ or Unix's /. Furthermore, it doesn't add the \\ that Windows uses for server paths :)

import os

def SplitPath( split_path ):
    pathSplit_lst   = []
    # Collect the last component until nothing but the root (or '') remains.
    while os.path.basename(split_path):
        pathSplit_lst.append( os.path.basename(split_path) )
        split_path = os.path.dirname(split_path)
    pathSplit_lst.reverse()
    return pathSplit_lst

So for:

\\server\folder1\folder2\folder3\folder4

You get:

['server','folder1','folder2','folder3','folder4']
DRPK
Jay
  • 1
    That doesn't follow the invariant that passing your result to `os.path.join()` should return the original string. I'd say the correct output for your example input is `[r'\\','server','folder1','folder2','folder3','folder4']`. I.e. what `os.path.split()` does. – Jon-Eric Aug 16 '12 at 20:08
0

re.split() can help a little more than str.split()

import re
var = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"
re.split(r'[\\/]', var)
['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']

If you also want to support Linux and Mac paths, just add filter(None, result), which removes the unwanted '' entries from the split result; these appear because such paths start with '/' or '//', for example '//mount/...' or '/var/tmp/'.

import re
var = "/var/stuff/morestuff/furtherdown/THEFILE.txt"
result = re.split(r'[\\/]', var)
list(filter(None, result))
['var', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
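The two snippets above could be folded into one helper, roughly as follows. The name split_any is hypothetical, and the `+` in the pattern is an addition that collapses runs of separators. Keep in mind that on POSIX systems a backslash is a legal filename character, so this treats it as a separator by assumption.

```python
import re


def split_any(path):
    """Split on runs of '/' or '\\' and drop the empty pieces."""
    return [piece for piece in re.split(r"[\\/]+", path) if piece]


print(split_any(r"d:\stuff\morestuff\furtherdown\THEFILE.txt"))
print(split_any("/var/stuff/morestuff/furtherdown/THEFILE.txt"))
```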
Asi
0

I'm not actually sure if this fully answers the question, but I had a fun time writing this little function that keeps a stack, sticks to os.path-based manipulations, and returns the list/stack of items.

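import os

def components(path):
    ret = []
    while len(path) > 0:
        head, crust = os.path.split(path)
        if head == path:  # reached the root (e.g. '/'); keep it and stop
            ret.insert(0, head)
            break
        path = head
        ret.insert(0, crust)
    return ret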
DRPK
mallyvai
0

As others explained, your problem stemmed from using \, which is the escape character in string literals. On the other hand, if you had obtained that file path string from another source (read from a file, the console, or returned by an os function), there would have been no problem splitting on '\\'.

And as others suggested, if you want to use \ in a program literal, you have to either double it as \\ or prefix the whole literal with r, like so: r'lite\ral' or r"lite\ral". This stops the parser from converting the \ and r into a CR (carriage return) character.

There is one more way though: just don't use backslash pathnames in your code! Since the last century, Windows has recognized and worked fine with pathnames that use the forward slash / as the directory separator. Somehow not many people know that, but it works:

>>> var = "d:/stuff/morestuff/furtherdown/THEFILE.txt"
>>> var.split('/')
['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']

This by the way will make your code work on Unix, Windows and Mac... because all of them do use / as directory separator... even if you don't want to use the predefined constants of module os.

Nas Banov
  • Unfortunately the data is being returned to me from another program that I run from my python script. I don't have any control over whether to use '\' or '/' - it is the third party program that determines this ( probably on a platform basis ). – BeeBand Jul 13 '10 at 14:33
  • @BeeBand: Ah, then you won't have the problem you experienced during testing, when you provided the string as literal in your program. Or you can do the following evil hack after receiving the path: `var = var.replace('\\','/')` - replace \ with / and proceed working with forward slashes only :) – Nas Banov Jul 13 '10 at 20:30
  • that is indeed an evil hack :o) – BeeBand Jul 13 '10 at 22:15
  • @BeeBand: that's why i warned. When i say something is evil, i don't necessarily mean it should never be used - but one should *very much* be aware why they are using it and alert of unintended consequences. In this case, a very unlikely consequence is that if this is used on Unix file system with `\` use in file or directory name (it's really hard but possible) - this code will 'break' – Nas Banov Jul 14 '10 at 00:33
0

Let's assume you have a file filedata.txt with the content:

d:\stuff\morestuff\furtherdown\THEFILE.txt
d:\otherstuff\something\otherfile.txt

You can read and split the file paths:

>>> for i in open("filedata.txt").readlines():
...     print(i.strip().split("\\"))
... 
['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
['d:', 'otherstuff', 'something', 'otherfile.txt']
zoli2k
0

The line of code below can handle:

  1. C:/path/path
  2. C://path//path
  3. C:\path\path
  4. C:\\path\\path

import re
path = re.split(r'[/\\]+', path)

Gour Bera
0

A recursive one, just for fun.

Not the most elegant answer, but should work everywhere:

import os

def split_path(path):
    head = os.path.dirname(path)
    tail = os.path.basename(path)
    if head == os.path.dirname(head):
        return [tail]
    return split_path(head) + [tail]
DuGNu
0

Adapted the solution of @Mike Robins avoiding empty path elements at the beginning:

import os

def parts(path):
    p, f = os.path.split(os.path.normpath(path))
    # Recurse while both a head and a tail remain; otherwise keep whichever is left.
    return parts(p) + [f] if f and p else [p or f]

os.path.normpath() is actually required only once and could be done in a separate entry function to the recursion.

Frank-Rene Schäfer
0
from os import path as os_path

and then

def split_path_iter(string, lst):
    head, tail = os_path.split(string)
    if head in ('', string):  # also stop at the root, where split() makes no progress
        return [string] + lst
    else:
        return split_path_iter(head, [tail] + lst)

def split_path(string):
    return split_path_iter(string, [])

or, inspired by the above answers (more elegant):

def split_path(string):
    head, tail = os_path.split(string)
    if head in ('', string):  # also stop at the root, where split() makes no progress
        return [string]
    else:
        return split_path(head) + [tail]
Smiley1000
  • This question already has a lot of answers, can you explain what new information your answer provides, or how it improves on the solutions already here? – joanis Apr 07 '21 at 13:57
0

It is a shame that Python's os.path doesn't have something like os.path.splitall.

Anyhow, this is what works for me. Credit: https://www.oreilly.com/library/view/python-cookbook/0596001673/ch04s16.html

import os

a = '/media//max/Data/'

def splitall(path):
    # https://www.oreilly.com/library/view/python-cookbook/0596001673/ch04s16.html
    allparts = []
    while 1:
        parts = os.path.split(path)
        if parts[0] == path:  # sentinel for absolute paths
            allparts.insert(0, parts[0])
            break
        elif parts[1] == path: # sentinel for relative paths
            allparts.insert(0, parts[1])
            break
        else:
            path = parts[0]
            allparts.insert(0, parts[1])
    return allparts

x = splitall(a)
print(x)

z = os.path.join(*x)
print(z)

output:

['/', 'media', 'max', 'Data', '']
/media/max/Data/
Mahmoud Elshahat
-2

Use ntpath.split()
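A brief sketch of what that looks like, assuming the path is written as a raw literal so the backslashes survive. ntpath applies Windows path rules on any platform, and a single split call only separates the final component, so further splitting is still needed to get every folder.

```python
import ntpath

# Raw literal: without the r prefix, "\f" would become a form-feed character.
head, tail = ntpath.split(r"d:\stuff\morestuff\furtherdown\THEFILE.txt")
drive, rest = ntpath.splitdrive(head)

print(head)   # d:\stuff\morestuff\furtherdown
print(tail)   # THEFILE.txt
print(drive)  # d:
print(rest)   # \stuff\morestuff\furtherdown
```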

deft_code
  • when i use os.path.split() I get, (`d:\\stuff`, `morestuff\x0curtherdown\thefile.mux`) – BeeBand Jul 02 '10 at 15:46
  • As BeeBand pointed out, os.path.split() really doesn't do the desired thing. – unwind Jul 02 '10 at 15:47
  • sorry I just realized os.path only works depending on your os. ntpath will parse dos paths. – deft_code Jul 02 '10 at 15:50
  • even with ntpath I still get `d:\\stuff, morestuff\x0curtherdown\thefile.mux` – BeeBand Jul 02 '10 at 15:56
  • 2
    @BeeBand: your having issues with escaping your string. `'\x0c'` is the form feed character. The way to create the form feed character is '\f'. If you really want the literal string '\f' you have two options: `'\\f'` or `r'\f'`. – deft_code Jul 02 '10 at 19:47
  • You should delete this answer because it doesn't answer the question. – Boris Verkhovskiy Aug 24 '21 at 06:05