
Let's say I have a string holding a root directory that a user has entered:

'C:/Users/Me/'

Then I use os.listdir() and os.path.join() to build a list of subdirectories.

I end up with a list of strings that are like below:

'C:/Users/Me/Adir\Asubdir\'

and so on.

I want to split each path and capture every directory name as its own element. Below is one attempt. I seem to be having trouble with the \ and / characters. I assume \ is the escape character, so '[\\/]' should mean "look for \ or /", and '[\\/]([\w\s]+)[\\/]' as a match pattern should find any word between two slashes. But the output is only ['/Users/'] and nothing else is matched. So I then added an escape for the forward slash:

'[\\\/]([\w\s]+)[\\\/]'

However, the output then becomes only ['Users','ADir'], which is confusing the crud out of me.

My question is mainly how to tokenize each directory name from a string that uses both \ and / as separators, but also why my RE is not working as I expect.

Minimal Example:

import re, os

info = re.compile('[\\\/]([\w ]+)[\\\/]')


root = 'C:/Users/i12500198/Documents/Projects/'

def getFiles(wdir=os.getcwd()):
    files = (os.path.join(wdir,file) for file in os.listdir(wdir)
                 if os.path.isfile(os.path.join(wdir,file)))
    return list(files)

def getDirs(wdir=os.getcwd()):
    dirs = (os.path.join(wdir,adir) for adir in os.listdir(wdir)
                if os.path.isdir(os.path.join(wdir,adir)))
    return list(dirs)

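def walkSubdirs(root, below=None):
    # Avoid a mutable default argument: a shared list would persist across calls
    if below is None:
        below = []
    subdirs = getDirs(root)
    for aDir in subdirs:
        below.append(aDir)
        walkSubdirs(aDir, below)

    return below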

subdirs = walkSubdirs(root)
    
for aDir in subdirs:
    files = getFiles(aDir)
    for f in files:
        finfo = info.findall(f)
        print(f)
        print(finfo)
ChrisGPT was on strike
Chemistpp
  • How did you get `'C:/Users/Me/Adir\Asubdir\'` out of Python? That's not a valid string. – wjandrea Jan 31 '22 at 18:17
  • Added an update to clarify. I am walking the tree and using os.path.join to join the strings returned by os.listdir. – Chemistpp Jan 31 '22 at 18:22
  • Oh, so it's not a string, it's the printed output. Please clarify that. – wjandrea Jan 31 '22 at 18:26
  • Sorry boss. Would have if I knew that. I assumed it was a string. I learned a couple things today with this question... – Chemistpp Jan 31 '22 at 18:29
  • Ah OK, you should learn some more about data vs its representation. Check out [Why do backslashes appear twice?](/q/24085680/4518341) – wjandrea Jan 31 '22 at 18:31
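
To illustrate the data-versus-representation point raised in the comments, here is a minimal sketch. It uses ntpath (the Windows flavour of os.path, importable on any platform) so the result does not depend on the machine running it; the directory names are the placeholder ones from the question, and the variable names are only illustrative.

```python
import ntpath  # Windows implementation of os.path, available on every platform

# Join the way os.path.join would on Windows: the root already ends in "/",
# so no separator is added there, but "\" is inserted before "Asubdir".
joined = ntpath.join("C:/Users/Me/", "Adir", "Asubdir")

shown = str(joined)    # the actual characters in the string
quoted = repr(joined)  # the Python-literal representation of the same string

print(shown)           # C:/Users/Me/Adir\Asubdir
print(quoted)          # 'C:/Users/Me/Adir\\Asubdir'
print(len("\\"))       # 1: the doubled backslash in source code is one character
```

The string holds a single backslash between the two names; the doubled form only appears when Python shows the string as a literal.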

1 Answer


I want to split the subdirectories and capture each directory name as its own element

Instead of regular expressions, I suggest you use one of Python's standard functions for parsing filesystem paths.

Here is one using pathlib:

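from pathlib import PureWindowsPath

# Raw string: otherwise "\2" would be read as an escape sequence (the character \x02)
p = PureWindowsPath(r"C:/Users/Me/ADir\ASub Dir\2 x 2 Dir")
p.parts
#=> ('C:\\', 'Users', 'Me', 'ADir', 'ASub Dir', '2 x 2 Dir')

Note that the behaviour of pathlib.Path depends on the system running Python: on Windows both \ and / are treated as separators, while on POSIX systems only / is. Since I'm on a Linux machine, I used pathlib.PureWindowsPath here, which applies Windows rules on any platform. I believe plain Path should give the same output for those of you on Windows.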
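
As for why the regular expression skips names: re.findall returns non-overlapping matches, and the pattern consumes the trailing separator of each match, so that separator is no longer available as the leading one for the next name. The sketch below reproduces this on the question's example path and shows two possible workarounds, a lookahead and re.split. The variable names are illustrative, and this is only one way to do it; pathlib remains the simpler option.

```python
import re

# The example path from the question: mixed separators, trailing backslash.
path = r"C:/Users/Me/Adir\Asubdir" + "\\"

# Original idea: a name between two separators. The closing separator is
# consumed by each match, so every other name is skipped.
consuming = re.compile(r"[\\/]([\w ]+)[\\/]")
consumed = consuming.findall(path)
print(consumed)  # ['Users', 'Adir']

# A lookahead checks for the closing separator without consuming it.
lookahead = re.compile(r"[\\/]([\w ]+)(?=[\\/])")
looked = lookahead.findall(path)
print(looked)    # ['Users', 'Me', 'Adir', 'Asubdir']

# Alternatively, split on runs of either separator and drop empty pieces.
parts = [piece for piece in re.split(r"[\\/]+", path) if piece]
print(parts)     # ['C:', 'Users', 'Me', 'Adir', 'Asubdir']
```

Writing the patterns as raw strings (r"...") also sidesteps the first problem in the question: in a normal string literal, '[\\/]' reaches the regex engine as [\/], which matches only the forward slash.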

ChrisGPT was on strike