3

I'm writing a generic backup application with the os module and pickle, and so far I've tried the code below to see whether something is a file or a directory (based on its string input and not its physical contents).

import os, re

def test(path):
    prog = re.compile("^[-\w,\s]+.[A-Za-z]{3}$")
    result = prog.match(path)
    if os.path.isfile(path) or result:
        print "is file"
    elif os.path.isdir(path):
        print "is directory"
    else: print "I dont know"

Problems

test("C:/treeOfFunFiles/")
is directory
test("/beach.jpg")
I dont know
test("beach.jpg")
I dont know
test("/directory/")
I dont know

Desired Output

test("C:/treeOfFunFiles/")
is directory
test("/beach.jpg")
is file
test("beach.jpg")
is file
test("/directory/")
is directory

Resources

What regular expression should I be using to tell the difference between what might be a file and what might be a directory? Or is there a different way to go about this?

classicjonesynz
  • There are built-ins for this kind of thing. Also, you generally want to avoid using regular expressions to manipulate path names, too. They are not only locale dependent (e.g. the yen character in Japan as a path delimiter), but OS dependent as well. – kreativitea Oct 17 '12 at 00:17
  • I think you're conflating two separate issues. Regular expressions won't tell you whether a file is a directory or a regular file. You need to use `os.path` instead. See this [solution](http://stackoverflow.com/questions/82831/how-do-i-check-if-a-file-exists-using-python) – David Oct 17 '12 at 00:18
  • Also, what if I have a directory called `/beach.jpg`? – Joel Cornett Oct 17 '12 at 00:34
  • @Joel then you should be absolutely ashamed you have such a ridiculously far out naming convention that defies all laws of logic and regex classification systems ;) – Jon Clements Oct 17 '12 at 00:50
  • @JonClements: I don't, but I was trying to point out the issues with using regular expressions to identify filesystem objects ;) – Joel Cornett Oct 17 '12 at 00:57

3 Answers

5

The os.path module provides functions to check whether a path is a file or a directory. It is advisable to use them rather than regular expressions.

>>> import os
>>> print os.path.isfile(r'/Users')
False
>>> print os.path.isdir(r'/Users')
True
Mr. Squig
  • Oh that's even better :) than what I was trying to accomplish lol :) – classicjonesynz Oct 17 '12 at 00:15
  • It works, but `print os.path.isfile(r'/Users')` doesn't work with plain strings, only with actual files that can be opened (`open()`). – classicjonesynz Oct 17 '12 at 00:28
  • I'm not sure I understand. If a file can be 'opened' it exists, otherwise it will throw an exception. You could use a try/except block to test that way. – Mr. Squig Oct 17 '12 at 00:31
  • My problem is that my application will be reading strings from a command line, and the user may want to restore a file that may not exist on the hard drive (it only has a hash). `os.path.isfile()` will return `False` (because the file does not yet exist). But +1 rep, great answer :) learn something new every day – classicjonesynz Oct 17 '12 at 00:42
4

This might help someone. I had the exact same need and used the following regular expressions to test whether an input string looks like a directory, a file, or neither.

For a generic file:

^(\/+\w{0,}){0,}\.\w{1,}$

For a generic directory:

^(\/+\w{0,}){0,}$

The resulting Python function looks like this:

import re

def check_input(path):
    # Raw strings keep the regex backslashes from being read as string escapes.
    check_file = re.compile(r"^(\/+\w{0,}){0,}\.\w{1,}$")
    check_directory = re.compile(r"^(\/+\w{0,}){0,}$")
    if check_file.match(path):
        print("It is a file.")
    elif check_directory.match(path):
        print("It is a directory")
    else:
        print("It is neither")

Example:

  • check_input("/foo/bar/file.xyz") prints -> It is a file.
  • check_input("/foo/bar/directory") prints -> It is a directory
  • check_input("Random gibberish") prints -> It is neither

This preliminary check on the input can be reinforced later by the os.path.isfile() and os.path.isdir() built-in functions, as Mr. Squig kindly showed, but I'd bet this preliminary test may save you a few microseconds and boost your script's performance.

PS: While using this piece of code, I noticed I had missed a huge use case: paths that contain special characters such as the dash "-", which is widely used. To handle this I replaced \w{0,}, which matches only word characters (letters, digits, and underscore), with .{0,}, which matches any character. This is more of a workaround than a solution, but that's all I have for now.
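As a narrower alternative to the `.{0,}` workaround, one could add the dash to the character class rather than accepting any character. This is only a sketch under that assumption: the function name `check_input_with_dash` and the `[\w-]` class are illustrative choices, not part of the original answer, and the function returns a label instead of printing so it is easier to test.

```python
import re

# Hypothetical variant of the patterns above: "[\w-]" allows word
# characters and dashes in each path component, nothing else.
CHECK_FILE = re.compile(r"^(/+[\w-]*)*\.\w+$")
CHECK_DIRECTORY = re.compile(r"^(/+[\w-]*)*$")

def check_input_with_dash(path):
    """Classify a path string by shape only; the filesystem is never consulted."""
    if CHECK_FILE.match(path):
        return "file"
    if CHECK_DIRECTORY.match(path):
        return "directory"
    return "neither"
```

Like the original patterns, this still rejects directory names that contain a dot, and it says nothing about whether the path actually exists.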

Yondaime008
3

In a character class, if present and meant as a hyphen, the - needs to either be the first/last character, or escaped \- so change "^[\w-,\s]+\.[A-Za-z]{3}$" to "^[-\w,\s]+\.[A-Za-z]{3}$" for instance.
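A small sketch of the corrected pattern in use, written for Python 3. The helper name `looks_like_filename` is made up for illustration; only the pattern itself comes from the answer.

```python
import re

# The hyphen comes first in the class, so it is a literal hyphen;
# the dot before the extension is escaped so it matches only ".".
FILENAME = re.compile(r"^[-\w,\s]+\.[A-Za-z]{3}$")

def looks_like_filename(name):
    """Return True if name is a bare filename with a three-letter extension."""
    return FILENAME.match(name) is not None
```

Note that `looks_like_filename("/beach.jpg")` is False, because "/" is not in the character class. The pattern only describes bare filenames, not paths.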

Otherwise, I think using regexes to determine if something looks like a filename/directory is pointless...

  • /dev/fd0 isn't a file or directory for instance
  • ~/comm.pipe could look like a file but is a named pipe
  • ~/images/test is a symbolic link to a file called '~/images/holiday/photo1.jpg'

Have a look at the os.path module, which has functions that ask the OS what something is.
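As a rough illustration of asking the OS, the sketch below uses `os.lstat` together with the standard `stat` module to tell apart the kinds of objects listed above. The function name `describe` and the returned labels are made up for this example, and it only works for paths that already exist.

```python
import os
import stat

def describe(path):
    """Report what the OS says an existing path is."""
    try:
        # lstat does not follow symbolic links, so links are reported as links.
        mode = os.lstat(path).st_mode
    except OSError:
        return "does not exist"
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISFIFO(mode):
        return "named pipe"
    if stat.S_ISCHR(mode) or stat.S_ISBLK(mode):
        return "device"
    return "other"
```

For simple cases, `os.path.isfile()`, `os.path.isdir()` and `os.path.islink()` answer the same questions directly.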

Drdilyor
Jon Clements
  • Thanks +1 rep because you've shown me how to use regex in python :) – classicjonesynz Oct 17 '12 at 00:15
  • I've slightly modified my original question – classicjonesynz Oct 17 '12 at 00:30
  • @Killrawr `is_dir = name[-1] in r'\/'` ? So anything that ends in a path separator (either kind), is just treated as a dir, otherwise, you just don't know – Jon Clements Oct 17 '12 at 00:34
  • Thanks Jon the output 3 out of 4 it knows what it is but when it gets to something like `/beach.jpg` the output comes back as `I dont know`, wouldn't the RegEx pick up that its a file because it has `.jpg` ?? – classicjonesynz Oct 17 '12 at 00:40
  • @Killrawr well, you could assume that anything that looks like it has an extension is a file, but then on Linux systems, a lot of stuff doesn't have extensions, but are more likely files, while other stuff that look like they could have extensions, could well be special devices/pipes/other... - so you're going to be wrong one way or another, but if you're happy with that - you get to pick which way you want to do it, and on your own back be it! – Jon Clements Oct 17 '12 at 00:45
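The heuristic discussed in these comments could be sketched roughly as follows, in Python 3. It assumes you accept being wrong in the cases Jon describes: it asks the filesystem first, then falls back to guessing from a trailing separator or an extension. The function name `classify` and its return labels are illustrative only.

```python
import os

def classify(path):
    """Guess whether a path string names a file or a directory."""
    # Ask the filesystem first; this is reliable when the path exists.
    if os.path.isdir(path):
        return "directory"
    if os.path.isfile(path):
        return "file"
    # The path does not exist (yet), so guess from the string alone.
    if path.endswith(("/", "\\")):
        return "directory"
    # os.path.splitext returns (root, ext); a non-empty ext suggests a file.
    if os.path.splitext(path)[1]:
        return "file"
    return "unknown"
```

A name with no extension and no trailing separator stays "unknown", which matches the point that the string alone cannot settle the question.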