14

I'm writing a personal wiki-style program in Python that stores text files in a user configurable directory.

The program should be able to take a string (e.g. foo) from a user and create a filename of foo.txt. The user will only be able to create the file inside the wiki directory, and slashes will create a subdir (e.g. foo/bar becomes (path-to-wiki)/foo/bar.txt).

What is the best way to check that the input is as safe as possible? What do I need to watch out for? I know some common pitfalls are:

  • Directory traversal: ../
  • Null bytes: \0

I realize that taking user input for filenames is never 100% safe, but the program will only be run locally and I just want to guard against common errors and glitches.

Robbie JW
Puzzled79

4 Answers

11

You can force the user to create files and directories inside the wiki by normalizing the path with os.path.normpath and then checking that the result still begins with the wiki root, say '(path-to-wiki)':

os.path.normpath('(path-to-wiki)/foo/bar.txt').startswith('(path-to-wiki)')

To make sure the path or filename the user entered doesn't contain anything nasty, you can restrict it to upper- and lowercase letters, digits, and perhaps the hyphen and underscore.

You can then check the normalized path with a regular expression that matches any character outside the allowed set:

userpath=os.path.normpath('(path-to-wiki)/foo/bar.txt')
re.findall(r'[^A-Za-z0-9_\-\\]',userpath)

To summarize

With userpath = os.path.normpath('(path-to-wiki)/foo/bar.txt'):

if (not userpath.startswith('(path-to-wiki)')
        or re.search(r'[^A-Za-z0-9_\-\\]', userpath)):
    ...  # do whatever you want with an invalid path
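
One caveat with the plain startswith check above is that it compares strings, not path components, so a sibling directory such as '(path-to-wiki)-evil' would also pass. A minimal sketch of a component-aware variant follows, assuming Python 3.5+ for os.path.commonpath; the function name resolve_inside and the ValueError convention are illustrative choices, not part of the answer above.

```python
import os


def resolve_inside(root, user_name):
    """Return the absolute .txt path for user_name, or raise ValueError.

    `resolve_inside` is a hypothetical helper name used for illustration.
    """
    if "\0" in user_name:
        raise ValueError("null byte in name")
    # realpath normalizes ".." segments and follows symlinks that exist.
    root = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(root, user_name + ".txt"))
    # commonpath compares whole path components, so a sibling such as
    # "/tmp/wiki-evil" does not pass just because it shares a string prefix.
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError("path escapes the wiki directory: %r" % user_name)
    return candidate
```

Calling resolve_inside('/tmp/wiki', 'foo/bar') returns a path under the wiki root, while inputs such as '../x' or '/etc/passwd' raise ValueError. This only checks containment; a character whitelist can still be applied on top of it.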
Abhijit
7

There is now a full library for validating and sanitizing path strings, pathvalidate:

from pathvalidate import sanitize_filepath

fpath = "fi:l*e/p\"a?t>h|.t<xt"
print("{} -> {}".format(fpath, sanitize_filepath(fpath)))

fpath = "\0_a*b:c<d>e%f/(g)h+i_0.txt"
print("{} -> {}".format(fpath, sanitize_filepath(fpath)))

output:

fi:l*e/p"a?t>h|.t<xt -> file/path.txt
_a*b:c<d>e%f/(g)h+i_0.txt -> _abcde%f/(g)h+i_0.txt
walid anon
  • You probably want to use `sanitize_filename` to avoid keeping slashes in the string (which stand for subdirectories) – xjcl Feb 27 '22 at 17:07
6

Armin Ronacher has a blog post on this subject (and others).

These ideas are implemented as the safe_join() function in Flask:

def safe_join(directory, filename):
    """Safely join `directory` and `filename`.
    Example usage::
        @app.route('/wiki/<path:filename>')
        def wiki_page(filename):
            filename = safe_join(app.config['WIKI_FOLDER'], filename)
            with open(filename, 'rb') as fd:
                content = fd.read() # Read and process the file content...
    :param directory: the base directory.
    :param filename: the untrusted filename relative to that directory.
    :raises: :class:`~werkzeug.exceptions.NotFound` if the resulting path
             would fall out of `directory`.
    """
    filename = posixpath.normpath(filename)
    for sep in _os_alt_seps:
        if sep in filename:
            raise NotFound()
    if os.path.isabs(filename) or filename.startswith('../'):
        raise NotFound()
    return os.path.join(directory, filename)
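
Note that in the function as quoted, a filename of exactly '..' passes the checks, because '..' does not start with '../'. Below is a minimal standalone sketch of the same lexical check with that case handled explicitly. It is an adaptation for illustration, not Flask's API: the names safe_join_sketch and _ALT_SEPS are hypothetical, and it raises ValueError instead of NotFound so that it runs without Flask or Werkzeug installed.

```python
import os
import posixpath

# Separators other than "/" that the current OS accepts (e.g. "\\" on Windows).
_ALT_SEPS = [sep for sep in (os.sep, os.altsep) if sep not in (None, "/")]


def safe_join_sketch(directory, filename):
    """Join directory and an untrusted, "/"-separated filename."""
    # Collapse "a/../b" style segments before inspecting the result.
    filename = posixpath.normpath(filename)
    if any(sep in filename for sep in _ALT_SEPS):
        raise ValueError("alternative path separator in filename")
    if (os.path.isabs(filename)
            or filename == ".."
            or filename.startswith("../")):
        raise ValueError("filename would fall outside the directory")
    return os.path.join(directory, filename)
```

This check is purely lexical: it never touches the file system, so it does not notice symlinks inside the directory that point elsewhere.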
kaya3
Simon Sapin
0

You could keep only the characters that are ASCII alphanumeric or one of ' ', '.', and '/', and then remove any bad combinations:

safe_string = ""
for c in user_supplied_string:
    # isalnum() alone also accepts non-ASCII letters and digits
    if (c.isascii() and c.isalnum()) or c in " ./":
        safe_string += c

while "../" in safe_string:
    # Loop, because replacing only once would leave a hole:
    # a bad guy could enter ".../", which one pass turns
    # into "../", so the loop prevents tricks like this.
    safe_string = safe_string.replace("../", "./")
# Strip leading "." and "/" characters (lstrip takes a set of characters)
safe_string = safe_string.lstrip("./")

That's what I would do. I don't know how Pythonic it is, but it should leave you pretty safe. If you wanted to validate rather than convert, you could test for equality afterwards, like so:

valid = safe_string == user_supplied_string
if not valid:
    raise Exception("Sorry the string %s contains invalid characters" % user_supplied_string)

In the end both approaches would probably work. I find this method a bit more explicit, and it also screens out weird or inappropriate characters like '\t', '\r', or '\n'. Cheers!
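
For the validate-only route, the same whitelist can also be expressed per path component with a regular expression. The following is a rough sketch under assumptions of my own: the helper name is_valid_name is hypothetical, the allowed set additionally includes '_' and '-', and every component must start with a letter or digit, which rules out '.', '..', and empty components.

```python
import re

# One path component: starts with an ASCII letter or digit, followed by
# ASCII letters, digits, spaces, dots, underscores, or hyphens.
_COMPONENT = re.compile(r"[A-Za-z0-9][A-Za-z0-9 ._-]*")


def is_valid_name(name):
    """Return True if every "/"-separated component matches the whitelist."""
    # fullmatch must consume the whole component, so control characters,
    # null bytes, and non-ASCII letters all cause a rejection.
    return all(_COMPONENT.fullmatch(part) for part in name.split("/"))
```

With this, 'foo/bar' is accepted, while '../x', '/abs', 'a//b', and 'a\nb' are rejected. Rejecting the input outright avoids having to reason about what a cleaned-up string turns into.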

john-charles