6

I want to use a user-provided string as a filename for exporting, but have to make sure that the string is permissible on my system as a filename. From my side it would be OK to replace any forbidden character with e.g. '_'.

Here I found a list of forbidden characters for filenames.

It should be easy enough to do this with `str.replace()`, but I was wondering whether there is already something out there that does it, ideally taking into account which OS I am on.

Matthias Arras
  • 4
    This is highly platform-dependent! So `try`/`except` is usually the best way. – wim Jan 09 '20 at 21:03
  • You should be able to use the `os` library's built in `is_path_exists_or_creatable()` function. – CMMCD Jan 09 '20 at 21:17
  • I not only want to know if this filename is allowed, I want to fix it if it is not allowed. See second sentence in question. Actually just realized I said initially replace the whole string with '_', but I meant only to replace the char in the string that's not permissible. – Matthias Arras Jan 09 '20 at 21:19
  • I wouldn't accept the burden of producing a valid filename from an invalid one. Try to create the given file, and if there's a problem, catch the exception and inform the user/caller that the given name was invalid. Let them pick a valid name. – chepner Jan 09 '20 at 21:26
  • @chepner Sounds reasonable, however in this case this is taken from meta-data to image data that the user has provided prior to the processing step. So I don't have access to the user at runtime. – Matthias Arras Jan 09 '20 at 21:28
  • I never said it had to be interactive :) Just verify that the names to use *are* valid, and if they are not, fail early and let the user submit a list that *is* valid. – chepner Jan 09 '20 at 21:29
  • 2
    Suppose the user submits two invalid names `foo/a_b` and `foo_a/b`. Both will map to `foo_a_b`: now what are you supposed to do? It's one thing to overwrite the first with the second if the user (apparently) asked you to; another thing entirely to do that when the user thought they provided two distinct names. – chepner Jan 09 '20 at 21:30
  • I agree it will be necessary to catch cases where files would be overwritten as a result of sanitizing the filename. Do you see anything wrong with appending an integer in cases like that? Since the meta-data entry happens on another system (which I have no control over) and I only parse those files, it is impossible to relay this back to the user who originally provided the string. – Matthias Arras Jan 09 '20 at 21:40
  • 1
    This question seems to be essentially a duplicate of https://stackoverflow.com/questions/295135/turn-a-string-into-a-valid-filename – mhucka Nov 03 '21 at 15:08
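
The collision case raised in the comments (`foo/a_b` and `foo_a/b` both mapping to `foo_a_b`) can be handled by appending an integer, as suggested above. The following is only a minimal sketch of that idea: the function name `sanitize_all` is hypothetical, and the `forbidden` set is an illustrative assumption, not an authoritative list for any particular OS.

```python
def sanitize_all(names, forbidden='/\\:*?"<>|', replacement="_"):
    """Replace forbidden characters and keep the resulting names unique."""
    used = set()
    result = []
    for name in names:
        # Swap every forbidden character for the replacement character.
        clean = "".join(replacement if ch in forbidden else ch for ch in name)
        # On a collision, append _1, _2, ... until the name is unused.
        candidate, counter = clean, 1
        while candidate in used:
            candidate = f"{clean}_{counter}"
            counter += 1
        used.add(candidate)
        result.append(candidate)
    return result


print(sanitize_all(["foo/a_b", "foo_a/b"]))  # ['foo_a_b', 'foo_a_b_1']
```

Note that this compares names case-sensitively, so on a case-insensitive file system `Foo` and `foo` would still clash.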

3 Answers

8

pathvalidate is a Python library to sanitize/validate a string such as filenames/file-paths/etc.

The library provides utilities both for validating paths:

import sys
from pathvalidate import ValidationError, validate_filename

try:
    validate_filename("fi:l*e/p\"a?t>h|.t<xt")
except ValidationError as e:
    print("{}\n".format(e), file=sys.stderr)

And utilities for sanitizing paths:

from pathvalidate import sanitize_filename

fname = "fi:l*e/p\"a?t>h|.t<xt"
print("{} -> {}".format(fname, sanitize_filename(fname)))
kmaork
  • If the string is admissible, great. If it causes an error, how would I correct the filename accordingly? The error message does point to the solution: `invalid char found: invalids=(':', '*', '/', '"', '?', '>', '|', '<')`. So it basically gives me the list of chars I need to replace in the string with a permissible 'fallback'. Any idea how to get this without parsing the error message, though? – Matthias Arras Jan 09 '20 at 21:18
  • Look at the documentation. They have plenty of examples. – kmaork Jan 09 '20 at 21:21
  • 1
    `from pathvalidate import sanitize_filename` is the answer. As described here: https://pathvalidate.readthedocs.io/en/latest/pages/examples/sanitize.html – Matthias Arras Jan 09 '20 at 21:24
  • You care to put that into your answer to make it whole? Or want me to add it there? – Matthias Arras Jan 09 '20 at 21:25
  • 1
    Be aware you will still need to validate the filename length. – OscarVanL Jun 02 '20 at 21:15
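
As the last comment points out, length is a separate concern from forbidden characters. Below is a minimal standard-library sketch of one way to cap the length; the helper name `truncate_filename` is hypothetical, and the 255-byte default is an assumption that matches many common file systems but not all of them.

```python
import os


def truncate_filename(name, max_bytes=255, encoding="utf-8"):
    """Shorten the stem so the encoded name fits in max_bytes, keeping the extension."""
    stem, ext = os.path.splitext(name)
    # Bytes left for the stem once the extension is accounted for.
    budget = max(max_bytes - len(ext.encode(encoding)), 0)
    encoded = stem.encode(encoding)[:budget]
    # errors="ignore" drops a multi-byte character that was cut in half.
    return encoded.decode(encoding, errors="ignore") + ext
```

If the extension alone exceeds the limit, this sketch returns it unchanged, so that case would still need separate handling.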
1

Depending on your use case, it might be easier to whitelist the characters that are allowed in a filename instead of attempting to construct a blacklist.

A canonical way would be to check whether each character of the proposed filename is contained in the POSIX portable filename character set.

https://www.ibm.com/docs/en/zos/2.1.0?topic=locales-posix-portable-file-name-character-set

Uppercase A to Z
Lowercase a to z
Numbers 0 to 9
Period (.)
Underscore (_)
Hyphen (-)

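Based on this, you can check each character of the name:

ok = ".-_0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
filename = "report_2020.txt"  # example input
for character in filename:
    # Raise explicitly; `assert` is skipped when Python runs with -O.
    if character not in ok:
        raise ValueError(f"forbidden character: {character!r}")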
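
Since the question asks for replacement rather than rejection, the same whitelist can also be used to substitute every non-portable character. This is just a sketch under that assumption; `to_portable` and `PORTABLE` are hypothetical names, and `_` as the fallback is taken from the question.

```python
import string

# POSIX portable filename character set: letters, digits, period, underscore, hyphen.
PORTABLE = frozenset(string.ascii_letters + string.digits + "._-")


def to_portable(filename, replacement="_"):
    """Replace every character outside the portable set with `replacement`."""
    return "".join(ch if ch in PORTABLE else replacement for ch in filename)


print(to_portable('fi:l*e/p"a?t>h|.t<xt'))  # fi_l_e_p_a_t_h_.t_xt
```

Being this strict also replaces spaces and all non-ASCII letters, which may or may not be acceptable for your data.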

        
0

A better solution may be for you to store the files locally using generated filenames that are guaranteed to be unique and file system safe (any UUID generator would do, for example). Maintain a simple database that maps between the original filename and the UUID for later use.
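
A minimal sketch of that idea, assuming a JSON file is an acceptable stand-in for the "simple database"; the function name `store_name` and the index layout are hypothetical choices, not part of any library.

```python
import json
import uuid
from pathlib import Path


def store_name(original_name, index_path):
    """Generate a file-system-safe name and record which original it stands for."""
    index_path = Path(index_path)
    # Load the existing mapping, or start a new one on first use.
    mapping = json.loads(index_path.read_text()) if index_path.exists() else {}
    safe_name = uuid.uuid4().hex  # 32 hexadecimal characters, no separators
    mapping[safe_name] = original_name
    index_path.write_text(json.dumps(mapping, indent=2))
    return safe_name
```

Calling `store_name("foo/a_b", "index.json")` returns the generated name to use on disk, and the original string can later be looked up in `index.json`. The sketch does no locking, so it is not suitable for concurrent writers.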

jarmod