If you're only ever using these inputs as file names inside your program, and you're storing them on a native Linux filesystem, then the critical things to watch for are:
- Absolutely proscribe any file name starting with `../`, containing `/../` or ending with `/..`, as well as the name `..` itself. Such file names could allow the user to reach files outside the directory tree that you're working in.
- Be wary of any file name containing `/`, as these allow the user to name subdirectories, possibly with unintended consequences.
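A minimal Python sketch of these two checks, assuming the names arrive as `str` and that you want to reject bad names rather than sanitize them. `is_safe_name` and `resolves_inside` are hypothetical helpers, not part of any library. The second one resolves symbolic links and `..` with `os.path.realpath`, which catches escapes that a purely textual check can miss.

```python
import os


def is_safe_name(name: str, allow_subdirs: bool = False) -> bool:
    """Hypothetical textual check for a user-supplied relative file name."""
    # Empty names, absolute paths and NUL bytes are never acceptable.
    if not name or name.startswith("/") or "\0" in name:
        return False
    parts = name.split("/")
    # A ".." component covers "../x", "a/../b", "a/.." and ".." itself.
    if ".." in parts:
        return False
    # Unless subdirectories are explicitly wanted, forbid "/" entirely.
    if not allow_subdirs and len(parts) > 1:
        return False
    return True


def resolves_inside(base: str, name: str) -> bool:
    """Hypothetical defence-in-depth check: does base/name stay under base
    once symbolic links and ".." are resolved?"""
    base_real = os.path.realpath(base)
    target = os.path.realpath(os.path.join(base_real, name))
    return os.path.commonpath([base_real, target]) == base_real
```

Rejecting outright is usually safer than trying to strip `..` out: for example, `"....//".replace("../", "")` leaves `../` behind. Note that `resolves_inside` only reflects the filesystem at the moment it runs; if the tree can change between the check and the use, the check can be raced.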
Other things that could cause trouble include:
- Non-ASCII characters that may have a different meaning if used in a different locale.
- Some ASCII punctuation characters may have a special meaning in parts of your processing system or may be invalid in some filesystems.
- Some parts of your system may be case-sensitive with other parts being case-insensitive. Consider normalizing the case.
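One conservative way to sidestep the locale, punctuation and case issues at once is to accept only the POSIX portable filename character set (ASCII letters, digits, `.`, `_` and `-`) and fold everything to lowercase. The sketch below assumes that policy suits your data; `normalize_field` is a hypothetical name. It checks for ASCII before lowercasing because Python's `str.lower()` maps some non-ASCII characters (such as U+212A KELVIN SIGN) to plain ASCII letters.

```python
import re
from typing import Optional

# POSIX portable filename character set: A-Z, a-z, 0-9, '.', '_' and '-'.
_PORTABLE = re.compile(r"[A-Za-z0-9._-]+")


def normalize_field(value: str) -> Optional[str]:
    """Hypothetical helper: return a lowercased copy of value, or None if it
    contains anything outside the portable character set."""
    # Check ASCII first: lowercasing can turn some non-ASCII characters
    # (e.g. U+212A KELVIN SIGN) into ordinary ASCII letters.
    if not value.isascii():
        return None
    # fullmatch also rejects the empty string and embedded spaces or newlines.
    if _PORTABLE.fullmatch(value) is None:
        return None
    # A leading '-' can be mistaken for a command-line option.
    if value.startswith("-"):
        return None
    return value.lower()
```

For example, `normalize_field("Server-01")` returns `"server-01"`, while `"café"`, `"a b"` and `"-rf"` are all rejected.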
If applicable, restrict each field to something that isn't going to cause any trouble. For example:
- A machine ID should probably consist of only ASCII lowercase letters and digits (or only ASCII uppercase letters and digits).
- A hostname should consist of only ASCII lowercase letters and digits, plus `-` but not in an initial or final position (use Punycode for non-ASCII host names). If these are fully qualified host names, as opposed to short host names within a local network, then `.` is also valid, but not in initial position.
- No field should be empty, contain a `/` or start with a `.`. A name with an initial `.` could be `.` or `..` (see above), and otherwise it is a dot file, which `ls` doesn't show by default and which the shell pattern `*` doesn't match, so such names are best avoided.
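These rules translate fairly directly into regular expressions. The Python sketch below uses hypothetical function names. The host name check follows the usual DNS label rules (lowercase letters, digits and `-`, no leading or trailing `-`, at most 63 characters per label) but is not a complete RFC 1123 validator, and it assumes non-ASCII names have already been converted to Punycode.

```python
import re

# Machine ID: ASCII lowercase letters and digits only.
MACHINE_ID = re.compile(r"[a-z0-9]+")

# One host name label: starts and ends with a letter or digit,
# may contain '-' in between, at most 63 characters long.
LABEL = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def valid_machine_id(s: str) -> bool:
    return MACHINE_ID.fullmatch(s) is not None


def valid_hostname(s: str, fully_qualified: bool = False) -> bool:
    # A short host name is a single label; a fully qualified name is
    # dot-separated labels. Splitting on '.' yields an empty label for a
    # leading, trailing or doubled dot, which LABEL rejects.
    labels = s.split(".") if fully_qualified else [s]
    return all(LABEL.fullmatch(label) for label in labels)


def valid_field(s: str) -> bool:
    # Generic rule: non-empty, no '/', no leading '.' (which also excludes
    # '.', '..' and dot files).
    return bool(s) and "/" not in s and not s.startswith(".")
```

To get the Punycode form, Python's built-in `"idna"` codec (IDNA 2003) can be used: `"bücher.example".encode("idna").decode("ascii")` gives `xn--bcher-kva.example`, which `valid_hostname(..., fully_qualified=True)` accepts.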
While control characters such as backspace aren't directly harmful, they can be indirectly harmful: when you're investigating an issue on the command line, they can garble how a name is displayed and cause you to make mistakes. Do not allow them.
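A minimal Python check for control characters, assuming the name has already been decoded to `str`. It uses the Unicode general category `Cc`, which covers the C0 controls (including backspace and newline), DEL and the C1 controls; `has_control_chars` is a hypothetical name.

```python
import unicodedata


def has_control_chars(name: str) -> bool:
    """Hypothetical helper: True if name contains any control character
    (Unicode general category Cc: U+0000-U+001F and U+007F-U+009F)."""
    return any(unicodedata.category(ch) == "Cc" for ch in name)
```

When you do need to inspect a suspicious name, printing `repr(name)` shows control characters as escapes such as `\x08` instead of letting the terminal act on them.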