I am using a UTF-8 regex to split the value of a `Content-Type:` header line into its parts, since I habitually configure my servers to use UTF-8 consistently.
// Example type; in practice this is negotiated from the request's `Accept:` header line.
$content_type = 'TeXt/HtMl';
preg_match('~^([\w-]+\*?)/([\w-]+\*?)$~ui', $content_type, $matches);
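To make the concern concrete, here is a minimal sketch of how the pattern could be probed with a few hand-made inputs. Only the pattern is the one from above; the probe strings and the variable names `$probes` and `$results` are hypothetical and chosen purely for illustration. As far as I understand, with the `u` modifier `preg_match()` returns `false` (rather than `0`) when the subject is not valid UTF-8, so the sketch keeps the three possible return values apart.

```php
<?php
// Same pattern as above; everything else in this sketch is illustrative.
$pattern = '~^([\w-]+\*?)/([\w-]+\*?)$~ui';

// Hypothetical probe inputs (assumptions, not real request data).
$probes = [
    'plain'     => 'TeXt/HtMl',           // ordinary media type
    'dot_dot'   => 'text/../html',        // literal parent-directory sequence
    'non_ascii' => "t\xC3\xA9xt/html",    // valid UTF-8 containing U+00E9
    'overlong'  => "text\xC0\xAFhtml/x",  // invalid UTF-8: overlong encoding of '/'
];

$results = [];
foreach ($probes as $name => $value) {
    // preg_match() yields 1 (match), 0 (no match) or false (error, e.g. malformed UTF-8).
    $results[$name] = preg_match($pattern, $value, $m);
}

var_dump($results);
```

Whether the `non_ascii` probe matches presumably depends on how `\w` is interpreted under the `u` modifier in the installed PCRE build, which is part of what I am unsure about.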
I am considering loading classes from a filesystem path built from the subpattern matches.
Is there any conceivable way to inject a '/../' sequence through an encoding attack?
How does internal encoding work in general? Do I have to care which charset the request is encoded in when processing data in PHP code, or does the conversion happen automatically and reliably? What else should I keep in mind regarding encoding security? How can one ensure a consistent encoding in deployed code running on unknown systems?
EDIT: As requested in the comments, the subsequent code could look like this, for example:
$m1 = strtolower($matches[1]);
$m2 = strtolower($matches[2]);
include_once "/path/to/project/content_handlers/{$m1}_{$m2}";
Remarks: My question was meant to be more general. Consider the following scenario: the PHP script is encoded in UTF-8, the server's filesystem uses character set A, and the client manipulates the request so that it is sent in encoding B. Is there a risk that the `Accept:` header is written in such a way that the preg_* functions do not recognize a '/../' (parent directory) sequence, but the filesystem does? The question is not limited to the particular regex in the example. Could an attacker include arbitrary files present in the filesystem if no further precautions are taken?
Remarks 2: In the provided example I cannot rely on http_negotiate_content_type, since there is no guarantee that pecl_http is installed on the target server. There is a scripted polyfill as well. Again: this is not a question about one particular case. I want to learn how to treat client encodings (even manipulated ones) in general.
Remarks 3: A similar problem (with SQL encoding attacks) is discussed here: Are PDO prepared statements sufficient to prevent SQL injection? However, my question is about filesystem encoding. Could something similar happen there?