7

Uber simple example to illustrate the point:

$message = $_POST['message'];

$fp = fopen("log.txt", "a");
fwrite($fp, $message);

fclose($fp);

Should I be sanitizing user input for the $_POST['message'] variable?

I understand prepared statements (for database sanitization) and htmlentities (if I were outputting the POST message back to the screen at some time) but in this case, the input is simply sitting in a log file that will be read by a small PHP script (via fopen())

Is the answer dependent on how it will be read? For example if I do open the log file via fopen() it should be htmlentities, and if I plan to download the log file and read it with Excel (for filtering purposes), there is nothing to be done?

Chris
  • 54,599
  • 30
  • 149
  • 186
  • This doesn't come up very much, but I suppose someone could post the binary contents of a virus. – Explosion Pills Jan 28 '13 at 04:27
  • 1
    I think for logical reasons it is better to sanitize it. You don't want someone to be allowed to even just "store" malicious content on your server. If it is there, it may be wrongly used one way or another – Hanky Panky Jan 28 '13 at 04:45
  • @Explosion Pills, hmm wow that would be an edge case. Interested to know how this could be countered? – Chris Jan 28 '13 at 05:16
  • @Hanky Panky - What would I sanitize against? Would it simply be `htmlentities`? – Chris Jan 28 '13 at 05:17
  • 3
    If the user puts newlines or some other separator character in the input, it could cause issues when you try to parse the log later. And, if the user decides to upload a lot of data, he could fill your disk space pretty quickly. – nneonneo Jan 28 '13 at 05:20
  • @nneonneo That's an interesting point, especially about the newline issue. The diskspace one I'm not sure how to protect against, except possibly limiting the message size (this is purely for logging purposes) - but I'm assuming the worse, and I'm assuming people will intentionally hijack the error messages – Chris Jan 28 '13 at 05:27
  • 1
    Regardless of files, tampering, hooking, malware, your neighbor having sex in the other room, etc. **you should always sanitize all user input.** –  Jan 28 '13 at 05:33
  • 1
    What do you do with the data when reading it later? – Gumbo Jan 28 '13 at 05:44
  • @H2CO3 I'm going to play devils advocate here - but do you? And I guess this is the background of my question, unlike, say database entry where the context of the input is obvious, and hence the sanitization methods. But when the context is less obvious, what sanitization should take place? According to the primary answer [here](http://stackoverflow.com/questions/129677/whats-the-best-method-for-sanitizing-user-input-with-php), it's more about where it's coming out, than where it's going in. I'm looking for healthy discussion here (in the comments) so please take it as such :) – Chris Jan 28 '13 at 06:02
  • @Gumbo I will be opening with a separate log viewer (a simple PHP file using `fopen`). So at this stage I'm leaning towards running each line through `htmlentities`. I might also occasionally download the entire days log and run it through a spreadsheet program for filtering and the like – Chris Jan 28 '13 at 06:06
  • @Chris I don't do serious web development often, but when I do, I usually make my best to make sure in no condition could any malicious input get through/do nasty things. –  Jan 28 '13 at 06:57
  • I think perhaps a better question is why you are writing to the filesystem from your website application. The filesystem is complicated and prone to errors (what if they inject a NUL-byte, what if they inject a payload that causes your AV to spark off? What if they include a new-line? What if PHP decides that PHP scripts in the .txt file should be executed when an attacker visits it directly? The safest place for user-input is ALWAYS in the database. Never leave it lying around on disk. – SecurityMatt Jan 28 '13 at 07:37
  • 1
    I think you should control the length of `$_POST` data too. – MahanGM Jan 28 '13 at 08:10
  • 1
    @SecurityMatt That's a fair call. My initial goal was for it to act as a simple logging system, writing small logs of the site's activity. I'm using KLogger for this task. The goal was to keep the logging as simple (and secure) as possible (ie remove the database requirement). – Chris Jan 28 '13 at 08:12
  • @Chris: Unless you're intending on building an entirely static website, you're going to need a database sooner or later, and once you have one, the overhead of having a logs table which contain one row per log line is pretty low. If it's in the database it'll also be easier to query (e.g. find me all logs by IP address like 1.2.3.*, or all logs between 1st Jan 2013 and 14th Jan 2013). Seriously. Use a database. It solves the security problem and it's more useful. – SecurityMatt Jan 28 '13 at 08:15
  • 1
    @SecurityMatt Yes I understand the advantages, and the site _does_ have a database. I'm not saying your wrong, I'm still very open to storing the logs in a db - I get it, from a query point of view, that it makes a lot of sense. On the other hand, Ruby for example, still uses flat files for logging (correct me if I'm wrong - I'm a PHP man). Have a flick through [this](http://stackoverflow.com/questions/1499239/database-vs-flat-text-file-what-are-some-technical-reasons-for-choosing-one-ove). Again my question isn't flat file vs database, but given a flat file, how do I protect the system. – Chris Jan 28 '13 at 08:35
  • @Chris: Given a flat file, the best way to protect the system is to make sure the file is outside of the webroot and to never open it. – SecurityMatt Jan 28 '13 at 09:46
  • I don't agree; having such data in the DB is not inherently more secure and escaping done for adding data to the DB wouldn't make it more secure in this context. For example, the user could fill up your DB really quickly the same way. Perhaps you read out the DB contents and execute them. Maybe they put a newline in the DB entry! These are all identical problems. `htmlentities` will offer you ***no*** protection in this context. For the virus problem, all I can think of to do is have good anti-virus software – Explosion Pills Jan 28 '13 at 13:47
  • @H2CO3: Yes -- but it need not always be sanitized before it goes into your storage media. – Billy ONeal Jan 28 '13 at 18:08
  • Also if you happen to accidentally introduce local file inclusion vulnerability somewhere else in your application, then potential attackers can easily upload an execute whatever they want. – Krešimir Nesek Aug 28 '13 at 16:30

6 Answers6

7

Your code is basically innocent. The only "obvious" attack would be to repeatedly upload data to your server, eventually exhausting your disk space.

"sanitizing" is something that's situational. It's not something you can just sprinkle on code to make it better, like you can with salt on food. Perhaps you'll sanitize the $_POST data to prevent SQL injection attacks, but then use the data in an HTML context - now you're vulnerable to XSS attacks. Perhaps it's an image upload, and you do basic MIME-type determination to make sure it IS an image. That's all fine and dandy, but then someone uploads kiddy porn, which will pass the "is it an image" test, and now you've got a much bigger problem.

Since you're accepting user data and writing it out to a file, there is nothing that can be done with this code (except the disk space problem) to abuse your system. You cannot embed some data sequence into the data that'd cause PHP, or the underlying OS, to suddenly stop writing that data out to disk and start executing it. It doesn't matter WHAT kind if data is being uploaded, because it's never being used in a context where it could be used to affect the script's execution. You're simply sucking in some data from the webserver, and spitting it out to disk. You're not allowing the user to influence which file is written to (unless your users have shell-level access to the server and could, say, create a symlink called 'log.txt' pointing at some OTHER more critical file).

The real problem comes AFTERWARD... what do you do with this file after it's been written? If your later code does something silly like

include('log.txt');

then now you DO have a problem - you've now taken this "innocent" data sitting in a file on the disk and turned it into potentially executable code. All it takes is a simple <?php exec('rm -rf /') ?> anywhere in that file to trash your server.

As well, consider something like the inherently idiotic "security" measure that was PHP's magic_quotes. The PHP developers (WRONGLY and STUPIDLY) assumed that ANY data submitted from the outside world would only EVER be used in an SQL context, and did SQL escaping on ALL data, regardless of its ultimate purpose. And to make it worse, they simply assumed that all databases use backslashes for their escape sequence. That's all fine and dandy if you never use anything but MySQL, but what if you're on, say, SQL Server? Now you have to translate the PHP-provided Miles O\'Brien to Miles O''Brien, essentially having to UNDO what PHP did for you automatically.

TL;DR: Don't use shotgun 'sanitization' methods, they're almost always useless/pointless and just involve more work before AND after. Just use context-specific methods at the time you're using the data.

Marc B
  • 356,200
  • 43
  • 426
  • 500
3

You should sanitize user input, but how is entirely dependent on what the input is for. "Sanitizing" refers to the idea of making sure input is safe or sane for a particular use. The term cannot be more specific until you settle on use cases.

You don't need to worry about the PHP reading/writing functions like fopen(). Be concerned with steps that actually parse or analyze the input. Some possible examples:

  • If a file will be displayed in a basic log reader, you might need to make sure that each input is limited to a certain length and doesn't contain line breaks or your chosen field delimiter, and the beginning of each line is a valid time stamp.
  • If a file will be displayed in a web browser, you might need to make sure inputs do not include scripts or links to other resources (like an IMG tag).
  • Excel files would have similar concerns regarding line length, time stamps, and delimiters. You don't have to worry about someone including executable code as long as Excel will be parsing the file as text. (Also, modern Excel versions give you warnings about included macros before running them.)
giskard22
  • 743
  • 1
  • 6
  • 15
1

The general rule is to validate input and sanitize output.

If it is possible to validate your input in any way, then you should. If not, then you should sanitize it when output to make sure it is safe for the context it is used.

e.g. if you know that each message should be less than 100 characters regardless of how it is used, the script that reads the POST data could validate and reject any request whose POST data contains input that is 100 characters or over.

Validation is an "all or nothing" approach that rejects anything that doesn't follow certain rules regardless of output context, whereas sanitisation is the process of "making something safe" depending on the context. I think it's important to make that distinction.

In your case the sample code you provided does not output (except for the puposes of processing by another script). It is more of a storage operation than an output operation in that the message could be written to a database just as easily as the file system. The main attack surface that would need locking down in this case appears to be file permissions and making sure that nothing can read or write to the file other than the scripts you intend to do this and under the correct context. For example, I realise your example was simplified, but in that specific case you should make sure that the file is written to a location above your web root, or to a location that has folder permissions set appropriately. Otherwise, you may have inadvertantly given access for anyone on the web to read http://www.example.com/log.txt and if they can write to it too it may be possible to leverage some sort of XSS attack if they can trick a browser into reading the file as HTML. Old versions of Internet Explorer try and detect the MIME type rather than rely on the server header value of text/plain (see here also). These vulnerabilities may be slightly off topic though, and I just mention them to be thorough and as an example of making sure the files themselves are locked down appropriately.

Back to your question: In your case your validation should take place by the script that processes log.txt. This should validate the file. Note that it is validating the file here, not the raw message. The file should be validated using its own rules to make sure the data is as expected. If the script directly outputs anything, this is where the sanitisation should take place to match the context of the output. So to summarise the process of validation and sanitisation for your application would be:

  1. Create log: Web browser ---POST---> get_message.php ---> validate that message is valid ---fwrite()--> log.txt

  2. Process log: log.txt ---fopen()---> process.php ---> validate that file is valid ---> anything output? then sanitise at this stage.

The above assumes that the correct authorisation is made before processing takes place by the scripts (i.e. that the current user has permissions in your application to logmessages or process logs.)

SilverlightFox
  • 32,436
  • 11
  • 76
  • 145
  • +1 for a great and detailed answer, and also for `validate input, sanitize output`! Really strange that no one has voted this up until now! oO – Levite Aug 29 '14 at 08:37
0

I would sanitize it. When it comes to logs, just make sure you put it into reserved space - for instance, if the log is one record per line, strip the new lines and other stuff from user's input so he cannot fool you.

Take a look at Attack Named Log Injection

Also be very careful when it comes to displaying the log file. Make sure no output can harm your reader.

Zaffy
  • 16,801
  • 8
  • 50
  • 77
0
  • You append to a file in the current directory - this seems to be downloadable via browser, so you're creating a security hole. Place the file outside of the document root (best), or protect it via .htaccess.
  • You should sanitize all user input. Always. What this means depends on how you use this data. You seem to write to a text logfile, so you would want to let only printable and whitespace-class chars through. Sanitize defensively: do NOT specify bad charcodes and let everything else through, but define a list/classes of "good" chars and just let these good chars through.
  • Depending on your use case, you may want to flock() the log file, to prevent multiple parallel requests from mixing up in your file:

    $logtext = sanitizeLog($_POST[Message']); $fd = fopen( "/path/to/log.txt", "a"); if(flock($fd, LOCK_EX)) { fseek($fd, 0, SEEK_END); fwrite($fd, $logtext); flock($fd, LOCK_UN); } fclose($fd);

I've omitted checks for fopen() results...

Thomas L.
  • 104
  • 8
0

Regarding PHP's fwrite() function, there's no need to sanitize: fwrite() just writes that to a file that it gets passed along.

Regarding the log-file, you might wish to sanitize. Here is why:

Suppose an attacker post a multiple line value as message. If your log was before the post

line 1
line 2

then it is after the post

line 1 
line 2
line 3
remainder of line 3
very remainder of line 3

because attacker posted this:

line 3\nremainder of line 3\nvery remainder of line 3

Note: One time posted vs. 3 lines added.

That said: How posted data needs to be sanitized, fully depends on your application.

SteAp
  • 11,853
  • 10
  • 53
  • 88