3

I have a simple app written in PHP using the CodeIgniter 4 framework and, as a web application, it has some HTML forms for user input.

I am doing two things:

  1. In my Views, all variables from the database that come from user input are sanitized using CodeIgniter 4's esc() function.

  2. In my Controllers, when reading HTTP POST data, I am using PHP filters:

    $data = trim($this->request->getPost('field', FILTER_SANITIZE_SPECIAL_CHARS));

I am not sure if sanitizing both when reading data from POST and when printing/displaying to HTML is a good practice or if it should only be sanitized once.

In addition, FILTER_SANITIZE_SPECIAL_CHARS is not working as I need. I want my HTML form text input to prevent users from attacking with HTML but I want to keep some 'line breaks' my database has from the previous application.

FILTER_SANITIZE_SPECIAL_CHARS will NOT delete HTML tags; it stores them in the database in encoded form rather than as HTML, but it also changes my 'line breaks'. Is there a filter that doesn't remove HTML tags (and only stores them with proper encoding) but that preserves \n 'line breaks'?

steven7mwesigwa
user1314836
  • `esc();` is enough on inserting or displaying data if you are not allowing html tags. – DLK Jan 29 '23 at 20:17
  • Thank you @DLK. In that case, what are filters such as FILTER_SANITIZE_SPECIAL_CHARS for? – user1314836 Jan 30 '23 at 18:16
  • Also, I have doubts about whether I should `esc()` all my variables. In theory, only strings from previous user input should be sanitised, but it is sometimes difficult to remember whether a particular value is a number or a string, and whether the value from the database was previously stored or inserted by the user. Is using `esc()` for ALL variables in views a good practice or does it just mess up the code for nothing? – user1314836 Jan 30 '23 at 18:20
  • esc will trim all HTML tags from your content; you can use it if you don't want to allow any HTML tags. It has enough functions to keep you safe. – DLK Jan 31 '23 at 18:32
  • Thank you @DLK but your comment didn't answer to my additional interest in the topic. What is FILTER_SANITIZE_SPECIAL_CHARS for, and is it a good practice to trim all printed variables regardless of whether they are strings or some other type of variable, or user-input or not? – user1314836 Feb 01 '23 at 16:31
  • Like you've been told in the comments already, you shouldn't use that flag at all. So the problem with line feeds is solved. – Your Common Sense Feb 01 '23 at 19:38
  • @YourCommonSense That's true, not using that flag resolves the problem with line breaks. But I don't really understand what those filters exist for or when they should be applied. – user1314836 Feb 02 '23 at 20:03
  • That's a completely different question, and asking more than one question makes your post off topic. – Your Common Sense Feb 03 '23 at 05:58
  • I think this answers it, but I can't mark it as duplicate because of the bounty. [How does Codeigniter handle escaping output?](https://stackoverflow.com/questions/8722099/how-does-codeigniter-handle-escaping-output) – Rohit Gupta Feb 04 '23 at 00:21
  • [How can I sanitize user input with PHP?](https://stackoverflow.com/questions/129677/how-can-i-sanitize-user-input-with-php?rq=1) – steven7mwesigwa Feb 04 '23 at 00:56
  • @RohitGupta that question is for CodeIgniter 2 I believe and this is CodeIgniter 4. – user1314836 Feb 05 '23 at 01:29
  • @steven7mwesigwa that's very interesting, thanks! – user1314836 Feb 05 '23 at 01:29
  • I definitely think that I don't need to sanitize user-input data because I am not writing sql directly but rather using CodeIgniter 4's functions to create sql safe queries. On the other hand, I do definitely need to `esc()` that same information when showing to avoid showing html where just text is expected. – user1314836 Feb 05 '23 at 01:31
  • Not an answer, but both sides have to be "secured", for different purposes. User input has to be "secured" in order to prevent SQL injection or similar attacks (to make sure you are storing input as given by the user), and on the other end the content to render has to be secured as well so that it can be safely rendered in a browser (no malicious scripts, ...) – Tuckbros Feb 08 '23 at 15:13

5 Answers

3

You don't need to sanitize User input data as explained in the question below:

How can I sanitize user input with PHP?

It's a common misconception that user input can be filtered. PHP even has a (now deprecated) "feature", called magic-quotes, that builds on this idea. It's nonsense. Forget about filtering (or cleaning, or whatever people call it).

In addition, you don't need to use FILTER_SANITIZE_SPECIAL_CHARS, htmlspecialchars(...), htmlentities(...), or esc(...) either for most use cases:

-Comment from OP (user1314836)

I definitely think that I don't need to sanitize user-input data because I am not writing SQL directly but rather using CodeIgniter 4's functions to create SQL safe queries. On the other hand, I do definitely need to esc() that same information when showing to avoid showing html where just text is expected.

The reason why you don't need the esc() method for most use cases is:

Most User form input in an application doesn't expect a User to submit/post HTML, CSS, or JavaScript that you plan on displaying/running later on.

If the expected User input is just plain text (username, age, birth date, etc), images, or files, use form validation instead to disallow unexpected data.

See: Available Rules and Creating Custom Rules

By using the Query Builder for your database queries and rejecting unexpected User input data using validation rules (alpha, alpha_numeric_punct, numeric, exact_length, min_length[8], valid_date, regex_match[/regex/], uploaded, etc), you can avoid most potential security holes, such as SQL injections and XSS attacks.
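As a rough illustration of "reject instead of sanitize", here is a minimal plain-PHP sketch. It does not use CodeIgniter's validation service; the function name `validateSignup` and the field names `username` and `age` are hypothetical, and the two checks only approximate what rules such as alpha_numeric and numeric would enforce.

```php
<?php
// Hypothetical helper, a plain-PHP stand-in for framework validation rules.
// Returns an array of error messages; an empty array means the input is accepted.
function validateSignup(array $input): array
{
    $errors = [];

    // Roughly an "alphanumeric, 3 to 20 characters" rule.
    $username = $input['username'] ?? '';
    if (!is_string($username) || preg_match('/^[A-Za-z0-9]{3,20}$/', $username) !== 1) {
        $errors['username'] = 'Username must be 3-20 letters or digits.';
    }

    // Roughly a "whole number within a range" rule.
    $age = filter_var($input['age'] ?? null, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 0, 'max_range' => 150],
    ]);
    if ($age === false) {
        $errors['age'] = 'Age must be a whole number between 0 and 150.';
    }

    return $errors;
}

// Nothing is altered: the input is either accepted as typed or rejected.
$errors = validateSignup(['username' => '<script>', 'age' => 'hello']);
print_r($errors);
```

When the array is non-empty, the form would be redisplayed with the messages and the original values, rather than silently modifying what the user typed.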

steven7mwesigwa
1

Answer from steven7mwesigwa gets my vote, but here is how you should be thinking about it.

Rules Summary

  • You should always hold in memory the actual data that you want to process.
  • You should always convert the data on output into a format that the output can process.

Inputs:

You should strip from all untrusted inputs (user forms, databases that you didn't write to, XML feeds that you don't control etc):

  • any data that you are unable to process (e.g. if you are not able to handle multi-byte strings as you are not using the right functions, or your DB won't support it, or you can't handle UTF8/16 etc, strip those extra characters you can't handle).
  • any data that will never form part of the process or output (e.g. if you can only have an integer/bool then convert to int/bool; if you are only showing data on an HTML page, then you may as well trim spaces; if you want a date, strip anything that can't be formatted as a date [or reject*]).

This means that many "traditional" cleaning functions are not needed (e.g. Magic Quotes, strip_tags and so on): but you need to know you can handle the code. You should only strip_tags or escape or so on if you know it is pointless having that data in that field.

Note: For user input I prefer to hold the data as the user entered it and reject the form, allowing them to try again. E.g. if I'm expecting a number and I get "hello" then I'll reload the form with "hello" and tell the user to try again. steven7mwesigwa has links to the validation functions in CI that make that happen.

Outputs:

Choose the correct conversion for the output: and don't get them muddled up.

  • htmlspecialchars (or family) for outputting to HTML or XML; although this is usually handled by any templating engine you use.
  • Escaping for DB input; although this should be left to the DB engine you use (e.g. parameterised queries, query builder etc).
  • urlencode for outputting a URL
  • as required for saving images, json, API responses etc
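A minimal sketch of the list above, using only built-in PHP functions: one raw value, converted differently for each output. The sample string and the example.com URL are made up for illustration; database output is left out because, as noted, it belongs to the query builder or parameterised queries.

```php
<?php
// Raw value, held exactly as the user typed it.
$comment = 'Tom & "Jerry" <b>rock</b>';

// HTML (element content or attribute value).
$forHtml = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

// URL query-string component.
$forUrl = 'https://example.com/search?q=' . urlencode($comment);

// JSON, for example an API response body.
$forJson = json_encode(['comment' => $comment]);

echo $forHtml, PHP_EOL;
echo $forUrl, PHP_EOL;
echo $forJson, PHP_EOL;
```

The raw `$comment` never changes; each conversion is applied at the point where the value leaves the application.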

Why?

If you do your output conversion on input, then you can easily double-convert an input, or lose track of whether you need to make it safe before output, or lose data the user wanted to enter. Mistakes happen, but following clean rules will prevent them.
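The double-conversion problem is easy to reproduce. A small sketch with a made-up value, using htmlspecialchars for both steps:

```php
<?php
$typed = 'Fish & Chips';

// Mistake: HTML-escape on input and store the escaped form.
$stored = htmlspecialchars($typed, ENT_QUOTES, 'UTF-8');

// The view then escapes again on output, as it should for any stored value.
$rendered = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
echo $rendered, PHP_EOL; // Fish &amp;amp; Chips (a browser shows "Fish &amp; Chips")

// Storing the raw value and escaping once, on output, gives the intended result.
$correct = htmlspecialchars($typed, ENT_QUOTES, 'UTF-8');
echo $correct, PHP_EOL; // Fish &amp; Chips (a browser shows "Fish & Chips")
```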

This also means there is no need to reject special characters (those forms that reject quote marks are a horrible user experience, for example, and anyone putting restrictions on what characters can go in a password field is only weakening security).


In your particular case:

  • Drop the FILTER_SANITIZE_SPECIAL_CHARS on input, hold the data as the user gave it to you
  • Output using the template engine as you have it: this will display < > tags as the user entered them, but won't break your output.

You will essentially sanitize each and every output (which you appear to want to avoid), but that's safer than accidentally missing a sanitize on output and a better user experience than losing stuff they typed.
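A sketch of what this means for the line breaks in the question, in plain PHP. Here htmlspecialchars stands in for the escaping done in the view, and wrapping it in nl2br is an assumption about how the line breaks should be displayed, not something the question specifies.

```php
<?php
$raw = "First line\n<b>Second</b> line";

// The filter from the question encodes characters below ASCII 32 too,
// so the newline is turned into an entity and no longer survives as "\n".
$filtered = filter_var($raw, FILTER_SANITIZE_SPECIAL_CHARS);
var_dump(strpos($filtered, "\n") !== false); // bool(false)

// Alternative: store $raw unchanged and convert only when rendering.
// Escape first, then turn the preserved newlines into <br /> tags.
$html = nl2br(htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'));
echo $html, PHP_EOL;
```

The order matters: calling nl2br before escaping would cause the inserted `<br />` tags to be escaped along with everything else.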

Robbie
0

From my understanding,

FILTER_SANITIZE_SPECIAL_CHARS is used to sanitize the user input before you act on it or store it.

Whereas esc is used to escape HTML etc in the string so they don't interfere with normal html, css etc. It is used for viewing the data.

So, you need both, one for input and the other for output.


Following from codeigniter.com. Note, it uses the Laminas Escaper library.

esc($data[, $context = 'html'[, $encoding]])

Parameters

$data (string|array) – The information to be escaped.

$context (string) – The escaping context. Default is ‘html’.

$encoding (string) – The character encoding of the string.

Returns The escaped data.

Return type mixed

Escapes data for inclusion in web pages, to help prevent XSS attacks. This uses the Laminas Escaper library to handle the actual filtering of the data.

If $data is a string, then it simply escapes and returns it. If $data is an array, then it loops over it, escaping each ‘value’ of the key/value pairs.

Valid context values: html, js, css, url, attr, raw


From docs.laminas.dev

What laminas-Escaper is not

laminas-escaper is meant to be used only for escaping data for output, and as such should not be misused for filtering input data. For such tasks, laminas-filter, HTMLPurifier or PHP's Filter functionality should be used.


Some of the things they do are similar. For example, both may/will convert < to &lt;. However, your stored data may not have come just from user input and it may have < in it. It is perfectly safe to store it this way, but it needs to be escaped for output, otherwise the browser could get confused, thinking it's HTML.

Rohit Gupta
0

Note that if you are using the CodeIgniter 4 Form helper: "If you use any of the form helper functions listed on this page, and you pass values as an associative array, the form values will be automatically escaped, so there is no need to call this function. Use it only if you are creating your own form elements, which you would pass as strings."

-1

I think for this situation using esc is sufficient. FILTER_SANITIZE_SPECIAL_CHARS is a PHP sanitize filter that encodes '"<>& and optionally strips or encodes other special characters according to the flag. To do that you need to set the flag, which is the third parameter of the getPost() method. Here is an example:

$this->request->getPost('field', FILTER_SANITIZE_SPECIAL_CHARS, FILTER_FLAG_ENCODE_HIGH)

This flag can be changed according to your requirements. You can use any PHP filter with a flag. Please refer to the PHP documentation for more info.

machan
  • Your comment is self-contradicting. First it says using esc is sufficient but then it suggests to use FILTER_SANITIZE_SPECIAL_CHARS anyway. – Your Common Sense Feb 01 '23 at 19:34
  • @YourCommonSense He is asking what FILTER_SANITIZE_SPECIAL_CHARS is for as well. Check his comment thread. So that's why I explained it. – machan Feb 01 '23 at 19:41
  • This is a Q&A site where only one question per post is allowed. So only one answer as well – Your Common Sense Feb 01 '23 at 19:42
  • @machan Could you better explain why you believe using `esc` is sufficient, and to fully understand it, present when the filters are useful? – user1314836 Feb 02 '23 at 19:46
  • @user1314836 when you use esc() it will escape all HTML and help prevent XSS attacks. It uses Laminas Escaper. With filters and flags you can specify which characters should be escaped. That is useful when you need to escape only specific characters. – machan Feb 03 '23 at 13:38