
I am making an app in which users can create "posts" and "comments". Once these posts and comments are submitted, they are sent by AJAX to a PHP page and inserted into a database. They are then retrieved from the database and inserted into the page straight away, without approval. I would like a strict regular expression so that no harmful text can be submitted, while still allowing some Unicode characters such as accented vowels. My JavaScript regular expression is as follows:

postRegex = /^([A-Za-z0-9\u00C0-\u017F \/.,-_$!\'&*()="?#+%:;\[\]\r\r\n]{1,1000})$/;

My theory was that if I disallow angle brackets such as < and >, this would stop script tags being inserted. However, when I try to submit text such as iframe embed code, to my surprise the form DOES submit.

<iframe width="100%" height="450" scrolling="no" frameborder="no" src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/217462626&amp;auto_play=false&amp;hide_related=false&amp;show_comments=true&amp;show_user=true&amp;show_reposts=false&amp;visual=true"></iframe>

From my understanding of this regex, I thought it would not allow the < and > brackets to be submitted.

My regex does seem to be working though, as it does not submit the form when characters such as | are in the text. Is there an error in my regex?
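
One way to see which characters a class really admits is to test them one at a time. The sketch below is only an illustration: `withRange` and `withLiteralHyphen` are shortened, hypothetical copies of the character class (not the full `postRegex`), and `probe` is a made-up helper name. It assumes the relevant difference is whether the `-` between `,` and `_` is read as a range operator or as a literal hyphen.

```javascript
// Shortened, illustrative copies of the character class (assumption: only
// the handling of "-" between "," and "_" matters for this check).
// Here `,-_` is parsed as a range from "," (0x2C) to "_" (0x5F).
const withRange = /^[A-Za-z0-9 .,-_]+$/;
// Escaping the hyphen makes it a literal "-" instead of a range operator.
const withLiteralHyphen = /^[A-Za-z0-9 .,\-_]+$/;

// Hypothetical helper: report, per character, whether the regex accepts it.
function probe(regex, chars) {
  const result = {};
  for (const ch of chars) {
    result[ch] = regex.test(ch);
  }
  return result;
}

const samples = ["<", ">", "|", "-", "_"];
console.log("with range:", probe(withRange, samples));
console.log("with literal hyphen:", probe(withLiteralHyphen, samples));
```

With the range version, `<` and `>` are accepted and `|` is rejected; with the escaped hyphen, all three are rejected while `-` and `_` are still allowed.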

Also, can you advise me on whether there is a better way to stop malicious content being inserted?

On the server side I am also sanitizing the content (before inserting it into the database), as can be seen below in submit_post.php.

HTML form:

<form id="post_form" name="post_form" method="post" action="">
    <label for="post_text"></label>
    <textarea id="post_text" name="post_text" maxlength="1000" placeholder="Write your post here..." rows="2" data-role="none" required></textarea>
</form>

JavaScript and jQuery:

$("#post_form").on('submit', function(e){
    // This will execute when a post is submitted.
    e.preventDefault();
    var text_of_post = $('#post_text').val();

    var postIsValid = validateInput(text_of_post, postRegex);
    if(!postIsValid){
        // Content of form is not valid.
        console.log('not valid');
    }else{
        // Content of form is valid: do an ajax request, passing along the user's
        // JSON web token for validation on the server side and the submitted text.
        $.ajax({
            url: app_root_url + 'submit_post.php',
            data: {'usertoken': token, 'text_of_post': text_of_post},
            type: "POST",
            success: function(data){
                var result = JSON.parse(data);
            }
        });
    }
});

function validateInput(inputValue, regularExpression){
        var inputIsValid = regularExpression.test(inputValue);
        return inputIsValid;
}

PHP: submit_post.php

$postText = filter_var($_POST['text_of_post'], FILTER_SANITIZE_STRING);
Sarah
  • There is an unescaped `-` between `,` (ASCII 0x2C) and `_` (ASCII 0x5F), which opens up a range that contains `>` (ASCII 0x3E) and `<` (ASCII 0x3C), but not `|` (ASCII 0x7C). – Sebastian Proske Nov 20 '16 at 11:18
  • @SebastianProske Thanks so much and great job on explaining it. That is working now for the brackets < > +1 – Sarah Nov 20 '16 at 11:26
  • @SebastianProske Also, do you have any opinion on the security of this regex? For example, do any of the characters I'm allowing here pose a possible threat (either to the database or after being inserted back into the HTML)? – Sarah Nov 20 '16 at 11:32
  • Input filters cannot prevent XSS or SQLi. Without further protections (output encoding for XSS, parameterized queries for SQLi), your code is vulnerable to both. – Gabor Lengyel Nov 20 '16 at 12:25
  • @Sarah Don't reinvent the wheel. As this should be a quite common problem, there should exist well tested libraries for your problem. – Sebastian Proske Nov 20 '16 at 12:39
  • @GaborLengyel Thanks. I've taken great care to use parameterized queries with a PDO connection also. I will look more into output encoding thanks – Sarah Nov 20 '16 at 12:45
  • @SebastianProske thanks.. If you can write your first comment above (about the unescaped - ) as an answer I will accept it. thanks – Sarah Nov 20 '16 at 12:47
  • Client-side validation should not have a "security" tag. This kind of validation is done for the user, and it is trivial to bypass. – rook Nov 20 '16 at 16:29
  • @rook thank you very much for your information :) You're very kind to help :) – Sarah Nov 20 '16 at 17:10
  • @rook RULE 3 here: "JavaScript Escape Before Inserting Untrusted Data into JavaScript Data Values" https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet so can you tell me whether it would be more secure if I used encodeURIComponent() before inserting the untrusted user input into a variable? I.e., in my JavaScript code above: var text_of_post = encodeURIComponent($('#post_text').val()); – Sarah Nov 21 '16 at 18:21
  • @Sarah Using a regex like this means that people can't comment with the full range of characters, which causes havoc for emoji like <3. In this case I want an angle bracket (<3), but angle brackets can also be used for XSS ``. The answer is not a regex; it is HTML encoding (https://en.wikipedia.org/wiki/Character_encodings_in_HTML#HTML_character_references), which allows you to display control characters without them being parsed. PHP uses htmlspecialchars() to do this - go ahead, try it for yourself. Try sending `text_of_post=` – rook Nov 21 '16 at 19:31
  • @rook I see what you mean. I have just tried turning off the regex validation, and in my PHP file I tried $postText = htmlspecialchars($_POST['text_of_post']); and now the script tags do not parse and the <3 emoji still appears, which is good. I also tried using this in conjunction with filter_var() like so: $postText = htmlspecialchars(filter_var($_POST['text_of_post'], FILTER_SANITIZE_STRING)); however, the <3 does not appear now. Do you think it is secure enough to just use htmlspecialchars() without applying filter_var() to the input? Also, do I need encodeURIComponent() before sending to PHP? – Sarah Nov 21 '16 at 20:32
  • @Sarah The vast majority of XSS is solved using `htmlspecialchars()`. Look up the docs - it will work in most cases, and it is the best tool for the case of comments. XSS is a slippery problem and there are always edge cases - this is not one of them. Remove all sanitization and make sure you see the alert box - you should be able to test this condition (you might need to use Firefox). – rook Nov 21 '16 at 21:40
  • @rook OK, thanks. I will use that for this section of my app. The thing is, with the app I'm working on, there is another main section where I'm retrieving a large amount of data from a WordPress database and inserting it into the HTML. I wouldn't be able to use htmlspecialchars() there, as the WordPress post content contains some HTML elements and also scripts. I would like to have an in-depth conversation with an expert on this topic, as it's probably too much to write here. Do you know anywhere to get in-depth advice on this? I've read articles and still find I'm unsure of exactly what is needed in my case. – Sarah Nov 21 '16 at 21:52
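
The output-encoding approach discussed in the comments can also be sketched on the JavaScript side, for the point where retrieved text is inserted into the page. This is a minimal illustration, not the thread's own code: `escapeHtml` and `HTML_ESCAPES` are hypothetical names, and the set of replaced characters is an assumption chosen to roughly mirror what PHP's `htmlspecialchars()` does when quotes are included.

```javascript
// Hypothetical lookup table: the characters that are significant in HTML
// text and attribute values, mapped to their character references.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#039;",
};

// Hypothetical helper: encode untrusted text so a browser displays it
// as text instead of parsing it as markup.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

const comment = "I <3 this <script>alert(1)</script>";
console.log(escapeHtml(comment));
// prints: I &lt;3 this &lt;script&gt;alert(1)&lt;/script&gt;
```

The `<3` survives as visible text while the script tag is no longer parsed. When inserting with jQuery, setting the content through `.text()` rather than `.html()` has a similar effect, since it treats the string as text rather than markup.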
