
I have looked everywhere, but cannot find a simple library or tool for this.

I would like to sanitise comments on my website.

Currently, I can inject HTML, CSS and pretty much whatever I want through comments.

<div id="commentsSection">
    <div class="submitCommentForm">
        <textarea id="commentsInput" required minlength="10" maxlength="150">
        </textarea>
        <div id="submitComment">SUBMIT</div>
    </div>
    <div id="commentsBox"></div>
</div>

What is the best available method?

Penny Liu
TheProgrammer
  • Use regular expressions to remove tags and disallowed characters –  Jun 11 '17 at 19:57
  • You mean sanitise comments, like avoiding dirty words? – Code4R7 Jun 11 '17 at 19:58
  • https://stackoverflow.com/a/6234804/6024903 – AHB Jun 11 '17 at 19:59
  • Just a piece of advice: sanitization should be done on both the front end and the back end. Front-end sanitization improves the user experience; back-end sanitization is for security. – Leandro Papasidero Jun 11 '17 at 20:01
  • @AHB Great thx! But I also want to sanitize js code that could be written in the input – TheProgrammer Jun 11 '17 at 20:05
  • @LeaTano agreed. I have seen some packages for the backend which I should normally have no trouble implementing (I use Node), but I am a bit lost as to the front end – TheProgrammer Jun 11 '17 at 20:06
  • @TheProgrammer A library is always easy, but using (or at least learning) regular expressions is useful –  Jun 11 '17 at 20:09
  • @natanel97 I use regular expressions from time to time, but for this case, I want something that has stood the test of time. – TheProgrammer Jun 11 '17 at 20:19
  • @TheProgrammer - I'm here to moderate, and your question has been asked a thousand times since the dawn of the internet. There is no need to skip Google and ask again. Regarding the "answers", they are not appropriate answers, since a link alone is not an answer and should be written as a comment. There is no excuse for asking something you can get 1000 answers to on Google. – vsync Jun 11 '17 at 20:48
  • @vsync Well, I made about 10 searches and only came across solutions that did not fit my needs. That's why I am here. Do you think I would really have taken the time to write a question on SO without searching first? – TheProgrammer Jun 11 '17 at 20:50
  • @vsync What I want is to remove all tags and only keep plain text. Could you at least please tell me how to achieve that or point me in the right direction? – TheProgrammer Jun 11 '17 at 20:54
  • @TheProgrammer tried [google](https://www.google.co.il/search?q=github+html+sanitiser&oq=github+html+sani&aqs=chrome.1.69i57j35i39j0j69i64.4910j0j7&sourceid=chrome&ie=UTF-8)? – vsync Jun 11 '17 at 21:17
  • If you can insert HTML through comments on your web site, it means that you have yet to learn some fundamental concepts of HTML and that you have bugs in your code. You should post the relevant parts of your code that processes comments, together with some sample input and output. You rarely need to sanitize input, unless you want to filter out spam, rude content or similar. – Nisse Engström Jun 11 '17 at 21:28

2 Answers


Because JavaScript can be disabled in the browser, sanitization cannot be left to the frontend; it must be performed on the backend. Best practice says...

  • Validate input (frontend)
    • Ensure that the data conforms to what you expect before submission
  • Sanitize input (backend)
    • Escape or remove unsafe characters on the backend before the data reaches your application's storage layer
  • Escape output (backend)
    • As an additional safety measure, before outputting, be sure to escape anything coming from a 3rd party source

You are encouraged to validate input on the frontend, notifying the user that certain characters are not permitted when they try to submit invalid data. If JavaScript is then disabled, your backend will still know what to do with the malformed data.
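
As a rough illustration of the validate and escape steps, here is a minimal sketch in plain JavaScript that could run under Node on the backend. The function names `validateComment` and `escapeHtml` are hypothetical, not part of any library, and the length limits are simply assumed to mirror the `minlength` and `maxlength` attributes on the textarea in the question.

```javascript
// Hypothetical helpers for illustration; these names are not from any library.
const MIN_LENGTH = 10;  // assumed to mirror minlength on the question's textarea
const MAX_LENGTH = 150; // assumed to mirror maxlength on the question's textarea

// Validate: re-check on the server what the form attributes ask for,
// because the browser-side checks can be bypassed.
function validateComment(text) {
  if (typeof text !== 'string') return false;
  const length = text.trim().length;
  return length >= MIN_LENGTH && length <= MAX_LENGTH;
}

// Characters that are significant in HTML and their entity replacements.
const HTML_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Escape: turn markup characters into entities just before output,
// so the browser renders the comment as text rather than as HTML.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

console.log(validateComment('Too short'));      // false (fewer than 10 characters)
console.log(escapeHtml('<b>Hello</b> & bye'));  // &lt;b&gt;Hello&lt;/b&gt; &amp; bye
```

When the comment is inserted by browser-side script instead, assigning it to an element's `textContent` rather than `innerHTML` has a similar effect, since the browser then treats the string as plain text.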

Kenneth Stoddard
  • This answer was pointed to from elsewhere, and I'm offering some thoughts here to push back on it a bit. Broadly, the worry here is that a miscreant user will inject HTML or JavaScript for malicious purposes. The guidance here is also only for where the application is _not_ accepting HTML (if it is then that is a different problem). With that in mind, let's say I am asked for a username in a registration page. If I want to be called – halfer Jul 27 '19 at 23:03
  • So, this can be stored in the database with absolutely no ill effect. Indeed, these very comments are a good illustration of doing it correctly - I can inject all manner of opening tags, but they are escaped into HTML entities in the output layer. That makes me wonder then if the last item is not ideal advice - escaping is essential in the output layer, and it is not a backstop. – halfer Jul 27 '19 at 23:04
  • Data should also be sanitized prior to reaching your application’s storage layer because of vulnerabilities like SQL Injection. It is not enough to ONLY escape on output. See https://www.w3schools.com/sql/sql_injection.asp. – Kenneth Stoddard Jul 28 '19 at 00:23
  • Ah, I see where we might be talking at cross-purposes, Kenneth - thanks for the clarification. SQL injection should be handled using parameter binding, not sanitisation - sanitisation is the removal of characters, and in general we don't want that. For example, the surname "O'Reilly" will confuse the database parser if injected into SQL - but we do not want to remove (sanitise) the apostrophe. – halfer Jul 28 '19 at 08:36
  • However, some very paranoid use-cases (e.g. banks) _will_ sanitise on input, even if it is "wrong" to do so. For example, a bank might tell me I can't have angle brackets in my username, and will strip them out. However, as Stack Overflow shows, you can have angle brackets in usernames very safely, because it's the output layer that causes the security issue. – halfer Jul 28 '19 at 08:39
  • For interest, here is [the other question](https://stackoverflow.com/q/57233660). – halfer Jul 28 '19 at 08:39

If you want a more thorough check, I've used the npm package sanitize-html before.

etoxin
AHB