
What I am trying to do is very simple: I have a textbox on my website where I can enter HTML, and when I press the "Send" button, the value in that textbox is sent to the website. But sometimes a user comes to the site, writes HTML and forgets to close a tag, for example, which completely messes up my site. To solve this problem, all I want is a validator that, when the button is clicked, checks whatever was written, detects anything incorrect, and suggests how to correct the HTML. I've seen platforms like https://validator.w3.org/docs/api.html that do exactly what I want, but can I use it on my own site?

I tried to do something on my own with JavaScript, but it's very complex and has a lot of issues.

Can someone please help me?

1 Answer


First things first. I would be severely remiss if I failed to point out that accepting raw HTML from your users is, generally speaking, not a good idea™.

Doing this incorrectly (and it is an extremely difficult task to do correctly) leaves your site, and your users, open to many vulnerabilities. You can view a partial list of them at https://html5sec.org/ (I say partial because they're only listing the "known" attack vectors). There are a lot of good answers to a seemingly unrelated, but definitely semi-related, question and I strongly recommend that you read them all.

"But @Pete!", I hear you cry, "My users are trustworthy. They won't try to click-jack my other users, or do anything else malicious or untoward!"

You may be suffering under the delusion that no one who uses your site will be malicious, or that everyone will even be using a browser to submit HTML to your site (so don't forget server-side validation and sanitization).

Then again, you may not be deluded and your userbase has a vested interest in only submitting safe HTML for your site. Maybe you've already considered, and implemented, bullet-proof client-side and server-side validation and sanitization routines. I don't know your exact circumstances and I won't pretend to (although I do know the probabilities involved here are not in your favor).

With all of the above in mind, if you still insist on allowing a user to write and submit raw HTML to your site, consider:

  • using the documentation found at https://validator.w3.org/docs/api.html to fire off an AJAX request and validate the HTML being submitted;
  • using a plugin/library for a Rich Text Editor that lets the user enter formatted text like they would in a word processor and gives you a resulting HTML string to send to your server;
  • using a plugin/library for a Markdown parser (like the one you use here at SO).
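For the first option, here is a minimal sketch of what such a request could look like. Note that it does not use the legacy API described at the link above; it assumes the W3C's newer Nu HTML Checker at https://validator.w3.org/nu/, which, as far as I know, accepts the document as a `text/html` POST body and returns JSON when `out=json` is passed. The names `wrapFragment`, `summarizeMessages` and `validateFragment` are mine, and the shape of the response (`messages`, `type`, `lastLine`, `message`) is an assumption you should verify against the checker's own documentation. Whether the request is allowed cross-origin from your page is also something to test; if it is not, proxy it through your server.

```javascript
// Assumption: endpoint and query parameter of the W3C Nu HTML Checker.
const CHECKER_URL = 'https://validator.w3.org/nu/?out=json';

// The checker expects a whole document, so wrap the user's fragment in a
// minimal one. The prefix is a single line, which means line N of the
// fragment is line N + 1 of the document that gets checked.
function wrapFragment(fragment) {
  return '<!DOCTYPE html><html lang="en"><head><title>Fragment</title></head><body>\n' +
    fragment + '\n</body></html>';
}

// Turn the (assumed) JSON response into short strings for the user,
// keeping only errors and mapping line numbers back to the fragment.
function summarizeMessages(result) {
  return (result.messages || [])
    .filter(function (m) { return m.type === 'error'; })
    .map(function (m) {
      return typeof m.lastLine === 'number'
        ? 'Line ' + (m.lastLine - 1) + ': ' + m.message
        : m.message;
    });
}

// Send the fragment to the checker and resolve with a list of error strings.
async function validateFragment(fragment) {
  const response = await fetch(CHECKER_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'text/html; charset=utf-8' },
    body: wrapFragment(fragment)
  });
  if (!response.ok) {
    throw new Error('Checker responded with HTTP ' + response.status);
  }
  return summarizeMessages(await response.json());
}
```

You would call `validateFragment(textarea.value)` from the button's click handler and only submit the form when the resulting list is empty. Keep in mind that this tells you whether the markup is valid, not whether it is safe.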

You could also just convert the user's HTML to a DOM element (allowing the browser to parse the HTML into an actual DOM element) and then grab the [parsed] HTML string back:

window.addEventListener('load', function () {
    var textarea = document.getElementById('unsafe-html');
    var button = document.getElementById('get-unsafe-html');

    // Round-trip the text through the browser's HTML parser.
    var getUnsafeHtml = function getUnsafeHtml() {
        var div = document.createElement('div'); // detached, never added to the page
        div.innerHTML = textarea.value; // parses HTML to DOM elements
        return div.innerHTML; // gets it back in a string form.
    };

    button.addEventListener('click', function (e) {
        e.preventDefault(); // don't submit anything in this demo

        var unsafeHtml = getUnsafeHtml();
        console.log(unsafeHtml);
    }, false);
}, false);

<textarea id="unsafe-html" rows="5">
    <p>If you <strong>insist</strong>, <i>then</i> this technique can be used as well.</p>
</textarea>
<button id="get-unsafe-html">Get the HTML</button>

This will not ensure that the markup is the way the user intended it to be, but it will ensure you don't have unmatched tags (as they will be either auto-closed, or removed, depending on the browser).
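If you would rather tell the user what looks wrong than have the browser silently repair it, a rough tag-balance check is one way to start. The sketch below is an illustration only: `findUnbalancedTags` and `VOID_ELEMENTS` are names I made up, it scans tags with a regular expression instead of a real HTML parser, and it is stricter than the HTML standard (which lets you omit some end tags, such as `</p>` and `</li>`). It also knows nothing about comments, `<script>` contents or attribute values containing `>`.

```javascript
// Void elements never take a closing tag, so they are skipped.
const VOID_ELEMENTS = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr'
]);

// Hypothetical helper: returns human-readable problems, empty if balanced.
function findUnbalancedTags(html) {
  const problems = [];
  const openTags = []; // stack of tag names that are currently open
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9-]*)[^>]*>/g;
  let match;

  while ((match = tagPattern.exec(html)) !== null) {
    const isClosing = match[1] === '/';
    const name = match[2].toLowerCase();

    if (VOID_ELEMENTS.has(name)) {
      continue;
    }
    if (!isClosing) {
      openTags.push(name);
      continue;
    }

    const index = openTags.lastIndexOf(name);
    if (index === -1) {
      problems.push('</' + name + '> has no matching opening tag');
      continue;
    }

    // Everything opened after the matching tag was left unclosed.
    const unclosed = openTags.splice(index).slice(1).reverse();
    unclosed.forEach(function (tag) {
      problems.push('<' + tag + '> is never closed');
    });
  }

  // Whatever is still on the stack at the end was never closed either.
  openTags.reverse().forEach(function (tag) {
    problems.push('<' + tag + '> is never closed');
  });

  return problems;
}
```

For example, `findUnbalancedTags('<p><strong>oops</p>')` returns a single entry saying that `<strong>` is never closed, which you could show next to the textarea before the form is sent. Treat it as a convenience for honest users; it does nothing for security.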

pete