65

Assume I have a page with an input box. The user types something into the input box and hits a button. The button triggers a function that picks up the value typed into the text box and outputs it onto the page beneath the text box for whatever reason.

This has been disturbingly difficult to find a definitive answer on, or I wouldn't be asking: how would you go about outputting this string:

<script>alert("hello")</script> <h1> Hello World </h1>

So that neither the script is executed nor the HTML element is displayed?

What I'm really asking is whether there is a standard method of avoiding both HTML and script injection in JavaScript. Everyone seems to have a different way of doing it. (I'm using jQuery, so I know I can simply output the string to the text element rather than the html element, for instance, but that's not the point.)

lorless
  • 1
    possible duplicate of [How to prevent Javascript injection attacks within user-generated HTML](http://stackoverflow.com/questions/942011/how-to-prevent-javascript-injection-attacks-within-user-generated-html) – JJJ Dec 31 '13 at 10:04
  • 3
    Do you want to block _all_ HTML injection, or just _unsafe_ ones? – Barmar Dec 31 '13 at 10:05
  • 1
    Also, if the use case is really what you say and this is client-side JavaScript only, you really don't need to prevent "injection". The user can only attack himself if the input isn't shown to anyone else (and if it's shown to other users you'd sanitize the input server-side). – JJJ Dec 31 '13 at 10:07
  • 1
    All, this is more about an explanation of the concept and the methods to prevent this kind of thing from happening. And what constitutes an unsafe HTML injection as opposed to a safe one? – lorless Dec 31 '13 at 10:09
  • `<h1>Hello World</h1>` is a safe injection because it doesn't present a security risk to the user. If you want to prevent HTML/JS injection, you either remove or encode HTML tags. It's as simple as that.
    – JJJ Dec 31 '13 at 10:12
  • @Juhana okay, but say that this is going to be shown to other people. Is there no built-in way to sanitise the user's input and return it to the page in JavaScript? Again, this is more theoretical than anything else. It could be that I am simply missing the accepted practices involved here. – lorless Dec 31 '13 at 10:12
  • jQuery's `.text()` is a common practice, assuming the data is coming from an Ajax call or something (if it's embedded in the HTML document it's already too late). – JJJ Dec 31 '13 at 10:14

8 Answers

93

You can encode the < and > to their HTML equivalent.

html = html.replace(/</g, "&lt;").replace(/>/g, "&gt;");

How to display HTML tags as plain text
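The replacement above handles only `<` and `>`. If the escaped string might also end up inside an attribute value, or already contains `&`, a slightly wider escape is commonly used. The following is a minimal sketch of that idea, not part of the original answer; `escapeHtml` and `HTML_ESCAPES` are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper (not from the answer above): escapes the five characters
// that are significant in HTML text and in quoted attribute values.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(input) {
  // One pass over the string, so the "&" produced by a replacement
  // is never escaped a second time.
  return String(input).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// The string from the question.
const userInput = '<script>alert("hello")</script> <h1> Hello World </h1>';
console.log(escapeHtml(userInput));
```

Doing all replacements in a single `replace` call avoids the ordering problem that chained calls have, where `&` must be escaped first.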

TastySpaceApple
  • Ace! Exactly what I was looking for, apologies if it was unclear. This is a regular expression then? I haven't particularly used them apart from slight modifications to others' code. – lorless Dec 31 '13 at 10:32
  • 1
    It's a very simple regular expression: the expression is basically `<`. In JavaScript a regular expression is defined inside slashes (`//`), and the `g` means it should search for all occurrences (g = global). – TastySpaceApple Dec 31 '13 at 10:41
  • 3
    @dlampard Note that this is exactly what `.text()` does, so if you have jQuery there's no need for a custom regex. – JJJ Dec 31 '13 at 11:00
  • 1
    Okay, I'm trying to avoid using jQuery because it obscures what is actually going on while I'm trying to understand the concepts and basics. I do love it though. – lorless Dec 31 '13 at 11:09
  • 2
    Any idea on what to do if you'd like to allow only some tags, such as bold or italics? – sab669 Sep 10 '15 at 13:30
  • @sab669 did u find the answer on how to allow only some tags? – Faizan May 13 '16 at 19:48
  • 1
    This will only prevent – simon Feb 19 '19 at 22:30
  • The only safe option for external html I have found so far, if you are using templates, the framework you are using should have some integrated safe function to do this, go look for that. – Dan Apr 23 '19 at 01:18
12
myDiv.textContent = arbitraryHtmlString 

As @Dan pointed out, do not use innerHTML, even on nodes you don't append to the document, because deferred callbacks such as inline event handlers in the parsed markup (for example an `onerror` attribute on an image) can still be executed. You can check this https://gomakethings.com/preventing-cross-site-scripting-attacks-when-using-innerhtml-in-vanilla-javascript/ for more info.
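As a rough sketch of the `textContent` approach: the helper name `showUserText` is hypothetical, and a plain object stands in for the DOM element so the snippet also runs outside a browser. In a real page you would pass something like the result of `document.getElementById(...)` instead.

```javascript
// Hypothetical helper: `element` is a DOM node (or, for this sketch, any object
// with a textContent property); `text` is untrusted user input.
function showUserText(element, text) {
  // On a DOM element, assigning textContent replaces the children with a single
  // text node; the string is never parsed as markup.
  element.textContent = String(text);
  return element;
}

// Stand-in for a real element, an assumption made so the sketch runs anywhere.
const output = { textContent: "" };
showUserText(output, '<img src="x" onerror="alert(1)">');
console.log(output.textContent);
```

In a browser the string would be shown literally, angle brackets and all, and no image request or handler would be triggered.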

BiAiB
  • 1
    note that the ask didn't include a `jquery` tag, so you should try and answer in plain javascript. – TastySpaceApple Dec 31 '13 at 10:17
  • I'll elaborate and provide a jquery-less answer if the OP asks for it. The principle would be the same here. – BiAiB Dec 31 '13 at 10:18
  • I'm not entirely sure why but if the value of html is 0 this seems to return an empty string. – Rikaelus Jan 29 '17 at 05:30
  • You must pass it a string, so `"0"` would work. you can check the source https://github.com/jquery/jquery/blob/2.2-stable/src/core/parseHTML.js and see it discards non string inputs – BiAiB Jan 30 '17 at 14:07
  • Question: Are you sure that `temp.innerHTML = arbitraryHtmlString;` won't start pre-loading images and running any `onload` handlers defined in image tags and the like? –  Nov 28 '17 at 12:16
  • Yes, like, 93% sure. – BiAiB Nov 28 '17 at 17:44
  • Both of these solutions are unsafe, can be tested easily with html like: '' – Dan Apr 23 '19 at 01:24
  • @Dan damn, you're right, i'll change the solution using textContent – BiAiB Apr 23 '19 at 07:55
3

A one-liner:

var encodedMsg = $('<div />').text(message).html();

See it work:

https://jsfiddle.net/TimothyKanski/wnt8o12j/

Timothy Kanski
1

From here

var string = "<script>...</script>";
string = encodeURIComponent(string); // %3Cscript%3E...%3C%2Fscript%3E
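Note that `encodeURIComponent` produces URL percent-encoding rather than HTML entities, so the encoded text would appear on the page as `%3C...` unless it is decoded again. It fits best when the value goes into a URL. A small sketch of that behaviour follows; the `/search?q=` path is a made-up example.

```javascript
const raw = "<script>...</script>";

// Percent-encode every character that is not allowed unescaped in a URL component.
const urlEncoded = encodeURIComponent(raw);
console.log(urlEncoded); // %3Cscript%3E...%3C%2Fscript%3E

// The encoding is reversible, which is what a URL consumer relies on.
const roundTrip = decodeURIComponent(urlEncoded);
console.log(roundTrip === raw); // true

// Typical use: putting user input into a query string (hypothetical path).
const link = "/search?q=" + urlEncoded;
console.log(link);
```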
hestellezg
0

I use the PHP function htmlentities($string):

$msg = '<script>alert("hello")</script> <h1> Hello World </h1>';
$msg = htmlentities($msg);
echo $msg;
0

My solution using TypeScript + decorators + regex

const removeTag = new RegExp("(<[a-zA-Z0-9]+>)|(</[a-zA-Z0-9]+>)", "g");
return value.replace(removeTag, "");

"use strict";
var __decorate = (this && this.__decorate) || function (decorators, target, key, desc) {
    var c = arguments.length, r = c < 3 ? target : desc === null ? desc = Object.getOwnPropertyDescriptor(target, key) : desc, d;
    if (typeof Reflect === "object" && typeof Reflect.decorate === "function") r = Reflect.decorate(decorators, target, key, desc);
    else for (var i = decorators.length - 1; i >= 0; i--) if (d = decorators[i]) r = (c < 3 ? d(r) : c > 3 ? d(target, key, r) : d(target, key)) || r;
    return c > 3 && r && Object.defineProperty(target, key, r), r;
};
var __metadata = (this && this.__metadata) || function (k, v) {
    if (typeof Reflect === "object" && typeof Reflect.metadata === "function") return Reflect.metadata(k, v);
};
function filter(target) {
    return class extends target {
        constructor(...args) {
            super(...args);
        }
        setState(opts) {
            const state = {
                username: this.filter(opts.username),
                password: this.filter(opts.password),
            };
            super.setState(state);
        }
        filter(value) {
            const removeTag = new RegExp("(<[a-zA-Z0-9]+>)|(</[a-zA-Z0-9]+>)", "g");
            return value.replace(removeTag, "");
        }
    };
}
let Form = class Form {
    constructor() {
        this.state = {
            username: "",
            password: "",
        };
    }
    setState(opts) {
        this.state = {
            ...this.state,
            ...opts,
        };
    }
    getState() {
        return this.state;
    }
};
Form = __decorate([
    filter,
    __metadata("design:paramtypes", [])
], Form);
function getElement(key) {
    return document.getElementById(key);
}
const button = getElement("btn");
const username = getElement("username");
const password = getElement("password");
const usernameOutput = getElement("username-output");
const passwordOutput = getElement("password-output");
function handleClick() {
    const form = new Form();
    form.setState({ username: username.value, password: password.value });
    usernameOutput.innerHTML = `Username: ${form.getState().username}`;
    passwordOutput.innerHTML = `Password: ${form.getState().password}`;
}
button.onclick = handleClick;
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <style>
      :root {
        --bg: #1d1907;
        --foreground: #e3e0cd;
        --primary: #cfb53b;
        --black: #333;
        --white: #fafafa;
      }

      @keyframes borderColor {
        from {
          border-bottom: 1px solid var(--foreground);
        }

        to {
          border-bottom: 1px solid var(--primary);
        }
      }

      * {
        outline: none;
        border: none;
      }

      body {
        padding: 0.5rem;
        font-family: "Fira Code";
        background-color: var(--bg);
        color: var(--foreground);
      }

      input {
        border-bottom: 1px solid var(--foreground);
        background-color: var(--black);
        color: var(--foreground);
        padding: 0.5rem;
      }

      input:focus {
        animation-name: borderColor;
        animation-duration: 3s;
        animation-fill-mode: forwards;
      }

      button {
        padding: 0.5rem;
        border-radius: 3px;
        border: 1px solid var(--primary);
        background-color: var(--primary);
        color: var(--white);
      }

      button:hover,
      button:active {
        background-color: var(--white);
        color: var(--primary);
      }

      .form {
        margin-bottom: 2rem;
      }
    </style>
    <title>Decorator</title>
  </head>
  <body>
    <h1>Prevent Injection</h1>
    <div class="form">
      <div class="form-group">
        <label for="username">Username</label>
        <input type="text" id="username" placeholder="Type your username" />
      </div>
      <div class="form-group">
        <label for="password">Password</label>
        <input type="password" id="password" placeholder="Type your password" />
      </div>
      <div class="form-group">
        <button id="btn">Enviar</button>
      </div>
    </div>
    <div class="form-result">
      <p id="username-output">Username:</p>
      <p id="password-output">Password:</p>
    </div>
    <script src="/dist/pratica1.js"></script>
  </body>
</html>

TypeScript code below:

type State = {
  username: string;
  password: string;
};

function filter<T extends new (...args: any[]) => any>(target: T): T {
  return class extends target {
    constructor(...args: any[]) {
      super(...args);
    }

    setState(opts: State) {
      const state = {
        username: this.filter(opts.username),
        password: this.filter(opts.password),
      };
      super.setState(state);
    }

    filter(value: string) {
      const removeTag = new RegExp("(<[a-zA-Z0-9]+>)|(</[a-zA-Z0-9]+>)", "g");
      return value.replace(removeTag, "");
    }
  };
}

@filter
class Form {
  private state: State;

  constructor() {
    this.state = {
      username: "",
      password: "",
    };
  }

  setState(opts: State) {
    this.state = {
      ...this.state,
      ...opts,
    };
  }

  getState() {
    return this.state;
  }
}

function getElement(key: string): HTMLElement | null {
  return document.getElementById(key);
}

const button = getElement("btn") as HTMLButtonElement;
const username = getElement("username") as HTMLInputElement;
const password = getElement("password") as HTMLInputElement;
const usernameOutput = getElement("username-output") as HTMLParagraphElement;
const passwordOutput = getElement("password-output") as HTMLParagraphElement;

function handleClick() {
  const form = new Form();
  form.setState({ username: username.value, password: password.value });
  usernameOutput.innerHTML = `Username: ${form.getState().username}`;
  passwordOutput.innerHTML = `Password: ${form.getState().password}`;
}

button.onclick = handleClick;
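One caveat worth checking before relying on this filter: the pattern only matches tags made of letters and digits with no attributes, and it runs once. The sketch below uses the same regex with a few test strings of my own (not from the answer) to show what gets through.

```javascript
// Same pattern as in the answer above.
const removeTag = new RegExp("(<[a-zA-Z0-9]+>)|(</[a-zA-Z0-9]+>)", "g");
const stripTags = (value) => value.replace(removeTag, "");

// Simple tags are removed.
const plain = stripTags("<h1>Hello World</h1>");
console.log(plain); // Hello World

// A tag with attributes does not match the pattern and is left untouched.
const withAttribute = stripTags('<img src="x" onerror="alert(1)">');
console.log(withAttribute); // <img src="x" onerror="alert(1)">

// Removing the inner tags in a single pass reassembles an outer tag.
const nested = stripTags("<scr<b>ipt>alert(1)</scr</b>ipt>");
console.log(nested); // <script>alert(1)</script>
```

Because the filtered value is then assigned through `innerHTML`, the second case would still be parsed as markup; assigning it through `textContent` instead would sidestep the problem.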
Max
-1

Try this method to convert a 'string that could potentially contain html code' to 'text format':

$msg = "<div></div>";
$safe_msg = htmlspecialchars($msg, ENT_QUOTES);
echo $safe_msg;

Hope this helps!

Billy Bob
-1

Use this,

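// `_` is assumed to be a helper defined elsewhere that returns the element with the given id.
function restrict(elem) {
  var tf = _(elem);
  var rx;
  if (elem == "email") {
    // E-mail fields: strip spaces and quotes
    rx = /[ '"]/gi;
  } else if (elem == "search" || elem == "comment") {
    // Free-text fields: keep only letters, digits, spaces and . , ?
    rx = /[^a-z 0-9.,?]/gi;
  } else {
    // Everything else: keep only letters and digits
    rx = /[^a-z0-9]/gi;
  }
  tf.value = tf.value.replace(rx, "");
}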

On the backend, for Java, try using the StringUtils class or a custom script.

// Requires: import java.text.StringCharacterIterator;
public static String HTMLEncode(String aTagFragment) {
    final StringBuilder result = new StringBuilder();
    final StringCharacterIterator iterator = new StringCharacterIterator(aTagFragment);
    char character = iterator.current();
    while (character != StringCharacterIterator.DONE) {
        if (character == '<')
            result.append("&lt;");
        else if (character == '>')
            result.append("&gt;");
        else if (character == '\"')
            result.append("&quot;");
        else if (character == '\'')
            result.append("&#039;");
        else if (character == '\\')
            result.append("&#092;");
        else if (character == '&')
            result.append("&amp;");
        else {
            // Not a special character: append it unchanged
            result.append(character);
        }
        character = iterator.next();
    }
    return result.toString();
}