38

Currently I use org.apache.commons.lang.StringEscapeUtils escapeHtml() to escape unwanted HTML tags in my Strings, but then I realized it also escapes accented characters to &something;, which I don't want.

Do you know any solution that escapes HTML tags but leaves my special (well, for some people, they are normal here ;]) letters as they are?

Thanks in advance!

balázs

jmj
Balázs Németh
  • `&something;` will be converted to `&amp;something;` -- do you want the character '&' not to be escaped? In most usual cases a user enters, in the UI, the symbol that `&something;` stands for, and escapeHTML just converts that special character to the equivalent HTML entity. – Nishant Feb 02 '11 at 12:58
  • 1
    I mean á gets converted to `&aacute;`, which I don't want. I don't want letters to be escaped at all... everything else, yes. – Balázs Németh Feb 02 '11 at 13:03
  • What do you need to escape HTML for? For JSP? – BalusC Feb 02 '11 at 13:06
  • Almost, JSF. Do you have any other idea how to prevent users using tags in comments? I have to enable
    though, that's why I have to use escape false in the output tags.
    – Balázs Németh Feb 02 '11 at 13:16
  • 2
    +50 bounty: Please try to give an answer closer to the original question, an escaping function which will not hurt UTF-8 characters. – vbence May 16 '11 at 11:18

6 Answers

32
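// Uses org.apache.commons.lang.StringUtils; replaceEach is available from commons-lang 2.4 on
String escaped = StringUtils.replaceEach(str, new String[]{"&", "\"", "<", ">"}, new String[]{"&amp;", "&quot;", "&lt;", "&gt;"});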
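For readers on a commons-lang version without `replaceEach` (the 2.2 case raised in the comments), the same four replacements can be sketched with the JDK alone. The names `BasicHtmlEscaper` and `escape` are made up for illustration and are not part of any library. The one detail that matters is that `&` is replaced first, so the entities produced by the later steps are not escaped a second time.

```java
final class BasicHtmlEscaper {
    private BasicHtmlEscaper() {}

    /**
     * Escapes only the four markup characters & " < >.
     * Every other character, including accented letters, is left untouched.
     */
    static String escape(String input) {
        if (input == null) {
            return null;
        }
        return input
                .replace("&", "&amp;")    // must run first, see note above
                .replace("\"", "&quot;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // \u00e1 is the letter a with an acute accent
        System.out.println(escape("Bal\u00e1zs <b>&</b>"));
    }
}
```

Unlike `replaceEach`, this makes four passes over the string, which is usually acceptable for short inputs such as comments.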
pingw33n
  • 13
    [OWASP](http://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet) also recommends `'` and `/`. – axtavt Feb 02 '11 at 13:09
  • 1
    Which version of StringUtils is that? I have one in commons-lang-2.2 but no replaceEach method. Not critical though, what you recommended is actually easy to implement. I would have liked an out-of-the-box solution though :-/ – Balázs Németh Feb 02 '11 at 13:23
  • 3
    What about `® ¶ © ½ æ ÷ §` and the rest of the shebang found at http://arnspublishing.com/QuickRef/ISO8859.html ?? =) That replaceEach is a disaster waiting to happen! – Piotr Kula May 19 '11 at 10:52
  • yeah but that's exactly what I did NOT want :) correct me if I'm wrong but I don't know any HTML tags like <§> :P – Balázs Németh Nov 29 '11 at 15:37
  • 1
    @ppumkin, please explain further. – Xonatron Feb 21 '12 at 17:44
  • @pingw33n, I have tried importing `org.springframework.util.StringUtils`, `org.apache.soap.util.StringUtils`, `org.apache.axis.utils.StringUtils`, and `com.ibm.wsdl.util.StringUtils`, and none of them have `StringUtils.replaceEach()`. What are you importing to have access to this method? They seem to have a `.replace()` however. – Xonatron Feb 21 '12 at 17:54
  • 2
    @MatthewDoucette it's `org.apache.commons.lang.StringUtils`: http://commons.apache.org/lang/api-2.5/org/apache/commons/lang/StringUtils.html – pingw33n Feb 22 '12 at 11:22
  • what if clients say that he wants < as < only? – linuxeasy Mar 08 '13 at 08:38
  • 1
    As @EtienneNeveu mentioned, you MUST read http://wonko.com/post/html-escaping; it all depends on the context. – Christophe Roussy May 12 '14 at 12:10
21

If it's for Android, use TextUtils.htmlEncode(String) instead.

goncalossilva
9

This looks very good to me:

org/apache/commons/lang3/StringEscapeUtils.html#escapeXml(java.lang.String)

By escaping for XML, you get output that is valid XHTML, which is also good HTML.

Alexander Farber
Nicolas Barbulesco
6

Here's a version that replaces the six significant characters as recommended by OWASP. This is suitable for HTML content elements like <textarea>...</textarea>, but not HTML attributes like <input value="..."> because the latter are often left unquoted.

StringUtils.replaceEach(text,
        new String[]{"&", "<", ">", "\"", "'", "/"},
        new String[]{"&amp;", "&lt;", "&gt;", "&quot;", "&#x27;", "&#x2F;"});
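The caveat about unquoted attributes can be made concrete with a small sketch. It uses a JDK-only stand-in for the six-character replacement (the class `AttributeContextDemo` and its methods are hypothetical names, not library code) and shows that a payload containing none of the six characters passes through unchanged, so it can still introduce a new attribute when the value is not quoted.

```java
final class AttributeContextDemo {
    private AttributeContextDemo() {}

    /** JDK-only stand-in for the six-character replacement; '&' is handled first. */
    static String escape(String text) {
        return text
                .replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#x27;")
                .replace("/", "&#x2F;");
    }

    /** Deliberately unsafe: places the escaped value in an UNQUOTED attribute. */
    static String unquotedInput(String value) {
        return "<input value=" + escape(value) + ">";
    }

    public static void main(String[] args) {
        // The payload contains none of the six escaped characters,
        // so escape() returns it unchanged.
        String payload = "x onmouseover=alert(1)";
        System.out.println(unquotedInput(payload));
        // prints: <input value=x onmouseover=alert(1)>
    }
}
```

The space in the payload ends the unquoted value, and the rest is parsed as a separate attribute. Quoting the attribute, or using an encoder written for attribute context, addresses this; the six-character replacement alone does not.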
quietmint
  • Thanks! Adapted for another solution here: https://stackoverflow.com/a/75355474/3196753. I chose to declare the characters as `static final` for performance. I also replaced the hex markup with human-readable replacements. – tresf Feb 05 '23 at 21:02
6

I know it's too late to add my comment, but perhaps the following code will be helpful:

public static String escapeHtml(String string) {
    StringBuilder escapedTxt = new StringBuilder();
    for (int i = 0; i < string.length(); i++) {
        char tmp = string.charAt(i);
        switch (tmp) {
        case '<':
            escapedTxt.append("&lt;");
            break;
        case '>':
            escapedTxt.append("&gt;");
            break;
        case '&':
            escapedTxt.append("&amp;");
            break;
        case '"':
            escapedTxt.append("&quot;");
            break;
        case '\'':
            escapedTxt.append("&#x27;");
            break;
        case '/':
            escapedTxt.append("&#x2F;");
            break;
        default:
            escapedTxt.append(tmp);
        }
    }
    return escapedTxt.toString();
}

enjoy!

Community
Ahmad AlMughrabi
  • 1
    You should use [StringBuilder](https://docs.oracle.com/javase/7/docs/api/java/lang/StringBuilder.html). – peterh May 22 '16 at 09:11
0

If you're using Wicket, use:

import org.apache.wicket.util.string.Strings;
...
CharSequence cs = Strings.escapeMarkup(src);
String str =      Strings.escapeMarkup(src).toString();
andraaspar