- Single quote and double quote not defined in HTML 4.0
Only the single quote is not defined in HTML 4.0; the double quote has been defined as &quot; since HTML 2.0.
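To make the gap concrete, here is a minimal stdlib-only sketch (the class name Html4SpecialChars is hypothetical, not from any library). It escapes the markup-significant characters using the named entities HTML 4.0 does define, and falls back to the decimal reference &#39; for the single quote because HTML 4.0 has no named entity for it. It assumes Java 9+ for Map.of.

```java
import java.util.Map;

// Hypothetical helper (not a library API): escapes the characters that are
// significant in HTML markup, preferring HTML 4.0 named entities.
final class Html4SpecialChars {
    // Named entities that HTML 4.0 defines for these characters.
    private static final Map<Character, String> NAMED = Map.of(
            '"', "&quot;",
            '&', "&amp;",
            '<', "&lt;",
            '>', "&gt;");

    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            String named = NAMED.get(c);
            if (named != null) {
                out.append(named);
            } else if (c == '\'') {
                // HTML 4.0 has no named entity for the single quote,
                // so use its decimal character reference instead.
                out.append("&#39;");
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

For example, escape("It's \"x\" & <y>") returns It&#39;s &quot;x&quot; &amp; &lt;y&gt;.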
- StringEscapeUtils is not able to escape these 2 characters into their respective entities
escapeXml11 in StringEscapeUtils supports converting the single quote into &apos;.
For example:
StringEscapeUtils.escapeXml11("'"); // Returns &apos;
StringEscapeUtils.escapeHtml4("\""); // Returns &quot;
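escapeXml11 can emit &apos; because XML predefines exactly five entities: &lt;, &gt;, &amp;, &quot; and &apos;. The following stdlib-only sketch (hypothetical class XmlPredefinedEntities) shows just that five-entity mapping; it is not how StringEscapeUtils is implemented, and the library method handles more cases than this sketch does.

```java
// Hypothetical helper (not a library API): maps characters to the five
// entities that XML predefines.
final class XmlPredefinedEntities {
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                // &apos; is predefined in XML but not in HTML 4.0
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this sketch, escape("'") returns &apos; and escape("\"") returns &quot;, matching the two StringEscapeUtils results above.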
- Is there any other String related tool able to do this?
HtmlUtils from the Spring Framework takes care of single quotes and double quotes, and it can also convert them to decimal references (like &#39; and &#34;).
The following example is taken from the answer to this question:
import org.springframework.web.util.HtmlUtils;
[...]
HtmlUtils.htmlEscapeDecimal("&"); // gives &#38;
HtmlUtils.htmlEscape("&"); // gives &amp;
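A decimal reference is simply "&#", the character's code point in decimal, and ";" ('&' is U+0026 = 38, '"' is U+0022 = 34, '\'' is U+0027 = 39), so it works whether or not a given HTML version defines a named entity. The stdlib-only sketch below (hypothetical class DecimalEscaper) applies that rule to the five markup characters only; Spring's htmlEscapeDecimal handles a much larger set of characters.

```java
// Hypothetical helper (not a library API): escapes the five markup characters
// as decimal numeric character references.
final class DecimalEscaper {
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if (c == '&' || c == '<' || c == '>' || c == '"' || c == '\'') {
                // "&#" + decimal code point + ";"
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Here escape("&") returns &#38;, the same string shown for htmlEscapeDecimal above.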
- Is there any reason why single quote and double quote are not defined in HTML 4.0 entities?
As per "Character entity references in HTML 4", the single quote is not defined. The double quote has been available since HTML 2.0, whereas the single quote (&apos;) is supported as part of XHTML 1.0.
- Tool or method to encode all Unicode characters into their respective entities
There is a very good and simple Java implementation in one of the answers to this question.
The following sample program is based on that answer:
import org.apache.commons.lang3.StringEscapeUtils;

public class HTMLCharacterEscaper {

    public static void main(String[] args) {
        // With StringEscapeUtils
        System.out.println("Using SEU: " + StringEscapeUtils.escapeHtml4("\" ¶"));
        System.out.println("Using SEU: " + StringEscapeUtils.escapeXml11("'"));

        // Single quote & double quote
        System.out.println(escapeHTML("It's good"));
        System.out.println(escapeHTML("\" Grit \""));

        // Unicode characters
        System.out.println(escapeHTML("This is copyright symbol ©"));
        System.out.println(escapeHTML("Paragraph symbol ¶"));
        System.out.println(escapeHTML("This is pound £"));
    }

    // Replaces every non-ASCII character, plus " < > & and ', with a decimal
    // numeric character reference such as &#169;
    public static String escapeHTML(String s) {
        StringBuilder out = new StringBuilder(Math.max(16, s.length()));
        // Walk the string by code point (not by char) so that characters outside the
        // Basic Multilingual Plane become one reference instead of two surrogate halves
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp > 127 || cp == '"' || cp == '<' || cp == '>' || cp == '&' || cp == '\'') {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
Here are some interesting links I came across while researching the answer: