
I'm using com.adobe.granite.xss to encode strings in JSP. It works with most characters, except for Ã: Ã is displayed as Ã�.

This happens with the xssAPI.encodeForHTML() method. I have also tried <cq:text> with escapeXml="true", and it behaves the same way.

The characters are stored correctly in the repository, and I have also set content="text/html; charset=utf-8" in the JSP.

Is there a way to encode or filter the input for XSS without the charset breaking in situations like this?

I have tried it with different non-Latin characters, and most of them are not affected by the XSS API.


Sharath Madappa
Character  appears to have the same problem. Since e.g. Ã = U+00C3, which is 0xC3 0x83 in UTF-8, it seems that this part of the data is UTF-8 encoded data whose bytes have been misinterpreted as ISO-8859-1 data (and “�” is perhaps an indication that 0x83 is assigned to a control code in ISO-8859-1). – Jukka K. Korpela Nov 14 '14 at 08:35
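The byte-level explanation in that comment can be checked with a small JDK-only program. This is a sketch under the comment's own hypothesis (the page bytes are UTF-8 but get decoded as ISO-8859-1 somewhere), which the thread does not confirm; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

public class MojibakeDemo {
    // Encode a string as UTF-8, then decode those bytes as ISO-8859-1,
    // reproducing the misinterpretation the comment describes.
    static String misreadAsLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u00C3"; // Ã

        // Show the UTF-8 bytes of U+00C3
        StringJoiner hex = new StringJoiner(" ");
        for (byte b : original.getBytes(StandardCharsets.UTF_8)) {
            hex.add(String.format("%02X", b & 0xFF));
        }
        System.out.println(hex); // C3 83

        // Each byte becomes one char: 0xC3 -> U+00C3 (Ã), 0x83 -> U+0083 (a C1 control character)
        String garbled = misreadAsLatin1(original);
        StringJoiner cps = new StringJoiner(" ");
        for (char c : garbled.toCharArray()) {
            cps.add(String.format("U+%04X", (int) c));
        }
        System.out.println(cps); // U+00C3 U+0083
    }
}
```

The second character has no visible glyph, which fits the “Ã�” seen on the page.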

1 Answer


It looks like an issue in owasp-esapi-java, which CQ's XSSAPI uses, because ESAPI iterates through the string with charAt(). For characters outside the BMP, which a Java String stores as two surrogate chars, the right way to iterate is by code point:

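final int length = s.length();
for (int offset = 0; offset < length; ) {
    // codePointAt() reads a whole surrogate pair as one code point
    final int codepoint = s.codePointAt(offset);

    // do something with the codepoint

    // advance by 1 char for BMP code points, 2 for supplementary ones
    offset += Character.charCount(codepoint);
}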

(from "How can I iterate through the unicode codepoints of a Java String?")
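The difference between char-based and code-point-based iteration only shows up for characters outside the Basic Multilingual Plane. The JDK-only sketch below (class and method names are illustrative) compares the two: Ã (U+00C3) is a single char either way, while U+1F600, picked arbitrarily as a supplementary character, occupies two chars.

```java
public class CodePointDemo {
    // Count code points by walking the string with codePointAt()/charCount()
    static int countCodePoints(String s) {
        int count = 0;
        for (int offset = 0; offset < s.length(); ) {
            int codepoint = s.codePointAt(offset);
            count++;
            offset += Character.charCount(codepoint);
        }
        return count;
    }

    public static void main(String[] args) {
        String latin = "\u00C3";       // Ã, inside the BMP: one char
        String emoji = "\uD83D\uDE00"; // U+1F600, outside the BMP: a surrogate pair

        System.out.println(latin.length() + " " + countCodePoints(latin)); // 1 1
        System.out.println(emoji.length() + " " + countCodePoints(emoji)); // 2 1

        // charAt(0) on the supplementary character yields only the high surrogate
        System.out.println(Character.isHighSurrogate(emoji.charAt(0)));    // true
    }
}
```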

So I think the problem is in this library.

Try xssAPI.filterHTML() instead; it may solve your issue.
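For comparison, a hypothetical HTML encoder that walks the string by code point might look like the sketch below. It is not the XSSAPI or ESAPI implementation: the class name, the set of escaped characters, and the choice to write every non-ASCII code point as a numeric character reference are all assumptions made for illustration.

```java
public class CodePointHtmlEncoder {
    // Hypothetical helper (not part of XSSAPI or ESAPI): escapes HTML-special
    // ASCII characters and writes each non-ASCII code point as one numeric
    // character reference, iterating by code point rather than by char.
    static String encodeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int offset = 0; offset < s.length(); ) {
            int cp = s.codePointAt(offset);
            switch (cp) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                default:
                    if (cp < 0x80) {
                        // Other ASCII characters pass through unchanged in this sketch
                        out.append((char) cp);
                    } else {
                        // One reference per code point, so a surrogate pair becomes a single entity
                        out.append("&#x").append(Integer.toHexString(cp)).append(';');
                    }
            }
            offset += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeHtml("<b>\u00C3</b>")); // &lt;b&gt;&#xc3;&lt;/b&gt;
        System.out.println(encodeHtml("\uD83D\uDE00"));  // &#x1f600;
    }
}
```

Because each reference names a Unicode code point, the output is plain ASCII and does not depend on how the page's charset is declared. A char-by-char version would instead emit two references for the lone surrogates of U+1F600.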

Oleksandr Tarasenko