สวัสดี Mr.Java Sp'e c'i'a'l'' '

I tried to parse the String using the code below, but I couldn't get it to work; it simply shows the wrong value.

String s = "สวัสดี Mr.Java Sp'e c'i'a'l'' '"";
s = s.replaceAll("'", "&apos;");
//s = s.replaceAll("'", "''");
StringEscapeUtils.escapeHtml(s);

I am trying to read the text from a JSP, save it in a SQL Server DB, then show it in a JSP again and update it. But sometimes the JSP shows the converted &apos; literally instead of the special characters.

Put very simply: I have posted this String (สวัสดี Mr.Java Sp'e c'i'a'l'' ') here on Stack Overflow; they save it in their DB, show it, and allow me to update it. This is what I want.

BalusC
sunleo
  • Is it a typo or is your string exactly that? Because it's not a valid string, it has a " left alone. You have to remove or escape it. – Djon Jun 04 '13 at 14:22
  • It is just shown as an example, a TYPO, but in the text box of the JSP this will also be allowed as a remark. – sunleo Jun 04 '13 at 14:23
  • It might be related to the UTF format. Maybe your DB does not store/support UTF. – bNd Jun 04 '13 at 14:41
  • No problem with the DB, it supports UTF-8 chars. – sunleo Jun 04 '13 at 14:42

1 Answer

OK. So let's look at what your code does:

// line 1
String s = "สวัสดี Mr.Java Sp'e c'i'a'l'' '";

We have a String with various international characters in it ... and some "'" characters.

// line 2 
s = s.replaceAll("'", "&apos;");

Assuming that those really are "'" characters, this replaces every instance of "'" with the XML / HTML character entity &apos;, giving us:

"สวัสดี Mr.Java Sp&apos;e c&apos;i&apos;a&apos;l&apos;&apos; &apos;"

And so ...

// line 3
s = StringEscapeUtils.escapeHtml(s);

This replaces any active HTML / XML characters with character references. This includes the ampersand characters "&" that you previously inserted. The result is this:

"&#xxxx;&#xxxx;&#xxxx;&#xxxx; Mr.Java Sp&amp;apos;e c&amp;apos;i&amp;apos;a&amp;apos;l&amp;apos;&amp;apos; &amp;apos;"

(The &#xxxx; numeric character references encode those Thai (?) characters.)

When you embed that in an HTML document and display it, you will see "สวัสดี Mr.Java Sp&apos;e c&apos;i&apos;a&apos;l&apos;&apos; &apos;"


See what has happened? You have HTML-escaped your already HTML-escaped apostrophes!
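The effect can be reproduced with a small sketch. It assumes the original replacement string was the entity &apos;, and it uses a hypothetical escapeHtmlMinimal helper as a stand-in for StringEscapeUtils.escapeHtml so that it runs without Commons Lang; the real method escapes more characters, but it treats the ampersand the same way.

```java
class DoubleEscapeDemo {
    // Hypothetical stand-in for StringEscapeUtils.escapeHtml: it escapes only
    // the markup-significant characters, which is enough to show the problem.
    static String escapeHtmlMinimal(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Mirrors the two steps from the question: manual entity replacement
    // followed by a general HTML escape.
    static String escapeTwice(String raw) {
        String step1 = raw.replace("'", "&apos;");
        return escapeHtmlMinimal(step1);
    }

    public static void main(String[] args) {
        // prints: Sp&amp;apos;e c&amp;apos;i&amp;apos;a&amp;apos;l
        System.out.println(escapeTwice("Sp'e c'i'a'l"));
    }
}
```

The second pass turns the "&" of each &apos; into &amp;, so the browser renders the text &apos; instead of an apostrophe.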


So what do you really need to do?

  1. There is no need to replace apostrophes with &apos;. Apostrophes are legal in HTML text.

  2. There should be no need to add HTML escapes so that you can store text in a database:

    • Any modern database will allow you to store Unicode strings without any special encoding.

    • If you are trying to prevent the database's SQL parser getting confused by quotes in the text you are storing, you are doing it the wrong way. The right way to do this is to use a PreparedStatement, add parameter placeholders to the query, and use the PreparedStatement.setXxx methods to provide the parameter values. The execute (or whatever) will take care of any SQL escaping that needs to be done.
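A minimal sketch of the PreparedStatement approach follows. The RemarkDao class, the remarks table, and the remark_text column are hypothetical names chosen for illustration, and the caller is assumed to supply an open JDBC Connection.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class RemarkDao {
    // Hypothetical table and column names, for illustration only.
    static final String INSERT_SQL =
            "INSERT INTO remarks (remark_text) VALUES (?)";

    // Stores the raw, unescaped text. The value is bound as a parameter,
    // so apostrophes in it never reach the SQL parser as syntax.
    static int saveRemark(Connection connection, String remark) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(INSERT_SQL)) {
            statement.setString(1, remark);
            return statement.executeUpdate();
        }
    }
}
```

Calling RemarkDao.saveRemark(connection, "สวัสดี Mr.Java Sp'e c'i'a'l'' '") would pass the string through unchanged, with no replaceAll and no HTML escaping beforehand. For the Thai characters to survive on SQL Server, the column would typically need a Unicode type such as NVARCHAR; that is an assumption about the schema, which the question does not show.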

Stephen C
  • Thanks for your reply. But what happens if I do this without parsing? In SQL Server I couldn't insert ' , and when this is shown in a JSP text box, the text box breaks because of the ' . – sunleo Jun 04 '13 at 15:11
  • As to the `'` problem in SQL (i.e. SQL injection), this is covered by properly using `PreparedStatement`. If you still have problems with `'` in SQL, then you're apparently not properly using `PreparedStatement`. As to the HTML problem in the JSP text box (i.e. XSS attack), see http://stackoverflow.com/questions/4948532/where-should-i-escape-html-strings-jsp-page-or-servlets/4948856#4948856. Summarized: do **not** escape them before saving in DB. This makes utterly no sense, as indicated by Stephen's answer. – BalusC Jun 04 '13 at 18:00