1

I'm having trouble parsing tweets that are represented as escaped Unicode. Some of them turn out to be foreign-language strings, e.g. \u064a\u0633\u0639\u062f\u0646\u064a

Ivaylo Strandjev

2 Answers

1

Use org.apache.commons.lang.StringEscapeUtils from Apache Commons Lang:

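import org.apache.commons.lang.StringEscapeUtils;

String s = "\\u0048\\u0065\\u006C\\u006C\\u006F"; // the literal text \u0048\u0065..., not the decoded characters
System.out.println(StringEscapeUtils.unescapeJava(s)); // prints "Hello"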
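If pulling in Commons Lang is not an option, the same idea can be sketched with only the JDK. This is a rough illustration, not what StringEscapeUtils does internally: the class name UnicodeUnescaper and the method unescape are made up for this example, and it only handles \uXXXX sequences (no \n, \t, or escaped backslashes).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
public class UnicodeUnescaper {
    // A backslash, one or more 'u', then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u+([0-9a-fA-F]{4})");

    public static String unescape(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Turn the four hex digits into one UTF-16 code unit.
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps '$' and '\' from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The escaped tweet text from the question.
        String tweet = "\\u064a\\u0633\\u0639\\u062f\\u0646\\u064a";
        System.out.println(unescape(tweet));
    }
}
```

Characters outside the Basic Multilingual Plane arrive as two consecutive \uXXXX escapes (a surrogate pair); since each escape is decoded to one UTF-16 code unit, the pair ends up adjacent in the resulting String.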

P.S. Oops, I didn't refresh this page before posting my answer; the comments above convey the same thing.

Judking
0

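You can try StringEscapeUtils.unescapeJava from Apache Commons Lang:

str = org.apache.commons.lang.StringEscapeUtils.unescapeJava(str);

Note that in Commons Lang 3 the class lives in the org.apache.commons.lang3 package instead. See http://commons.apache.org/proper/commons-lang/javadocs/api-3.1/org/apache/commons/lang3/StringEscapeUtils.html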

Lakshmi