1

I want to remove all Unicode characters (such as \u2029) and escape characters (such as \n and \t) from a string. In short, I want just the alphanumeric text.

For example:

\u2029My Actual String\u2029 \nMy Actual String\n

I want to get just 'My Actual String'. Is there a way to do this, either with a built-in string method or with a regular expression?

Chris
Bilal Ahmed Yaseen
  • This was asked 5 mins ago, what are the odds ;) http://stackoverflow.com/questions/20678238/converting-unicode-to-string-java – Peter Lawrey Dec 19 '13 at 09:55
  • Look here. http://stackoverflow.com/a/20654784/2968614 – Aditya Dec 19 '13 at 09:57
  • That was just for '\n', but I want to handle both Unicode and escape characters. Actually, I am almost done, but in Java '\' has to be written as '\\', which is why my regex or function is not working. – Bilal Ahmed Yaseen Dec 19 '13 at 10:00
  • To remove all Unicode characters from a string, you just need to remove *everything* from the string. Simple as that. – Joey Dec 19 '13 at 10:05

2 Answers

0

Try this:

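anyString = anyString.replaceAll("\\\\u[0-9a-fA-F]{4}|\\\\.", "");

to remove escape sequences that appear as literal text in the string (a backslash followed by u and four hex digits, or a backslash followed by any single character). If you also want to remove all other special characters, use this one: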

anyString = anyString.replaceAll("\\\\u[0-9a-fA-F]{4}|\\\\.|[^a-zA-Z0-9\\s]", "");

(I assume you want to keep whitespace; if not, remove \\s from the pattern above.)
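
Note that the backslash alternatives only see escape sequences that are present as literal text (a backslash followed by u2029 or n). A Java literal such as "\n" already holds the real control character, which is handled by the character class instead. A minimal sketch of the difference follows; the class and method names (`EscapeStripDemo`, `stripToAlnum`) are hypothetical, and matching four hex digits after the u is my assumption about what such a sequence should look like:

```java
public class EscapeStripDemo {
    // Drops literal escape text (backslash + u + 4 hex digits, or backslash + any char),
    // then every remaining character that is not an ASCII letter, digit, or whitespace.
    static String stripToAlnum(String s) {
        return s.replaceAll("\\\\u[0-9a-fA-F]{4}|\\\\.|[^a-zA-Z0-9\\s]", "");
    }

    public static void main(String[] args) {
        // Case 1: the escapes are literal text (typed as backslash sequences).
        String literal = "\\u2029My Actual String\\u2029";
        // Case 2: the string holds the real U+2029 and newline characters.
        String real = "\u2029My Actual String\n";

        System.out.println("[" + stripToAlnum(literal) + "]");
        // The newline counts as whitespace and is kept, so trim it separately.
        System.out.println("[" + stripToAlnum(real).trim() + "]");
    }
}
```

Both lines should print `[My Actual String]`. In the second case the newline survives the regex because it is whitespace, which is why the sketch calls `trim()` afterwards.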

grexter89
0

Try

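import java.util.regex.Matcher;
import java.util.regex.Pattern;

String stg = "\u2029My Actual String\u2029 \nMy Actual String";
// Match runs of word characters; the lookahead skips literal backslash-u escapes and whitespace.
Pattern pat = Pattern.compile("(?!(\\\\(u|U)\\w{4}|\\s))(\\w)+");
Matcher mat = pat.matcher(stg);
StringBuilder out = new StringBuilder();
while (mat.find()) {
    out.append(mat.group()).append(' ');  // each match followed by a single space
}
System.out.println(out.toString().trim());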

The regex matches runs of word characters (letters, digits, and underscores), so whitespace and other characters such as U+2029 are skipped.

Output:

My Actual String My Actual String
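
If a loop is not needed, roughly the same result can be had with a single replacement. This is only a sketch of an alternative, not the approach above: it assumes "alphanumeric" means ASCII letters and digits, and the names `AlnumWords` and `keepAlnumWords` are hypothetical.

```java
public class AlnumWords {
    // Replaces each run of non-alphanumeric characters with one space, then trims the ends.
    static String keepAlnumWords(String s) {
        return s.replaceAll("[^a-zA-Z0-9]+", " ").trim();
    }

    public static void main(String[] args) {
        String stg = "\u2029My Actual String\u2029 \nMy Actual String";
        System.out.println(keepAlnumWords(stg));
    }
}
```

Unlike the word-character pattern, this variant also drops underscores, since they are not in the character class.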
Rakesh KR
  • How are '\n' and '\t' handled in this flow? – Bilal Ahmed Yaseen Dec 19 '13 at 11:18
  • `\s` stands for "whitespace character". Which characters this actually includes depends on the regex flavor; in Java the default is [ \t\n\x0B\f\r]. That is, `\s` matches a space, a tab, a line feed, a vertical tab, a form feed, or a carriage return. – Rakesh KR Dec 19 '13 at 11:36
  • What if I want to remove these characters only from the start of the string? For example: \u2029 \\t\\t&*^ my Actual String ==> my Actual String – Bilal Ahmed Yaseen Dec 26 '13 at 18:02
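
For the leading-only case raised in the last comment, one possible approach is to anchor the pattern at the start of the string and use `replaceFirst`. This is a sketch under assumptions: the names `LeadingStrip` and `stripLeading` are hypothetical, and the pattern treats anything other than ASCII letters and digits as removable.

```java
public class LeadingStrip {
    // Removes literal escape text and any other non-alphanumeric characters,
    // but only where they form an unbroken run at the start of the string.
    static String stripLeading(String s) {
        return s.replaceFirst("^(?:\\\\u[0-9a-fA-F]{4}|\\\\.|[^a-zA-Z0-9])+", "");
    }

    public static void main(String[] args) {
        // Escapes present as literal text (backslash sequences typed into the data).
        String literal = "\\u2029 \\t\\t&*^ my Actual String";
        // Escapes present as the real characters.
        String real = "\u2029 \t\t&*^ my Actual String";

        System.out.println(stripLeading(literal));
        System.out.println(stripLeading(real));
    }
}
```

Both calls should print `my Actual String`; anything after the first letter or digit is left untouched.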