-1

I have an XML file whose values contain unwanted characters such as:

\xc2d
d\xa0
\xe7
\xc3\ufffdd
\xc3\ufffdd
\xc2\xa0
\xc3\xa7
\xa0\xa0
'619d813\xa03697'
\xe9.com

Input examples could be:

name : John Hinners\xc2d
email: abc@gmail\xe9.com
and others ....  

The desired output should be:

name : John Hinners
email: abc@gmail.com
and others ....  

I come from a Python background, where this task can be done easily as:

def remove_non_ascii(s):
    return ''.join(i for i in s if ord(i)<128)  

Is there some similar way to perform the same task in Java?

daydreamer
  • possible duplicate of [Java removing unicode characters](http://stackoverflow.com/questions/11020893/java-removing-unicode-characters) – Philipp Reichart Jun 18 '12 at 16:30

3 Answers

2

As I said in a similar question, use a regex:

String clean = str.replaceAll("\\P{Print}", "");

This removes all non-printable characters. That also includes \n (line feed), \t (tab) and \r (carriage return); if you want to keep those characters, use:

String clean = str.replaceAll("[^\\n\\r\\t\\p{Print}]", "");
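A minimal sketch of the two patterns above in a runnable form. The class name `PrintableFilter` and its method names are illustrative assumptions, not part of the answer. By default, Java's `\p{Print}` is the POSIX class for printable ASCII only, so non-ASCII characters such as `é` or a non-breaking space should be stripped along with control characters.

```java
class PrintableFilter {
    // Removes everything outside printable ASCII, including \n, \r and \t.
    static String stripNonPrintable(String s) {
        return s.replaceAll("\\P{Print}", "");
    }

    // Same filter, but line feeds, carriage returns and tabs are kept.
    static String stripNonPrintableKeepWhitespace(String s) {
        return s.replaceAll("[^\\n\\r\\t\\p{Print}]", "");
    }

    public static void main(String[] args) {
        // Hypothetical input: an accented letter, a tab, a non-breaking space, a newline.
        String input = "abc@gmail\u00e9.com\tnext\u00a0field\n";
        System.out.println(stripNonPrintable(input));
        System.out.println(stripNonPrintableKeepWhitespace(input));
    }
}
```

With this input, the first call also drops the tab and the trailing newline, while the second keeps them and removes only the `é` and the non-breaking space.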
Ivan Pavić
1

In Java it will not be as pretty.

You can use a regexp, but if you don't have a simple definition of your characters, the best option is probably this:

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            // keep only characters in the ASCII range (below 128)
            if (((int) s.charAt(i)) < 128) sb.append(s.charAt(i));
        }
        String clean = sb.toString();
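For the plain ASCII case the character set does have a simple definition, so the regexp route mentioned above can be sketched as follows. The class name `AsciiFilter` and the precompiled `NON_ASCII` pattern are assumptions made for illustration; the behaviour is meant to mirror the Python `remove_non_ascii` from the question.

```java
import java.util.regex.Pattern;

class AsciiFilter {
    // Matches any character outside the 7-bit ASCII range (0x00-0x7F).
    private static final Pattern NON_ASCII = Pattern.compile("[^\\x00-\\x7F]");

    // Returns s with every non-ASCII character removed.
    static String removeNonAscii(String s) {
        return NON_ASCII.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(removeNonAscii("abc@gmail\u00e9.com")); // abc@gmail.com
        System.out.println(removeNonAscii("619d813\u00a03697"));   // 619d8133697
    }
}
```

Note that this filter, like the loop and the Python one-liner, only drops the non-ASCII characters themselves: an input such as `John Hinners\u00c2d` would become `John Hinnersd`, so the trailing ASCII `d` would need separate handling.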
Denys Séguret
  • Note that you don't have to cast char to int in Java; the char is widened to int implicitly in the comparison. – EvenLisle Jun 18 '12 at 16:36
  • Yes, that's true. I always feel the intent is clearer with the cast, but that may be silly (or because I don't change habits when I change languages). – Denys Séguret Jun 18 '12 at 16:37
0
String s = "WantedCharactersunwantedCharacters";

If I want the remaining String to be "WantedCharacters", I simply write:

s = s.replaceAll("unwantedCharacters", "");
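One caveat worth sketching: `replaceAll` treats its first argument as a regular expression, so unwanted text containing characters like `\`, `(` or `.` will not be matched literally. The example below shows two ways to remove literal text, using `String.replace` and `Pattern.quote`; the class name `LiteralReplace`, the method names and the sample path are hypothetical.

```java
import java.util.regex.Pattern;

class LiteralReplace {
    // Removes every literal occurrence of `unwanted`; nothing is interpreted as a regex.
    static String removeLiteral(String s, String unwanted) {
        return s.replace(unwanted, "");
    }

    // Same result through replaceAll, with the search text quoted as a literal.
    static String removeLiteralViaRegex(String s, String unwanted) {
        return s.replaceAll(Pattern.quote(unwanted), "");
    }

    public static void main(String[] args) {
        String path = "C:\\temp\\file(1).txt";
        System.out.println(removeLiteral(path, "\\"));          // C:tempfile(1).txt
        System.out.println(removeLiteralViaRegex(path, "(1)")); // C:\temp\file.txt
    }
}
```

For a fixed piece of text, `replace` is usually the simpler choice; `replaceAll` is only needed when the unwanted part is really a pattern.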

[EDIT]: You could, of course, also write

private static String removeNonAscii(String s){
    StringBuilder sb = new StringBuilder();
    for(int i=0; i<s.length(); ++i){
        if(s.charAt(i) < 128){
            sb.append(s.charAt(i));
        }
    }
    return sb.toString();
}

if that's a satisfactory solution.

EvenLisle
  • I had big problems with replaceAll ... it's not working as expected ... "replaceAll("\\");" and things like that ... – headgrowe Jun 18 '12 at 16:32