
I want a regex pattern that allows every character (letters, digits, special characters, umlaut characters) but not invalid ones like an arrow sign, which prevents me from generating XML. Please help me with this. Also, how can I replace such an invalid character with a white space?

I am using Java 1.5.

akshat
  • Disallowing invalid characters is usually a bad way to go. Replacing them with their entities seems much more reasonable (see [here](http://stackoverflow.com/questions/439298/best-way-to-encode-text-data-for-xml-in-java) for an answer on why/how). – Wrikken Dec 11 '12 at 19:11
  • Try looking at this answer and the other answers listed here: http://stackoverflow.com/a/5008282/586621 – Jay Dec 11 '12 at 19:17
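Wrikken's suggestion (escape instead of reject) could look roughly like the sketch below. It is an illustration only, not the code from the linked answer: the class name XmlEscaper is hypothetical, it handles just the five predefined XML entities, and it does not deal with control characters that XML 1.0 forbids outright.

```java
// Hypothetical sketch: escape XML's five special characters instead of rejecting input.
// Uses only Java 5 APIs (StringBuilder, no lambdas).
final class XmlEscaper {

    private XmlEscaper() {
    }

    /** Returns text with &, <, >, " and ' replaced by the predefined XML entities. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                // Every other char (letters, umlauts, U+2192 arrow, ...) is copied unchanged.
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: if (a &lt; b &amp;&amp; c &gt; d) say &quot;hi&quot;
        System.out.println(escape("if (a < b && c > d) say \"hi\""));
    }
}
```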

3 Answers


You can just use a character class and match all the valid characters:

^[a-zA-Z\d]+$

But if you don't want to allow certain characters, you can use a negated character class:

^[^><]+$

Put the characters you want to exclude (your arrow signs) inside the brackets, after the ^.

For example, a regex like [^a-zA-Z] would match any character except an ASCII letter.
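In Java, these character classes can be used with matches() to validate a string and with replaceAll() to turn disallowed characters into spaces. This is a sketch; the class and method names are hypothetical, and the blacklist in replaceListed is just an example set. Note that [a-zA-Z\d] covers ASCII only (by default Java's \d is [0-9]), so a word with an umlaut such as "Müller" fails the whitelist check.

```java
import java.util.regex.Pattern;

// Hypothetical helper showing the whitelist and negated-class approaches.
final class CharClassDemo {

    // Whitelist: the whole string must be ASCII letters and digits.
    private static final Pattern WHITELIST = Pattern.compile("^[a-zA-Z\\d]+$");

    static boolean isAllowed(String s) {
        return WHITELIST.matcher(s).matches();
    }

    // Negated class: replace every char that is NOT an ASCII letter or digit with a space.
    static String replaceDisallowed(String s) {
        return s.replaceAll("[^a-zA-Z\\d]", " ");
    }

    // Blacklist: replace only the listed chars ('<', '>' and the U+2192 arrow, as an example).
    static String replaceListed(String s) {
        return s.replaceAll("[<>\\u2192]", " ");
    }
}
```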

Anirudha
  • How do I avoid a symbol like a right arrow, a left arrow, or any similar symbol such as '→'? We don't know all the symbols of this kind, so I just want to allow valid characters like alphanumerics, special characters, etc. – akshat Dec 11 '12 at 19:20
  • You can use `^[!-~\s]+$`. This will allow ASCII alphanumeric and special characters: `!-~` is the range of all characters from `!` to `~` (refer to an ASCII chart for this), and `\s` matches newlines, spaces, etc. – Anirudha Dec 11 '12 at 19:25
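The `[!-~\s]` suggestion can be tried out as below. It is a sketch with hypothetical names. `!-~` is the printable ASCII range 0x21 to 0x7E and `\s` adds Java's default whitespace set, so this also rejects umlauts such as ä (U+00E4), which the question wanted to keep.

```java
// Hypothetical helper around the printable-ASCII class [!-~\s].
final class PrintableAsciiFilter {

    // True if every char is printable ASCII (! .. ~) or whitespace.
    static boolean isPrintableAscii(String s) {
        return s.matches("^[!-~\\s]+$");
    }

    // Replace each char outside that set with a single space.
    static String replaceOthers(String s) {
        return s.replaceAll("[^!-~\\s]", " ");
    }
}
```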

The String.replaceAll method takes a regex as its first argument:

String myUnparsedString = "<some-xml-style-node>";
String myParsedString = myUnparsedString.replaceAll("<", " ");
anotherdave
  • I don't think that's what he's asking. He wants to add text to XML, but the text has invalid XML chars that require escaping. – Bohemian Dec 11 '12 at 19:08
  • Ah right… I read his question quickly, just thought he wanted something that would remove angle brackets so that the content could be put within XML tags; question could do with tidying TBH :) – anotherdave Dec 11 '12 at 19:10
  • I dunno now. Maybe your answer is what he wants. I didn't realise that "arrow sign" might mean `<`. – Bohemian Dec 11 '12 at 19:14
  • My arrow sign is not `<`; it is an arrow symbol like the one in MS Word, '→'. These types of symbols prevent me from creating XML. – akshat Dec 11 '12 at 19:18
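Because replaceAll treats its first argument as a regex, one call can replace several characters via a character class, while regex metacharacters such as `.` must be quoted to match literally. A small sketch with hypothetical names; Pattern.quote is available since Java 5.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating that String.replaceAll is regex-based.
final class ReplaceAllDemo {

    // [<>] is a character class matching either angle bracket.
    static String stripAngleBrackets(String s) {
        return s.replaceAll("[<>]", " ");
    }

    // Pattern.quote makes any metacharacters in target match literally.
    static String replaceLiteral(String s, String target) {
        return s.replaceAll(Pattern.quote(target), " ");
    }
}
```

Without quoting, `"a.b".replaceAll(".", " ")` replaces every character, because `.` matches anything.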

You can probably do this to remove all unwanted characters like the arrow:

String cleanXML = xml.replaceAll("[^\\u0000-\\u00FF]+", " ");

This replaces each run of one or more characters outside the \x00-\xFF range with a single space.

anubhava
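The effect of the \x00-\xFF replacement can be checked with the sketch below (hypothetical names). Umlauts such as ä (U+00E4) lie inside that range and survive, while the arrow U+2192 is replaced. For comparison, the sketch also includes a filter that is not from this answer: it keeps only characters allowed by the XML 1.0 Char production (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]). By that rule the arrow is itself a legal XML character, while control characters like U+0001 are not, even though they pass the \x00-\xFF filter. So if XML generation fails on '→', the cause may lie elsewhere, for example an encoding mismatch; that is a guess, not something this thread establishes.

```java
// Hypothetical helper comparing the Latin-1 range filter with an XML 1.0 Char filter.
final class XmlCharFilter {

    // The answer's approach: each run of chars outside U+0000..U+00FF becomes one space.
    static String replaceOutsideLatin1(String s) {
        return s.replaceAll("[^\\u0000-\\u00FF]+", " ");
    }

    // XML 1.0 "Char" production, checked per Unicode code point.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Replace each code point not allowed in XML 1.0 with a space.
    // Unpaired surrogates come back from codePointAt as 0xD800..0xDFFF and are replaced too.
    static String replaceInvalidXmlChars(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (isXmlChar(cp)) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(' ');
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```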