1

Hey, I want to sanitize a string so that it only contains letters (a-z, A-Z, and also letters from other languages, not only English) and commas. I tried ReplaceAll([^a-z 0-9,]), but it also deletes letters from other languages. Can someone show me how to strip only special characters without deleting emojis?

user7415791

3 Answers

1

You could compare each character's ASCII code against the ranges for a-z and 0-9, and if the current character falls outside those ranges, do what you wish with it. On how to get the ASCII value of a character, refer here.

EDIT: the idea is that the characters a-z are next to each other in the ASCII table, and so are 0-9. So just write a simple function that returns a boolean telling you whether the current character is in one of these ranges, and if it is not, replace it. With this approach, though, you have to process the string one character at a time.
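A minimal sketch of that range check, assuming the goal is to drop every character outside the ASCII letters and digits. The class and method names (`AsciiFilter`, `isAsciiAlphanumeric`, `keepAsciiAlphanumerics`) are made up for illustration, and the A-Z range is added here as an assumption since the question mentions it.

```java
final class AsciiFilter {
    // True when c lies in one of the contiguous ASCII ranges a-z, A-Z, or 0-9.
    // (A-Z is an assumption added for this sketch.)
    static boolean isAsciiAlphanumeric(char c) {
        return (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9');
    }

    // Walks the string one character at a time and keeps only the characters
    // that pass the range check; everything else is dropped.
    static String keepAsciiAlphanumerics(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (isAsciiAlphanumeric(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Because the check is limited to ASCII ranges, accented or non-Latin letters are dropped as well, so this covers only the English part of the question.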

Community
agiro
1

I've tested this regular expression and AFAIK it works...

String result = yourString.replaceAll("[^a-zA-Z0-9]", "");

It replaces any character that isn't in the set a-z, A-Z, or 0-9 with nothing.
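As a rough illustration of what this pattern keeps and removes, here is a small sketch; the class name `AsciiRegexDemo` and the method `strip` are hypothetical, and the non-ASCII letter is written as a Unicode escape so the source compiles regardless of file encoding.

```java
final class AsciiRegexDemo {
    // Applies the pattern from the answer: delete everything that is not
    // an ASCII letter or digit.
    static String strip(String s) {
        return s.replaceAll("[^a-zA-Z0-9]", "");
    }

    public static void main(String[] args) {
        // Punctuation and spaces are removed.
        System.out.println(strip("Hello, World! 42"));
        // \u00e9 is the letter e with an acute accent; it is outside a-z,
        // so the pattern removes it too.
        System.out.println(strip("Caf\u00e9"));
    }
}
```

Since the character class only lists ASCII ranges, letters from other alphabets are removed together with the punctuation.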

Charlie
0

In Java you can do:

String result = yourString.replaceAll("[^\\p{L}\\p{Nd}]+", "");

The regular expression [^\p{L}\p{Nd}]+ matches every run of characters that are neither Unicode letters nor decimal digits.

If you need only letters (no digits), you can use the regular expression [^\p{L}]+ as follows:

String result = yourString.replaceAll("[^\\p{L}]+", "");
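A hedged sketch of how these patterns behave on mixed input. The class and method names (`UnicodeSanitizer`, `lettersAndDigits`, `lettersDigitsCommasSymbols`) are hypothetical. The second method is an assumption that goes beyond the answer: it also keeps commas, spaces, and characters in the Unicode category So (Symbol, other), which contains many single-code-point emoji such as U+1F600. Non-ASCII characters are written as Unicode escapes so the source compiles regardless of file encoding.

```java
final class UnicodeSanitizer {
    // Pattern from the answer: keep Unicode letters and decimal digits only.
    static String lettersAndDigits(String s) {
        return s.replaceAll("[^\\p{L}\\p{Nd}]+", "");
    }

    // Assumed variant (not from the answer): additionally keep commas, spaces,
    // and category So symbols, which include many emoji.
    static String lettersDigitsCommasSymbols(String s) {
        return s.replaceAll("[^\\p{L}\\p{Nd}\\p{So}, ]+", "");
    }

    public static void main(String[] args) {
        // "Hello, world! 123" with accented letters, followed by U+1F600
        // (grinning face) written as a surrogate pair.
        String input = "H\u00e9llo, w\u00f6rld! 123 \uD83D\uDE00";
        System.out.println(lettersAndDigits(input));
        System.out.println(lettersDigitsCommasSymbols(input));
    }
}
```

The first method keeps the accented letters but drops the comma, spaces, and emoji; the second drops only the exclamation mark here. Skin-tone modifiers, variation selectors, and zero-width joiners are not in category So, so emoji built from several code points may still be altered by the second pattern.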
Davide Lorenzo MARINO