Possible Duplicate:
How to escape text for regular expression in Java

I have a problem where my users have potty mouths....

To elaborate, my Android application uses Google Voice Search to return voice results. If the user has applied the setting to 'Block offensive words', it will return 'go away' as 'g* a***'.

When trying to establish what the user has said, I will often use common matching such as:

if(voiceResult.matches(someCommand)) { //do something

If the user has chosen to speak an obscenity, then I will get the following error:

java.util.regex.PatternSyntaxException: Syntax error in regexp pattern near index X

I can't really request that all my users either don't swear or turn off the filter, especially as from my tests Google Voice Search seems to have a dirty mind and often returns swear words in the middle of the most random sentences!

So, I'm a little lost with how to deal with this eventuality... I've looked for a way to 'ignore regex' within a string, but I drew a blank and I can't figure out how I would dynamically escape any occurrences of * contained within the string...

At present, my only option seems to be to detect '*' and then ask them nicely not to swear or to remove the filter!

Suggestions welcome! Unless you think they deserve a force close for their bad manners...

Please Note: 'go away' is not currently filtered - it was an example....

EDIT: The most simple example regex where I confirm a repeat voice request:

String userWords = "g* a***";

if(userWords.matches(userWords)) { // Then go on to compare userWords with other strings

EDIT2:

    String goAway = "g* a***";

    String goAway1 = Pattern.quote(goAway);
    String goAway2 = Pattern.quote(goAway);

    if (goAway1.matches(goAway2)) { // do something
brandall
  • I have thousands of .matches & .contains so rewriting my code for the answer there is far from ideal... – brandall Nov 27 '12 at 15:09
  • Well, what do you expect a solution to look like, if you have the problem in thousands of places in your code and you need to fix it in every single one of them? – Martin Ender Nov 27 '12 at 15:11
  • Could you post the specific regex which is causing the problem? – NemesisX00 Nov 27 '12 at 15:11
  • That's why I asked the question...... I'll edit for specific regex – brandall Nov 27 '12 at 15:13
  • @andjav you are using a method that takes a regex as the argument (see the [documentation](http://docs.oracle.com/javase/1.4.2/docs/api/java/lang/String.html#matches(java.lang.String))). There is no global switch that says "no regex, please". So you will either have to use a different method (which would still require you to change every single usage in your code) or escape your strings so that they can be used as a regex. – Martin Ender Nov 27 '12 at 15:16
  • Ok, thank you. Is my only solution to detect if(myWords.contains(\\*)) { \\ deal with it? Edit: This text didn't allow for the double escape – brandall Nov 27 '12 at 15:19
  • Won't you have to do that just as often, as calling `Pattern.quote()`? – Martin Ender Nov 27 '12 at 15:22
  • If I can detect it in the first instance of the voice results containing '*' I can prevent it from being further matched. I wanted to know if I could somehow 'escape' the whole string before allowing it further. – brandall Nov 27 '12 at 15:27
  • @andjav that is exactly what `Pattern.quote()` does, isn't it? – Martin Ender Nov 27 '12 at 15:29
  • Unless I'm missing something I would have to have a different pattern for each possible swear word and which letters would be starred out? Unless I can use Pattern.quote(".*\\*+.*") or something similar? – brandall Nov 27 '12 at 15:37
  • @andjav you do `myEscapedWords = Pattern.quote(myWords);` and it will escape all regex meta-characters, allowing for `myEscapedWords` to be used as a regex that matches the literal string `myWords`. – Martin Ender Nov 27 '12 at 15:40
  • Thank you, sorry I wasn't understanding before. I'll test this now. – brandall Nov 27 '12 at 15:43
  • @andjav when you are replying to someone in a comment, please start your comment with `@username` (you will even get a nice auto-completion you can use with `Tab`). That will notify the other user that you have commented (I only saw your comments by accident because I occasionally checked up on the question). – Martin Ender Nov 27 '12 at 15:45
  • @m.buettner I've updated my question above. I would expect it to match on that, but it doesn't? Sorry if I'm further confused... – brandall Nov 27 '12 at 15:53
  • @andjav you are not supposed to `quote` your subject string, only the search string. Try `if (goAway.matches(goAway1)) {` – Martin Ender Nov 27 '12 at 15:59
  • @m.buettner that works, thank you. A little confused why my edit didn't match!? But hey, I'll move on. If you want to put the above into an answer, I'll mark it as correct. Thank you for your help and patience. – brandall Nov 27 '12 at 16:05

1 Answer

You can use Pattern.quote() to do the escaping for you, as found here.

String pattern = Pattern.quote("g* a***");

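This gives you a pattern that behaves like the following string (OpenJDK, for example, actually returns \Qg* a***\E, wrapping the text in quote delimiters instead of escaping each character, but the matching behavior is the same):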

"g\* a\*\*\*"

Note that those backslashes are actual characters in the string. If you wanted to create this string manually, you would use this assignment:

String pattern = "g\\* a\\*\\*\\*";

Now you can use pattern as a regex that literally matches g* a*** (because every single character is treated as a literal). So, for instance:

String goAway = "g* a***";
String pattern = Pattern.quote("g* a***");
if (goAway.matches(pattern)) { // we know that goAway was "g* a***"
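
As a self-contained sketch of the snippet above: the class name QuoteDemo and the helper matchesLiterally are hypothetical names chosen for illustration, not part of any library. Only Pattern.quote() and String.matches() are real API calls.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Hypothetical helper: returns true if subject is exactly literalText,
    // with every regex metacharacter in literalText treated as plain text.
    static boolean matchesLiterally(String subject, String literalText) {
        // Quote only the search string, never the subject.
        return subject.matches(Pattern.quote(literalText));
    }

    public static void main(String[] args) {
        String goAway = "g* a***";
        // The filtered voice result matches its own quoted form.
        System.out.println(matchesLiterally(goAway, "g* a***"));    // prints true
        // The unfiltered text is a different string, so it does not match.
        System.out.println(matchesLiterally("go away", "g* a***")); // prints false
    }
}
```

Because matches() has to consume the whole subject, this particular check gives the same result as goAway.equals("g* a***"). Quoting is mostly worth it when the literal text is only one part of a larger regex.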

Of course, you cannot use the pattern to match a quoted string (like you did in your edited code snippet). What you are trying to do is the same as applying the regex

String pattern = "g\\* a\\*\\*\\*";

to this literal subject string:

String subject = "g\\* a\\*\\*\\*";

What happens? The g in the pattern matches the g in the subject. Next, the pattern contains the escape sequence \*, which matches a literal *. But the subject string has a literal \ at that position, so the match fails.
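
The mismatch described above can be reproduced with a short sketch. The class and method names here (DoubleQuoteDemo, quotedSubjectMatches, plainSubjectMatches) are hypothetical and exist only for this illustration.

```java
import java.util.regex.Pattern;

class DoubleQuoteDemo {
    // Mirrors the question's EDIT2: both the subject and the pattern are quoted.
    static boolean quotedSubjectMatches(String text) {
        String quotedSubject = Pattern.quote(text); // now contains extra escape characters
        String quotedPattern = Pattern.quote(text); // matches only the original text
        return quotedSubject.matches(quotedPattern);
    }

    // The corrected form: the subject stays as-is, only the pattern is quoted.
    static boolean plainSubjectMatches(String text) {
        return text.matches(Pattern.quote(text));
    }

    public static void main(String[] args) {
        System.out.println(quotedSubjectMatches("g* a***")); // prints false
        System.out.println(plainSubjectMatches("g* a***"));  // prints true
    }
}
```

The quoted subject is no longer the same text as the original, so a pattern that matches the original literally cannot match it.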

Martin Ender