2

I want to remove every occurrence of the phrase "is happy" from a very large text, ignoring case. Here are some sample sentences from that text:

  1. "She is happy. I like that."

  2. "His happy son"

  3. "He is happy all the day"

  4. "Tasha is Happy"

  5. "Choose one of the following: is sad-is happy-is crying"

My initial code is:

String largeText = "....";  // The very large text here.
String removeText = "is happy";
largeText = largeText.replaceAll( "(?i)" + removeText , "" ); 

This code works fine for sentences 1, 3, 4, and 5. But I do not want to delete the phrase from sentence 2, where the match spans part of "His" and so has a different meaning. How can I do that?

Brad
  • You'll need to be more specific about when you don't want to replace: just in this exact sentence, or in all sentences of a particular form? Can you write some rules about when you should and should not match? If so, can you write those rules in code? – GreyCloud Dec 23 '10 at 20:23

2 Answers

4

Use \b around your pattern to match only at word boundaries, i.e.:

String largeText = "....";  // The very large text here.
String removeText = "is happy";
largeText = largeText.replaceAll( "(?i)\\b" + removeText + "\\b" , "" ); 
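
To check this against the five sample sentences from the question, here is a minimal sketch. The class name `WordBoundaryDemo` and the helper `removePhrase` are made up for illustration, and the `Pattern.quote` call is an addition that is not in the snippet above; it only matters if the phrase to remove could contain regex metacharacters.

```java
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Hypothetical helper: removes whole-word, case-insensitive occurrences of phrase.
    static String removePhrase(String text, String phrase) {
        // Pattern.quote makes the phrase literal (an assumption: the phrase is plain text, not a regex).
        String regex = "(?i)\\b" + Pattern.quote(phrase) + "\\b";
        return text.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String[] samples = {
            "She is happy. I like that.",
            "His happy son",
            "He is happy all the day",
            "Tasha is Happy",
            "Choose one of the following: is sad-is happy-is crying"
        };
        for (String sample : samples) {
            // Brackets make leftover spaces visible in the output.
            System.out.println("[" + removePhrase(sample, "is happy") + "]");
        }
    }
}
```

Sentence 2 comes back unchanged because there is no word boundary between the "H" and the "i" of "His". Note that the spaces around a removed phrase are left in place, so you may want a follow-up pass to collapse double spaces.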
Laurence Gonsalves
  • That works fine. Just one question: will this also work for Unicode letters (other languages)? – Brad Dec 23 '10 at 21:40
  • @Brad: From the documentation for java.util.regex.Pattern it looks like `[a-zA-Z_0-9]` is used for "word" characters, so I assume that's also the definition they use for word boundaries. You could try using negative assertions instead of `\b` to look for certain Unicode character classes, but note that this will not work for Chinese or any other language that does not require spaces between words unless you first segment the input. – Laurence Gonsalves Dec 23 '10 at 22:14
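
The negative-assertion idea from the comment above could look roughly like the following sketch. It assumes that "word character" means any Unicode letter (`\p{L}`), any Unicode digit (`\p{N}`), or an underscore; that definition, the class name `UnicodeBoundaryDemo`, and the helper `removePhrase` are illustrative choices, not something the thread specifies.

```java
import java.util.regex.Pattern;

class UnicodeBoundaryDemo {
    // Assumed definition of a "word" character: Unicode letter, Unicode digit, or underscore.
    private static final String WORD_CHAR = "[\\p{L}\\p{N}_]";

    // Hypothetical helper: removes phrase only where it is not glued to a word character on either side.
    static String removePhrase(String text, String phrase) {
        Pattern pattern = Pattern.compile(
            "(?<!" + WORD_CHAR + ")"      // negative lookbehind: no word character just before
                + Pattern.quote(phrase)   // the phrase itself, taken literally
                + "(?!" + WORD_CHAR + ")", // negative lookahead: no word character just after
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        return pattern.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(removePhrase("Tasha is Happy", "is happy"));
        // "\u00E9t\u00E9" is the French word for summer; it also ends the longer word below.
        System.out.println(removePhrase("r\u00E9p\u00E9t\u00E9", "\u00E9t\u00E9"));
    }
}
```

On Java 7 and later, compiling with `Pattern.UNICODE_CHARACTER_CLASS` (or the embedded flag `(?U)`) is another option for making the predefined classes Unicode-aware. As the comment says, neither approach helps with languages that do not separate words with spaces.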
0

You might want to look into atomic zero-width assertions -- patterns that match against positions inside a string (such as a word boundary), rather than text itself.
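
To see what "matching a position" means, here is a small sketch that inserts a visible marker at every place where `\b` matches. The class name `ZeroWidthDemo` and the helper `markBoundaries` are hypothetical names used only for this illustration.

```java
class ZeroWidthDemo {
    // Hypothetical helper: puts "|" at each word-boundary position without consuming any text.
    static String markBoundaries(String text) {
        return text.replaceAll("\\b", "|");
    }

    public static void main(String[] args) {
        // Shows where the boundaries fall in sentence 2 from the question.
        System.out.println(markBoundaries("His happy son"));
    }
}
```

No marker lands between the "H" and the "i" of "His", so a pattern that requires a boundary immediately before "is happy" cannot match there.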

This question was previously asked; see this link for more info:

java String.replaceAll regex question

Community
user541686