
I need to be able to replace all occurrences of the word "and" ONLY when it occurs between single quotes. For example, replacing "and" with "XXX" in the string:

This and that 'with you and me and others' and not 'her and him'

should result in:

This and that 'with you XXX me XXX others' and not 'her XXX him'

I have been able to come up with a regular expression that handles nearly every case, but it fails on the "and" that sits between the two sets of quoted text.

My code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "This and that 'with you and me and others' and not 'her and him'";

String patternStr = ".*?\\'.*?(?i:and).*?\\'.*";
Pattern pattern= Pattern.compile(patternStr);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.matches());
while(matcher.matches()) {
    System.out.println("in matcher");
    str = str.replaceAll("(?:\\')(.*?)(?i:and)(.*?)(?:\\')", "'$1XXX$2'");
    matcher = pattern.matcher(str);
}

System.out.println(str);
Alan Moore
BlueVoid
  • You'd need to write a proper parser, imo. Regular expressions aren't really suited to this type of pattern matching. – David Fells May 04 '11 at 18:28
  • @DF I don't believe that. Regular expressions can do everything. Somebody will get you an answer. – Reverend Gonzo May 04 '11 at 18:32
  • @RG, regular expressions can only deal with regular languages, hence the name. – Carl Norum May 04 '11 at 18:34
  • I did make some progress by using a regular expression to match each quoted segment, then I could just do a replaceAll on each segment. I could then get the non-matched portion using split, but then I had chunks of the full string and couldn't think of how to put it back together properly. – BlueVoid May 04 '11 at 18:35
  • This is close but I won't post it as an answer since it's not 100%: `'(?:([^']*?)and([^']*?))*'` and the replacement: `'$1XXX$2'` – Josh M. May 04 '11 at 18:46
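BlueVoid's match-each-segment idea can be finished without `split`: if you remember where the previous quoted segment ended, the unquoted text in between can be copied straight through, and the pieces go back together in order. A minimal sketch, assuming quotes always come in unescaped pairs; the class and method names are hypothetical, and it swaps the plain "and" for `\band\b` so that words like "band" are left alone:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedSplice {
    // One single-quoted segment, quotes included (assumes no escaped quotes).
    private static final Pattern QUOTED = Pattern.compile("'[^']*'");

    // Hypothetical helper: rewrites whole-word "and" inside quoted segments only,
    // copying the unquoted text between segments through unchanged.
    static String replaceInsideQuotes(String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = QUOTED.matcher(input);
        int last = 0; // end index of the previous quoted segment
        while (m.find()) {
            out.append(input, last, m.start());                   // unquoted text, untouched
            out.append(m.group().replaceAll("\\band\\b", "XXX")); // quoted text, rewritten
            last = m.end();
        }
        out.append(input, last, input.length());                  // trailing unquoted text
        return out.toString();
    }

    public static void main(String[] args) {
        String str = "This and that 'with you and me and others' and not 'her and him'";
        System.out.println(replaceInsideQuotes(str));
    }
}
```

The `last` index is what replaces the separate `split` step: everything between one match's end and the next match's start is by definition outside the quotes.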

2 Answers


Try this code:

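import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "This and that 'with you and me and others' and not 'her and him'";
// Match each single-quoted segment, quotes included
Matcher matcher = Pattern.compile("('[^']*?')").matcher(str);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
   // Rewrite "and" inside this segment only; quoteReplacement stops '$' or '\' in the text being read as group references
   matcher.appendReplacement(sb, Matcher.quoteReplacement(matcher.group(1).replaceAll("and", "XXX")));
}
// Copy whatever follows the last quoted segment
matcher.appendTail(sb);
System.out.println("Output: " + sb);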

OUTPUT

Output: This and that 'with you XXX me XXX others' and not 'her XXX him'
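On Java 9 or later, the same find-and-rewrite loop can be written with `Matcher.replaceAll(Function<MatchResult, String>)`. The sketch below (class and method names are hypothetical, and the input string is an illustrative one, not from the question) also shows why the inner `replaceAll("and", "XXX")` may be too broad: it rewrites the letters inside a word like "sand", whereas `\\band\\b` matches only the whole word. `Matcher.quoteReplacement` is kept because the function's result is interpreted like an `appendReplacement` string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedReplaceAll {
    // One single-quoted segment, quotes included (assumes no escaped quotes).
    private static final Pattern QUOTED = Pattern.compile("'[^']*'");

    // Hypothetical helper: applies innerRegex -> replacement only inside quoted segments.
    static String replaceInQuotes(String text, String innerRegex, String replacement) {
        return QUOTED.matcher(text).replaceAll(mr ->
                Matcher.quoteReplacement(mr.group().replaceAll(innerRegex, replacement)));
    }

    public static void main(String[] args) {
        String str = "This and that 'sand and stone' and not 'her and him'";
        // Plain "and" also hits the letters inside "sand"
        System.out.println(replaceInQuotes(str, "and", "XXX"));
        // \b restricts the match to the whole word
        System.out.println(replaceInQuotes(str, "\\band\\b", "XXX"));
    }
}
```

For that string, the first call prints `This and that 'sXXX XXX stone' and not 'her XXX him'` and the second prints `This and that 'sand XXX stone' and not 'her XXX him'`.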
anubhava
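import java.util.regex.Pattern;

String str = "This and that 'with you and me and others' and not 'her and him'";

// "and" surrounded by whitespace, followed by an odd number of single quotes up to the end of the string
Pattern p = Pattern.compile("(\\s+)and(\\s+)(?=[^']*'(?:[^']*+'[^']*+')*+[^']*+$)");
System.out.println(p.matcher(str).replaceAll("$1XXX$2"));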

The idea is that each time you find the complete word "and", you scan from the current match position to the end of the string, looking for an odd number of single quotes. If the lookahead succeeds, the matched word must be between a pair of quotes.
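To see the parity argument without the lookahead, here is a rough sketch (class and method names are hypothetical) that finds each whole-word "and" and simply counts the single quotes between the end of the match and the end of the string, replacing the word only when the count is odd. Like the regex, it assumes quotes are paired and unescaped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteParity {
    private static final Pattern WORD = Pattern.compile("\\band\\b");

    // Number of single quotes in s from index 'from' to the end.
    static int quotesAfter(String s, int from) {
        int count = 0;
        for (int i = from; i < s.length(); i++) {
            if (s.charAt(i) == '\'') {
                count++;
            }
        }
        return count;
    }

    // Replaces "and" only where an odd number of quotes follows it:
    // the same test the lookahead performs, written out as a loop.
    static String replaceByParity(String s) {
        Matcher m = WORD.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            boolean inside = quotesAfter(s, m.end()) % 2 == 1;
            out.append(s, last, m.start()).append(inside ? "XXX" : m.group());
            last = m.end();
        }
        return out.append(s, last, s.length()).toString();
    }

    public static void main(String[] args) {
        String str = "This and that 'with you and me and others' and not 'her and him'";
        Matcher m = WORD.matcher(str);
        while (m.find()) {
            System.out.println("\"and\" at index " + m.start() + ": "
                    + quotesAfter(str, m.end()) + " quote(s) follow");
        }
        System.out.println(replaceByParity(str));
    }
}
```

For the question's string, the five matches are followed by 4, 3, 3, 2 and 1 quotes respectively, so only the three with odd counts are replaced. The regex does the same counting in one pass per match, two quotes at a time via the `(?:[^']*+'[^']*+')*+` group.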

Of course, this assumes quotes always come in matched pairs, and that quotes can't be escaped. Quotes escaped with backslashes can be handled, but doing so makes the regex much longer.
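If escaped quotes do matter, one alternative to a longer regex is to track the quote state by hand, which is close to the "proper parser" suggested in the comments. A minimal sketch, assuming a backslash escapes whatever character follows it (inside or outside quotes) and that an unterminated quote leaves the rest of the string unchanged; the class and method names are hypothetical:

```java
import java.util.regex.Pattern;

public class QuoteScanner {
    private static final Pattern WORD = Pattern.compile("\\band\\b");

    // Hypothetical helper: rewrites whole-word "and" inside '...' spans,
    // treating a backslash as escaping the next character, so \' neither opens nor closes a span.
    static String replaceInsideQuotes(String s) {
        StringBuilder out = new StringBuilder();
        StringBuilder quoted = null; // non-null while inside a quoted span
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            StringBuilder target = (quoted != null) ? quoted : out;
            if (c == '\\' && i + 1 < s.length()) {
                target.append(c).append(s.charAt(++i)); // copy the escape pair verbatim
            } else if (c == '\'') {
                if (quoted == null) {
                    out.append(c);                      // opening quote
                    quoted = new StringBuilder();
                } else {
                    // closing quote: rewrite the collected span, then emit the quote
                    out.append(WORD.matcher(quoted).replaceAll("XXX")).append(c);
                    quoted = null;
                }
            } else {
                target.append(c);
            }
        }
        if (quoted != null) {
            out.append(quoted); // unterminated span: copied unchanged
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceInsideQuotes(
                "This and that 'with you and me and others' and not 'her and him'"));
        System.out.println(replaceInsideQuotes("Tom and 'don\\'t and stop' and Jerry"));
    }
}
```

The trade-off is more code than a regex, but each rule (escapes, pairing, unterminated quotes) is a visible branch that can be changed independently.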

I'm also assuming the target word never appears at the beginning or end of a quoted sequence, which seems reasonable for the word "and". If you want to allow target words that are not surrounded by whitespace, you could use something like "\\band\\b" instead, but be aware that Java's definitions of word characters (\w) and word boundaries (\b) don't always agree with each other.
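A sketch of the "\\band\\b" variant, reusing the lookahead from this answer unchanged (class name and sample string are hypothetical). Because \b does not need whitespace on either side, it also catches an "and" right after an opening quote or right before a closing one, which the (\\s+) version skips:

```java
import java.util.regex.Pattern;

public class WordBoundaryLookahead {
    // Whole-word "and" followed by an odd number of single quotes up to the end of the string.
    private static final Pattern AND_IN_QUOTES =
            Pattern.compile("\\band\\b(?=[^']*'(?:[^']*+'[^']*+')*+[^']*+$)");

    static String replace(String s) {
        return AND_IN_QUOTES.matcher(s).replaceAll("XXX");
    }

    public static void main(String[] args) {
        System.out.println(replace("and 'and me' and 'you, and him and'"));
    }
}
```

For the sample string in `main`, this prints `and 'XXX me' and 'you, XXX him XXX'`. The target word here is plain ASCII, so the \w versus \b discrepancy does not come into play.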

Alan Moore