2

I am trying, in Java, to surround a word in an HTML string with some markup. The code below throws an ArrayIndexOutOfBoundsException when replaceAll is called.

Pattern pattern = Pattern.compile(wordToHighlight + "\\w{0,5}");
String replacement = "<span class='highlight'>$1</span>";
Matcher matcher = pattern.matcher(html);

if (matcher != null)
    if (matcher.find())
        retVal = matcher.replaceAll(replacement);
Ian Vink
  • What does `html` contain at that point? – Jim Garrison Jul 29 '10 at 16:05
  • Who cares? This is a simplistic attempt to highlight search text in an arbitrary HTML string. There are countless ways this can go wrong, but it will work in most cases. Users can have fun with interesting effects if they include angle brackets in their search expressions. – Carl Smotricz Jul 29 '10 at 16:10
  • FYI, neither `if (matcher != null)` nor `if (matcher.find())` in your code is doing anything useful. If the code reaches that point `matcher` *can't* be `null` (it would have thrown an exception), and the first thing `replaceAll()` does is call `find()`. If that returns `false` it just returns the original string. – Alan Moore Jul 29 '10 at 19:44
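
To illustrate the point in the last comment, here is a small sketch. The class and method names are made up for the example. It calls `replaceAll()` directly, with no `null` check and no `find()`, and shows two things: when nothing matches, the input comes back unchanged, and a reference to a group the pattern lacks only fails once there is an actual match. It catches `IndexOutOfBoundsException`, which is the parent of the `ArrayIndexOutOfBoundsException` reported in the question.

```java
import java.util.regex.Pattern;

class ReplaceAllBehavior {
    // Calls replaceAll directly, with no null check and no find() beforehand.
    static String replace(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    // Reports whether the replacement refers to a group the pattern does not have.
    static boolean failsOnMissingGroup(String input, String regex, String replacement) {
        try {
            replace(input, regex, replacement);
            return false;
        } catch (IndexOutOfBoundsException e) {
            // Thrown while the replacement string is processed for a match.
            return true;
        }
    }

    public static void main(String[] args) {
        // No match: the original string is returned as-is.
        System.out.println(replace("no hit here", "xyz\\w{0,5}", "<b>$0</b>"));
        // A match plus $1 without a capturing group: the call fails.
        System.out.println(failsOnMissingGroup("hit here", "hit\\w{0,5}", "<b>$1</b>"));
    }
}
```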

3 Answers

5

I'm not familiar with regex in Java, so I'll just go ahead and make a guess; excuse me if I'm way off base. In PCRE (PHP), $1 refers to the first capture group. Since your pattern doesn't have a capture group, that could be what throws the error. Try using $0, which refers to the whole match.
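
For what it's worth, a minimal Java sketch of that suggestion might look like the following. The class name, method name and sample input are hypothetical; only the pattern and the replacement string come from the question. In Java's replacement syntax `$0` also stands for the whole match, so the pattern needs no parentheses.

```java
import java.util.regex.Pattern;

class HighlightWholeMatch {
    // Wraps every match of the word plus up to five trailing word characters in a span.
    static String highlight(String html, String wordToHighlight) {
        Pattern pattern = Pattern.compile(wordToHighlight + "\\w{0,5}");
        // $0 is the entire match, so no capturing group is required.
        String replacement = "<span class='highlight'>$0</span>";
        return pattern.matcher(html).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // prints <p><span class='highlight'>Searching</span> is fun</p>
        System.out.println(highlight("<p>Searching is fun</p>", "Search"));
    }
}
```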

Peter O'Callaghan
2

You should try putting a capturing group in your search expression, i.e. wrap the whole pattern in parentheses so that $1 has a group to refer to:

"(" + wordToHighlight + "\\w{0,5})"
Carl Smotricz
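
Spelled out as a complete example, the capturing-group version could look roughly like this. The class name, method name and sample text are assumptions for illustration; the replacement string is the one from the question, left unchanged.

```java
import java.util.regex.Pattern;

class HighlightWithGroup {
    // The parentheses create group 1, which the $1 in the replacement refers to.
    static String highlight(String html, String wordToHighlight) {
        Pattern pattern = Pattern.compile("(" + wordToHighlight + "\\w{0,5})");
        String replacement = "<span class='highlight'>$1</span>";
        return pattern.matcher(html).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // "cat" is wrapped as-is; in "caterpillar" only "caterpil" is wrapped,
        // because \w{0,5} takes at most five extra characters.
        System.out.println(highlight("a cat and a caterpillar", "cat"));
    }
}
```

If the search word can contain regex metacharacters, passing it through `Pattern.quote(wordToHighlight)` before concatenating would make it match literally.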
0

Try four backslashes:

Pattern.compile(wordToHighlight + "\\\\w{0,5}");

Somehow the escaping seems to take place twice. That means:

1.) \\\\ turns into \\

2.) then \\ turns into \