Java RegExp: Capture part after a character but don't replace the character

Question

I am using Java to parse a JavaScript file. Because the scope in the environment where I am using it is different from what the script expects, I am trying to replace every instance of, for example,

test = value

with

window.test = value

Previously, I had just been using

writer.append(js.getSource().replaceAll("test", "window.test"));

which obviously isn't generalizable, but for a fixed dataset it was working fine.

However, in the new files I'm supposed to work with, an updated version of the old ones, I now have to deal with

window['test'] = value

and

([[test]])

I don't want to match test in either of those cases, and those seem to be the only two cases with a new format. So my plan was to write a regex that matches test only when the character before it is neither ' nor [. That would be ([^'\[])test; however, I don't actually want to replace that preceding character, just make sure it's not one of the two I don't want to match.
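To make the problem concrete, here is a minimal sketch (the class `NaiveReplaceDemo` and method `naive` are hypothetical names) that applies the original `replaceAll("test", "window.test")` to the old format and to the two new formats. The plain-string pattern rewrites every occurrence, including the ones inside `window['test']` and `([[test]])`.

```java
public class NaiveReplaceDemo {
    // The original approach: replace every occurrence of "test", wherever it appears.
    static String naive(String src) {
        return src.replaceAll("test", "window.test");
    }

    public static void main(String[] args) {
        System.out.println(naive("test = value"));           // window.test = value
        System.out.println(naive("window['test'] = value")); // window['window.test'] = value
        System.out.println(naive("([[test]])"));             // ([[window.test]])
    }
}
```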

This was a new situation for me because I haven't done much replacement with regular expressions, just pattern matching. So I looked around and found what I thought was the solution, something called "non-capturing groups". The explanation on the Oracle page sounded like what I was looking for, but when I rewrote my regular expression as (?:[^'\\[])test, it behaved exactly as before, replacing the character preceding test. I looked around Stack Overflow, but what I found only made me more confident that what I was doing should work.
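The behaviour described above can be reproduced with a small sketch (the class `NonCapturingConsumesDemo` and the input `alert(test);` are hypothetical). A non-capturing group still consumes the character it matches, so `replaceAll` overwrites that character along with `test`.

```java
public class NonCapturingConsumesDemo {
    // Same pattern as in the question: (?:...) is non-capturing, but the
    // character it matches is still part of the overall match, so replaceAll
    // overwrites it too.
    static String nonCapturing(String src) {
        return src.replaceAll("(?:[^'\\[])test", "window.test");
    }

    public static void main(String[] args) {
        System.out.println(nonCapturing("alert(test);"));           // alertwindow.test);  ('(' is lost)
        System.out.println(nonCapturing("window['test'] = value")); // window['test'] = value
    }
}
```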

What am I doing wrong that it's not working as expected? Am I misusing the pattern?

http://www.regexplanet.com/advanced/java/index.html, along with its example regexes, input strings and results, is an example. – Andrew Latham, Dec 10 '12 at 19:58
You can refer to this question http://stackoverflow.com/questions/632204/java-string-replace-using-regular-expressions – Smit, Dec 10 '12 at 19:59
That was exactly the example I was looking at; however, when I used it in `replaceAll`, it did not behave as I wanted. For example, if I wanted to replace "http://stackoverflow.com" with "http://google.com" and also wanted to catch ftp, I would use str.replaceAll("(?:http|ftp)://...", "google.com"), but the result would just be "google.com". – Andrew Latham, Dec 10 '12 at 20:04
Non-capturing groups (`(?:...)`) only affect the **groups**, not the match itself. See an excellent example at http://stackoverflow.com/questions/3512471/non-capturing-group, in which the non-capturing group `http` is still part of the match, but not in a group. – apsillers, Dec 10 '12 at 20:04
Thank you, apsillers, that explanation makes sense to me. I guess my mistake was expecting only the capturing group to be replaced. – Andrew Latham, Dec 10 '12 at 20:09
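apsillers' point can be checked directly with `java.util.regex.Matcher`. In this sketch (the class `NonCapturingGroupDemo`, the field `URL` and the method `replaceUrl` are hypothetical), `(?:http|ftp)` gets no group number, yet its text is still inside `group(0)`. `replaceAll` always replaces `group(0)`, which is why `http://` disappears in the comment's example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonCapturingGroupDemo {
    // (?:http|ftp) is non-capturing: it is part of the overall match,
    // it just does not get a group number. (\S+) is group 1.
    static final Pattern URL = Pattern.compile("(?:http|ftp)://(\\S+)");

    // replaceAll substitutes the whole match (group 0), scheme included.
    static String replaceUrl(String s) {
        return URL.matcher(s).replaceAll("google.com");
    }

    public static void main(String[] args) {
        Matcher m = URL.matcher("see http://stackoverflow.com");
        if (m.find()) {
            System.out.println(m.groupCount()); // 1
            System.out.println(m.group(0));     // http://stackoverflow.com
            System.out.println(m.group(1));     // stackoverflow.com
        }
        System.out.println(replaceUrl("see http://stackoverflow.com")); // see google.com
    }
}
```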

score 3 · Accepted Answer · answered Dec 10 '12 at 20:03

If you include an expression for the character in your regex, it will be part of what is matched.

The trick is to use what you match in the replacement String, so you replace that bit by itself.

Try:

writer.append(js.getSource().replaceAll("([^'\\[])test", "$1window.test"));

The $1 in the replacement String is a back reference to whatever capturing group 1 matched; in this case, that is the character preceding test.
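The following is a minimal sketch of the answer's back-reference approach, run against the formats from the question (the class and method names are hypothetical). For comparison it also shows a negative lookbehind, `(?<!['\\[])test`. That is a swapped-in technique, not part of the answer: a lookbehind is zero-width, so the preceding character is checked but never consumed. One visible difference is that the capture version cannot match `test` at the very start of the input, because there is no preceding character to capture.

```java
public class BackReferenceDemo {
    // Accepted answer's approach: capture the preceding character as group 1
    // and write it back with $1 so it is not lost.
    static String withBackReference(String src) {
        return src.replaceAll("([^'\\[])test", "$1window.test");
    }

    // Alternative (not from the answer): a negative lookbehind only asserts
    // that the previous character is not ' or [, without consuming it.
    static String withLookbehind(String src) {
        return src.replaceAll("(?<!['\\[])test", "window.test");
    }

    public static void main(String[] args) {
        String[] inputs = {
            "x = 1;\ntest = value;",
            "window['test'] = value",
            "([[test]])",
            "test = value" // "test" at the very start of the input
        };
        for (String in : inputs) {
            System.out.println(withBackReference(in) + " | " + withLookbehind(in));
        }
    }
}
```

Neither pattern checks word boundaries, so an identifier such as `latest` would also be rewritten.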

score 0 · Answer 2 · answered Dec 10 '12 at 20:12


Why not simply match on "(test)(\s*)=(\s*)([\w\d]+)"? That way you only match "test", then optional whitespace, an '=' sign, more optional whitespace and a value (in this case consisting of letters, digits and the underscore character; \w already includes the digits). You can then use the groups (between parentheses) to copy the value, and even the whitespace if required, to your new text.
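Here is a minimal sketch of this answer's idea, assuming the replacement is rebuilt from all four groups (the class `AssignmentMatchDemo` and method `prefixAssignment` are hypothetical). It leaves the two new formats alone. It also leaves `test.n = 5` unchanged, since a `.` follows `test` rather than whitespace or `=`.

```java
public class AssignmentMatchDemo {
    // Match "test", optional whitespace, '=', optional whitespace and a value,
    // then rebuild the text from the captured groups with "window." in front.
    // ([\w\d] is the answer's class; \w alone would be equivalent.)
    static String prefixAssignment(String src) {
        return src.replaceAll("(test)(\\s*)=(\\s*)([\\w\\d]+)", "window.$1$2=$3$4");
    }

    public static void main(String[] args) {
        System.out.println(prefixAssignment("test = value"));           // window.test = value
        System.out.println(prefixAssignment("window['test'] = value")); // window['test'] = value
        System.out.println(prefixAssignment("test.n = 5"));             // test.n = 5
    }
}
```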


Maarten Bodewes


The example I gave isn't really comprehensive; there are also places with, e.g., test.n = 5 or x = test.a.b.c.d.substring(4, 2);. In that case I would want it to become window.test.a.b.c.d... – Andrew Latham Dec 10 '12 at 20:18
OK, that's fine then. I just wanted to mention that it is sometimes easier to match what you actually want to match, more or less the opposite regex. – Maarten Bodewes Dec 10 '12 at 20:30
