42

I am trying to write a method that will accept a String, inspect it for instances of certain tokens (e.g. ${fizz}, ${buzz}, ${foo}, etc.) and replace each token with a new string that is fetched from a Map<String,String>.

For example, if I pass this method the following string:

"How now ${fizz} cow. The ${buzz} had oddly-shaped ${foo}."

And if the method consulted the following Map<String,String>:

Key             Value
==========================
"fizz"          "brown"
"buzz"          "arsonist"
"foo"           "feet"

Then the resultant string would be:

"How now brown cow. The arsonist had oddly-shaped feet."

Here is my method:

String substituteAllTokens(Map<String,String> tokensMap, String toInspect) {
    String regex = "\\$\\{([^}]*)\\}";
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(toInspect);
    while(matcher.find()) {
        String token = matcher.group();     // Ex: ${fizz}
        String tokenKey = matcher.group(1); // Ex: fizz
        String replacementValue = null;

        if(tokensMap.containsKey(tokenKey))
            replacementValue = tokensMap.get(tokenKey);
        else
            throw new RuntimeException("String contained an unsupported token.");

        toInspect = toInspect.replaceFirst(token, replacementValue);
    }

    return toInspect;
}

When I run this, I get the following exception:

Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal repetition near index 0
${fizz}
^
    at java.util.regex.Pattern.error(Pattern.java:1730)
    at java.util.regex.Pattern.closure(Pattern.java:2792)
    at java.util.regex.Pattern.sequence(Pattern.java:1906)
    at java.util.regex.Pattern.expr(Pattern.java:1769)
    at java.util.regex.Pattern.compile(Pattern.java:1477)
    at java.util.regex.Pattern.<init>(Pattern.java:1150)
    at java.util.regex.Pattern.compile(Pattern.java:840)
    at java.lang.String.replaceFirst(String.java:2158)
    ...rest of stack trace omitted for brevity (but available upon request!)

Why am I getting this? And what is the correct fix? Thanks in advance!

5 Answers5

61

In ${fizz}

{ is an indicator to the regex engine that you are about to start a repetition indicator, like {2,4} which means '2 to 4 times of the previous token'. But {f is illegal, because it has to be followed by a number, so it throws an exception.

You need to escape all regex metacharacters (in this case $, { and }) (try using http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#quote(java.lang.String) ) or use a different method that substitutes a string for a string, not a regex for a string.

Patashu
  • 21,443
  • 3
  • 45
  • 53
  • 3
    The `{` is escaped though: `"\\$\\{([^}]*)\\}"` – Brian Jul 04 '13 at 05:02
  • 5
    @Brian No it's not. Stack trace: `at java.lang.String.replaceFirst(String.java:2158)` refers to this line: `toInspect = toInspect.replaceFirst(token, replacementValue);` And `token`'s value is `${fizz}`, with no escaping. – Patashu Jul 04 '13 at 05:03
  • Ah, yes! Of course. Read too fast, I guess. +1 Maybe you could elaborate more on what exactly is being passed in and why it's failing. – Brian Jul 04 '13 at 05:05
  • Exactly - I'm still struggling to put your answer into motion, but thanks @Patashu and +1! –  Jul 04 '13 at 05:06
  • 3
    @TicketMonster `replaceFirst` expects a regular expression in the first argument, just like the `Pattern.compile` method. However, you're passing it `${fizz}` without any escaping. Use [`Pattern.quote(token)`](http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#quote(java.lang.String)) before passing `token` in. – Brian Jul 04 '13 at 05:08
  • that still doesn't answer the question..after replacing string matcher would point to invalid position! – Anirudha Jul 04 '13 at 05:09
  • @TicketMonster your code would not replace properly even after escaping it..matcher would point to an invalid position..for example try replacing this `${fizz}${buzz}` where fizz is mapped to an **empty string** – Anirudha Jul 04 '13 at 05:15
  • @Anirudh Again, strings are immutable. Changing the `toInspect` *reference* does nothing to the string that the `Matcher` is using (the original `toInspect` reference). – Brian Jul 04 '13 at 05:15
  • ahh..ok..in this case it would work since op is performing linear search for `${}` but still..changing string to which matcher refers to is dangerous – Anirudha Jul 04 '13 at 05:23
  • @Anirudh Looks like this solution works to me: http://ideone.com/AocIh6 Your solution enters an infinite loop by the way. – Brian Jul 04 '13 at 05:23
  • @Anirudh You can't change the string to which matcher refers to because strings are immutable. – Patashu Jul 04 '13 at 05:24
  • @Anirudh [Read this please](http://stackoverflow.com/questions/1552301/immutability-of-strings-in-java). – Brian Jul 04 '13 at 05:24
  • @Brian it should be `matcher.reset(toInspect);` and i **never** said its `mutable` i said matcher would point to invalid position.op's code `works` becuase he's replacing through string **not** matcher,..(missed it).see [this](http://ideone.com/nnrdnz) and remove reset line.it won't run – Anirudha Jul 04 '13 at 05:45
  • @Patashu replace it through `matcher` not `toInspect` i.e `matcher.replaceFirst` – Anirudha Jul 04 '13 at 05:53
  • @Anirudh "Invoking this method changes this matcher's state. If the matcher is to be used in further matching operations then it should first be reset." This rule only applies if you use replaceFirst on the matcher itself. The OP did NOT, so why even bring it up? – Patashu Jul 04 '13 at 05:54
  • @Patashu i had misinterpreted the question to which i came know minutes ago..:P – Anirudha Jul 04 '13 at 05:57
  • Strings are mutable if you allow reflection. http://algs4.cs.princeton.edu/12oop/MutableString.java.html But this has nothing to do with the original question anyway. – Miguel Jul 04 '13 at 08:28
  • i know this is very old thread, but now I am facing same issue, can some one please advise me on concreate solution for this, Thank you! – Vasu May 04 '21 at 08:01
7

As pointed out by Patashu, the problem is in replaceFirst(token, replacementValue), that expects a regex in the first argument, not a literal. Change it to replaceFirst(Pattern.quote(token), replacementValue) and you will do alright.

I also changed a bit the first regex, as it goes faster with + instead of * but that's not necessary.

static String substituteAllTokens(Map<String,String> tokensMap, String toInspect) {
    String regex = "\\$\\{([^}]+)\\}";
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(toInspect);
    String result = toInspect;
    while(matcher.find()) {
        String token = matcher.group();     // Ex: ${fizz}
        String tokenKey = matcher.group(1); // Ex: fizz
        String replacementValue = null;

        if(tokensMap.containsKey(tokenKey))
            replacementValue = tokensMap.get(tokenKey);
        else
            throw new RuntimeException("String contained an unsupported token.");

        result = result.replaceFirst(Pattern.quote(token), replacementValue);
    }

    return result;
}
Miguel
  • 566
  • 1
  • 5
  • 16
  • 1
    If you want to get *really* picky about speed, use `[^}]++` to make it possessive instead of just greedy so it will never backtrack. – Brian Jul 04 '13 at 05:49
  • 1
    The speed results are surprising: using my method, the original regex (`[^}]*`) took 37.04 ms to run 10000 times, my regex (`[^}]+`) took 37.35 ms and suggested regex (`[^}]++`) 36.98 ms. Almost no difference. – Miguel Jul 04 '13 at 07:12
  • For a longer input `"How now ${fizz} cow. The ${buzz} had oddly-shaped ${foo}.How now ${fizz} cow. The ${buzz} had oddly-shaped ${foo}.${fizz}${foo}${buzz}${buzz}${foo}${fizz}${foo}${fizz}${buzz}${fizz}${foo}${buzz}${buzz}${foo}${fizz}${foo}${fizz}${buzz}${fizz}${foo}${buzz}${buzz}${foo}${fizz}${foo}${fizz}${buzz}"` the results were different for regex. The original regex (`[^}]*`) took 515.47 ms to run 10000 times, my regex (`[^}]+`) took 514.41 ms and regex suggested by @Brian (`[^}]++`) 507.41 ms. Still close, but Brian's did better. – Miguel Jul 04 '13 at 07:58
  • Did you use an actual benchmarking tool to check like [Caliper](https://code.google.com/p/caliper/)? Also, you should make sure that you test with partially matched strings as well, such as `The ${fizz cow` with no terminating `}`. Also, to be fair, I said *really* picky :3 – Brian Jul 05 '13 at 04:14
  • No, I wrote the benchmarking myself. Did a warm-up loop in the beginning to activate hotspot and called `System.gc()` several times outside measurement. But, yes, I know it is not bulletproof. I didn't know about Caliper, though. Good to know it! – Miguel Jul 05 '13 at 06:21
2

Adapted from Matcher.replaceAll

boolean result = matcher.find();
if (result) {
    StringBuffer sb = new StringBuffer();
    do {
        String tokenKey = matcher.group(1); // Ex: fizz
        String replacement = Matcher.quoteReplacement(tokensMap.get(tokenKey));
        matcher.appendReplacement(sb, replacement);
        result = matcher.find();
    } while (result);
    matcher.appendTail(sb);
    return sb.toString();
}
johnchen902
  • 9,531
  • 1
  • 27
  • 69
  • Adding here my test results using this method: the original regex (`[^}]*`) took 15.14 ms to run 10000 times, my regex (`[^}]+`) took 14.88 ms and regex suggested by @Brian (`[^}]++`) 15.89 ms. As I said before this method is a absolute winner in terms of speed and here my regex fits better :) – Miguel Jul 04 '13 at 07:16
  • @Miguel Try to use a very long `toInspect` as input, such as `"${fizz}${buzz}${foo}${fizz}${buzz}${foo}${fizz}${buzz}${foo}...${fizz}${buzz}${foo}${fizz}${buzz}${foo}${fizz}${buzz}${foo}${fizz}${buzz}${foo}"` (I bet my `StringBuffer` will be faster than your `String`, when the length grow, it won't be only 3x faster.) – johnchen902 Jul 04 '13 at 07:21
  • Using `"How now ${fizz} cow. The ${buzz} had oddly-shaped ${foo}.How now ${fizz} cow. The ${buzz} had oddly-shaped ${foo}.${fizz}${foo}${buzz}${buzz}${foo}${fizz}${foo}${fizz}${buzz}${fizz}${foo}${buzz}${buzz}${foo}${fizz}${foo}${fizz}${buzz}${fizz}${foo}${buzz}${buzz}${foo}${fizz}${foo}${fizz}${buzz}"` as the input this method performed even better then mine, it was 5x faster. The results were different for regex. The original regex (`[^}]*`) took 92.94 ms to run 10000 times, my regex (`[^}]+`) took 93.6 ms and regex suggested by @Brian (`[^}]++`) 91.83 ms. Brian's did better. Why is that so? – Miguel Jul 04 '13 at 07:51
  • 1
    @Miguel I only know about 3x->5x: 1. `replaceFirst` use a regex again, which is unnecessary. 2. In a loop, `StringBuffer` concats `String` faster than directly in `String`, since `String` generate a new `String` everytime. Which regex to use is not important. – johnchen902 Jul 04 '13 at 08:06
2

You can make your RegEx a bit ugly, but this will work

String regex = "\\$[\\{]([^}]*)[\\}]";
Vladyslav Nikolaiev
  • 1,819
  • 3
  • 20
  • 39
0

Use String-replaceAll. Sample input String for testing "SESSIONKEY1":

"${SOMESTRING.properties.SESSIONKEY1}"

,

    String pattern = "\\\"\\$\\{SOMESTRING\\.[^\\}]+\\}\\\""; 
    System.out.println(pattern);
    String result = inputString.replaceAll(pattern, "null");
    return result.toString();
Milan Das
  • 21
  • 5