I'm working on a script code generator. I created a template script file that contains a few parameters and placeholders, which should be replaced with real values during generation. The replacement is performed inside the loop(s). Performance of the generator is fairly important (currently on Java 7). My dilemma is as follows:

Do something like this:

private final String PHOLDER_SECT = "#__PHOLD_SECT__#";
private final String PARAM_PID    = "__param_pid";
private final String PARAM_NAME   = "__param_name";
private final String PARAM_DESC   = "__param_desc";
...
for (int i = 0; i < sectionCount; i++) {

    // do something here...
    masterTmpl[i] = masterTmpl[i].replace(PHOLDER_SECT, someSectionCode);
    // something else here...
    masterTmpl[i] = masterTmpl[i].replace(PARAM_DESC, desc)
                                 .replace(PARAM_NAME, name)
                                 .replace(PARAM_PID,  pid);
    ...
}

or something like this (the point being that all placeholders are compiled patterns):

private final Pattern regexSect = Pattern.compile("#__PHOLD_SECT__#", Pattern.LITERAL);
private final Pattern regexPid  = Pattern.compile("__param_pid",      Pattern.LITERAL);
private final Pattern regexName = Pattern.compile("__param_name",     Pattern.LITERAL);
private final Pattern regexDesc = Pattern.compile("__param_desc",     Pattern.LITERAL);
...
for (int i = 0; i < sectionCount; i++) {

    // do something here...
    masterTmpl[i] = this.regexSect.matcher(masterTmpl[i]).replaceAll(Matcher.quoteReplacement(someSectionCode));
    // something else here...
    masterTmpl[i] = this.regexDesc.matcher(masterTmpl[i]).replaceAll(Matcher.quoteReplacement(desc));
    masterTmpl[i] = this.regexName.matcher(masterTmpl[i]).replaceAll(Matcher.quoteReplacement(name));
    ...
}

I know that I can measure execution, and stuff, but I'm kind of hoping for an answer that explains the (un)importance of pattern compilation in this particular case...

Less
  • If you are replacing plenty of different things, I think neither option is particularly great from performance point of view. Either way you read the file or string multiple times, it could probably be done using some different tools (especially with most params starting with `__param_`). – Vlasec Jul 07 '15 at 14:18
  • @Vlasec what did you have in mind by "different tools"? how does placeholder naming affect this, not sure I follow...? – Less Jul 07 '15 at 14:21
  • I didn't encounter the same kind of problem myself, but basically, this is parsing. Regexp can be used, but if you read the file sequentially, looking for any `__param_...`, returning result using some outputstream, it could perform a lot better. – Vlasec Jul 07 '15 at 14:39
  • 2
    It is much faster to look for `"__param_pid|__param_name|__param_desc"`, replacing them each time they are found by the appropriate value, than to do multiple search-and-replace passes. Consistent naming allows you to search for `"__param([a-z]+)"` instead; within the `while (m.find())` loop you can apply the suitable replacements in either case. See my answer below. – tucuxi Jul 07 '15 at 14:40
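
The single-pattern idea from the last comment can be sketched roughly as follows. This is only an illustration, not code from the thread: the class `ParamTemplate`, the method `fill`, and the choice to leave unknown names untouched are all assumptions. It also assumes that every parameter placeholder has the form `__param_` followed by lowercase letters, and that no lowercase letter directly follows a placeholder in the template (the greedy `[a-z]+` would otherwise swallow it).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: one compiled pattern covers every "__param_<name>" placeholder.
final class ParamTemplate {
    // Group 1 captures the lowercase name after the shared "__param_" prefix.
    private static final Pattern PARAM = Pattern.compile("__param_([a-z]+)");

    static String fill(String template, Map<String, String> values) {
        Matcher m = PARAM.matcher(template);
        StringBuffer out = new StringBuffer(); // appendReplacement takes a StringBuffer on Java 7
        while (m.find()) {
            String value = values.get(m.group(1));
            // Assumption: names without a value are left as they are in the template.
            String literal = (value != null) ? value : m.group(0);
            // quoteReplacement keeps '$' and '\' in the value from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(literal));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<String, String>();
        values.put("pid", "42");
        values.put("name", "backup");
        values.put("desc", "costs $5 \\ night");
        System.out.println(fill("id=__param_pid name=__param_name desc=__param_desc", values));
    }
}
```

The template is scanned once, and the pattern is compiled once for the lifetime of the class, however many parameters there are.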

1 Answer

This code is probably much faster: it finds occurrences of all patterns in a single search (instead of one search per pattern) and, most importantly, performs all replacements in a single pass instead of one pass per pattern. Building many intermediate strings is somewhat expensive because of copying and memory overhead; this version builds only one fully replaced string, in the last line.

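    // needs java.util.Map, java.util.regex.Matcher and java.util.regex.Pattern
    public static String replaceMany(String input,
            Map<String, String> replacements) {
        // an empty alternation would match everywhere and insert "null"
        if (replacements.isEmpty()) {
            return input;
        }
        // build a composite pattern for all replacement keys,
        // quoting each key so that it is matched literally
        StringBuilder sb = new StringBuilder();
        String prefix = "";
        for (String k : replacements.keySet()) {
            sb.append(prefix).append(Pattern.quote(k));
            prefix = "|";
        }
        Pattern p = Pattern.compile(sb.toString());
        // replace in single loop
        Matcher m = p.matcher(input);
        StringBuffer output = new StringBuffer();
        while (m.find()) {
            // inspired by http://stackoverflow.com/a/948381/15472:
            // copy the text before the match, then append the value
            // directly so that '$' and '\' in it need no escaping
            m.appendReplacement(output, "");
            output.append(replacements.get(m.group(0)));
        }
        m.appendTail(output);
        return output.toString();
    }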
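
A small usage sketch for the method above, using the placeholders from the question. The wrapper class `ReplaceManyDemo` and the sample template are made up for illustration, and the method is repeated in condensed form only so that the example compiles on its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class, used only to make the example self-contained.
final class ReplaceManyDemo {
    // Same logic as replaceMany above, in condensed form.
    static String replaceMany(String input, Map<String, String> replacements) {
        StringBuilder sb = new StringBuilder();
        String prefix = "";
        for (String k : replacements.keySet()) {
            sb.append(prefix).append(Pattern.quote(k));
            prefix = "|";
        }
        Matcher m = Pattern.compile(sb.toString()).matcher(input);
        StringBuffer output = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(output, "");             // copies the text before the match
            output.append(replacements.get(m.group(0))); // appends the value literally
        }
        m.appendTail(output);
        return output.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the alternation order is predictable.
        Map<String, String> r = new LinkedHashMap<String, String>();
        r.put("#__PHOLD_SECT__#", "echo \"$HOME\\bin\"");
        r.put("__param_pid", "17");
        r.put("__param_name", "cleanup");
        String template = "# __param_name (__param_pid)\n#__PHOLD_SECT__#\n";
        System.out.print(replaceMany(template, r));
    }
}
```

One thing to watch: regex alternation is ordered, so if one key is a prefix of another (say `__param_p` and `__param_pid`), the alternative listed first wins. Inserting longer keys first into an ordered map avoids surprises.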
tucuxi
  • The given code, even though based on very reasonable assumptions, seems pretty tricky to make it work when actual replacement values contain special characters (e.g. $ and \), which is exactly my case. I either keep getting "Illegal group reference" or "named capturing group is missing trailing '}' ", or similar, and can't seem to get around these yet... – Less Jul 08 '15 at 06:41
  • 1
    @Less It may get complicated, but normally, all you need is `sb.append(prefix).append("(").append(k).append(")")`. With `$`, there should be no problem. With capturing groups, they get renumbered and there's a method for it. With `\Q` without corresponding `\E`, it'll break. There may be more problems, but they're all solvable. +++ This may be worth another question like "How to combine regexes containing ....". – maaartinus Jul 08 '15 at 09:50
  • @Less fixed - I forgot to quote the replacement. It was not that complicated, as people had already found workarounds in http://stackoverflow.com/questions/947116/matcher-appendreplacement-with-literal-text – tucuxi Jul 08 '15 at 10:31
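
The errors mentioned in the comments arise because `Matcher.appendReplacement` interprets `$` and `\` in its replacement argument as group references and escapes. A minimal sketch of the difference, assuming a hypothetical class `QuoteReplacementDemo` and a sample value containing both characters:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
final class QuoteReplacementDemo {
    private static final Pattern PID = Pattern.compile("__param_pid", Pattern.LITERAL);

    // Passes the value straight through: '$' and '\' are interpreted by the matcher.
    static String unquoted(String input, String value) {
        Matcher m = PID.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out, value);
        }
        m.appendTail(out);
        return out.toString();
    }

    // Escapes the value first, so it is inserted literally.
    static String quoted(String input, String value) {
        Matcher m = PID.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String value = "$HOME\\bin"; // the characters $HOME\bin
        System.out.println(quoted("pid=__param_pid", value));
        try {
            unquoted("pid=__param_pid", value);
        } catch (IllegalArgumentException e) {
            // "$H" is not a valid group reference, so the unquoted call is rejected.
            System.out.println("unquoted failed: " + e.getMessage());
        }
    }
}
```

Either `Matcher.quoteReplacement` or the `appendReplacement(output, "")` followed by a plain `append` used in the answer keeps the value literal; the second form avoids building an escaped copy of each value.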