
I have been trying to replace characters in a string in two ways. First, using a regex for every replacement:

data = data.replace(Regex("[a-z:]", RegexOption.IGNORE_CASE), "")
        .replace(Regex("/", RegexOption.IGNORE_CASE), ".")
        .replace(Regex(",", RegexOption.IGNORE_CASE), "")
        .replace(Regex("'", RegexOption.IGNORE_CASE), "")
        .replace(Regex("é",RegexOption.IGNORE_CASE),"")
        .replace(Regex("ê",RegexOption.IGNORE_CASE),"")
        .replace(Regex("ö",RegexOption.IGNORE_CASE),"")
        .replace(Regex("Ä",RegexOption.IGNORE_CASE),"")
        .replace(Regex("ä",RegexOption.IGNORE_CASE),"")
        .replace(Regex("ä |",RegexOption.IGNORE_CASE),"")

And second, using a regex only for the first call and plain string replacement for the rest:

data = data.replace(Regex("[a-z:]", RegexOption.IGNORE_CASE), "")
        .replace("/", ".")
        .replace(",", "")
        .replace("'", "")
        .replace("é","")
        .replace("ê","")
        .replace("ö","")
        .replace("Ä","")
        .replace("ä","")

I measured the time taken by both versions and, surprisingly, the version that uses regex everywhere turned out to be at least 20 times faster than the one that uses plain replace.

Everything I have read about regex says it is an expensive operation. Am I missing something?
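
For reference, a comparison like the one described above could be taken roughly as follows. This is only a minimal sketch, not a rigorous benchmark: `averageNanos`, `withRegex`, `withLiterals` and `runComparison` are hypothetical names introduced here, the warm-up and run counts are arbitrary assumptions, and a dedicated harness such as JMH would be more trustworthy on the JVM.

```kotlin
// Hypothetical helper: average time per call in nanoseconds, measured after a warm-up phase.
fun averageNanos(warmup: Int, runs: Int, block: () -> String): Long {
    var sink = 0L // accumulate result lengths so the work is not optimized away
    repeat(warmup) { sink += block().length }
    val start = System.nanoTime()
    repeat(runs) { sink += block().length }
    val elapsed = System.nanoTime() - start
    if (sink < 0) println(sink) // never true in practice; keeps `sink` observable
    return elapsed / runs
}

// First variant from the question: a Regex for every replacement.
fun withRegex(data: String): String =
    data.replace(Regex("[a-z:]", RegexOption.IGNORE_CASE), "")
        .replace(Regex("/", RegexOption.IGNORE_CASE), ".")
        .replace(Regex(",", RegexOption.IGNORE_CASE), "")
        .replace(Regex("'", RegexOption.IGNORE_CASE), "")
        .replace(Regex("é", RegexOption.IGNORE_CASE), "")
        .replace(Regex("ê", RegexOption.IGNORE_CASE), "")
        .replace(Regex("ö", RegexOption.IGNORE_CASE), "")
        .replace(Regex("Ä", RegexOption.IGNORE_CASE), "")
        .replace(Regex("ä", RegexOption.IGNORE_CASE), "")
        .replace(Regex("ä |", RegexOption.IGNORE_CASE), "")

// Second variant from the question: a Regex only for the first call.
fun withLiterals(data: String): String =
    data.replace(Regex("[a-z:]", RegexOption.IGNORE_CASE), "")
        .replace("/", ".")
        .replace(",", "")
        .replace("'", "")
        .replace("é", "")
        .replace("ê", "")
        .replace("ö", "")
        .replace("Ä", "")
        .replace("ä", "")

// Times both variants on the same input and prints the averages.
fun runComparison(sample: String) {
    val regexNs = averageNanos(10_000, 10_000) { withRegex(sample) }
    val literalNs = averageNanos(10_000, 10_000) { withLiterals(sample) }
    println("regex: $regexNs ns/op, literal: $literalNs ns/op")
}
```

Calling `runComparison(data)` with a representative input prints both averages. The numbers depend on the input, the Kotlin version and the JVM, so no particular result is implied here.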

Amit Bhandari
  • Is the space + `|` a typo? See `.replace(Regex("ä |",RegexOption.IGNORE_CASE),"")`. What is the outcome if you fix it (`"ä |"` => `"ä"`)? – Wiktor Stribiżew Jan 09 '19 at 07:54
  • *"And I measured time required for both..."* Did you do so [properly](https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java)? – T.J. Crowder Jan 09 '19 at 07:56
  • Shall we question your measurement? – revo Jan 09 '19 at 07:57
  • Separately: Why the multiple calls to `replace`? The above requires just two, not nine. – T.J. Crowder Jan 09 '19 at 07:57
  • @T.J.Crowder That's not the point. I was going through an existing codebase and found this code. I was trying to improve it when I came across this issue. – Amit Bhandari Jan 09 '19 at 07:59
  • @WiktorStribiżew Not a typo. I checked without it as well, and the result is the same. – Amit Bhandari Jan 09 '19 at 07:59
  • @T.J.Crowder I did a simple time comparison using currentTimeMillis. It came to 3 ms on average for regex and >40 ms on average for normal replace. – Amit Bhandari Jan 09 '19 at 08:00
  • Ok, but note that `.replace(Regex("ä |",RegexOption.IGNORE_CASE),"")` and `.replace("ä","")` are not equal in what they do as the first one removes `ä` + space and the second only removes `ä`. – Wiktor Stribiżew Jan 09 '19 at 08:01
  • @AmitBhandari - Please read the answers to [the question I linked](https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java). You can't do perf analysis on the JVM like that. – T.J. Crowder Jan 09 '19 at 08:03
  • @T.J.Crowder Okay I will check those answers out – Amit Bhandari Jan 09 '19 at 08:04
  • For sure `ä |` fails to match because there would be no `ä`s after the previous call to replace. – revo Jan 09 '19 at 08:10
  • Offtopic, but you can write one single regex for all the empty string replacements. This way you can eliminate all those redundant method chainings. – Adam Jan 09 '19 at 08:58
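
The consolidation suggested in the comments (two calls instead of nine) might look like the sketch below. `REMOVED_CHARS` and `clean` are hypothetical names, and the character class is an assumption that lists the upper- and lower-case characters explicitly instead of relying on `RegexOption.IGNORE_CASE`, so it is intended to mirror the plain-replace variant from the question rather than reproduce it exactly.

```kotlin
// Hypothetical constant: every character that the question replaces with "".
// Compiled once so the pattern is not rebuilt on each call.
private val REMOVED_CHARS = Regex("[a-zA-Z:,'éêöÄä]")

// Hypothetical helper: one regex pass for the removals, one char replace for '/' -> '.'.
fun clean(data: String): String =
    data.replace(REMOVED_CHARS, "").replace('/', '.')
```

For example, `clean("Date: 12/03/2019, 'é'")` yields `" 12.03.2019 "` (the spaces are kept because they are not in the character class).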
