In Java, is there any performance gain from using string methods such as endsWith, startsWith, and contains over using a regex to accomplish the same task?

Anthony J.
  • Define "performance gain". If by that you mean "difference when processing 10 strings per second", then **no, unless your strings are extremely long, on the order of megabytes or more**. If it means "difference when processing millions of strings per second", then **maybe/usually, depending on the exact case**. – Nov 16 '16 at 10:35
  • @MarkoTopolnik IFTFY *chuckles* – Nov 16 '16 at 10:36
  • Also related: http://stackoverflow.com/questions/2023792/regex-vs-contains-best-performance?rq=1 – Nov 16 '16 at 10:41
  • Let's say I have a file with 2-4 million lines, where each line is 500 characters long, and I have to check whether each line contains, starts with, or ends with some other string value. Of course, I have to do it as fast as possible. – Anthony J. Nov 16 '16 at 10:46
  • I would guess that the regex will be slower because it is interpreted. But there is only one way to be sure: run a benchmark on 1% of your file. – AxelH Nov 16 '16 at 10:49
  • @AnthonyJ. - Regex will be slower because it involves creating instances of the `Pattern` and `Matcher` classes (or at least `Matcher`). There is also some amount of synchronization going on under the hood. So, if you have a lot of Strings, using `startsWith` or `endsWith` would be better. – TheLostMind Nov 16 '16 at 10:58
  • @TheLostMind Synchronization has practically zero cost on uncontended monitors. The only perf difference can come from Java's suboptimal regex implementation, but how much is anybody's guess. – Marko Topolnik Nov 16 '16 at 12:56
  • So, OP, you said you have to check each line for all three of starts with, contains, ends with. A single call to `indexOf(String)` is your best option in my opinion, unless the same string can occur at more than one place in the same line. – Marko Topolnik Nov 16 '16 at 12:58
  • @MarkoTopolnik Sorry, I think I didn't explain it correctly. I just need to use one of those functions depending on some condition, so I just need to know which one is faster: a string function or a regex, as I have always read that regex is considered an expensive operation. @TheLostMind's answer was good for me. – Anthony J. Nov 16 '16 at 13:07
  • One thing is for sure: regex is not faster than a dedicated method. However, there is absolutely no evidence that the dedicated method call is faster than regex. It is very likely that they perform equally well, especially given the triviality of the regex you'd use. – Marko Topolnik Nov 16 '16 at 13:08
  • @MarkoTopolnik - But assuming that memory barrier instruction(s) will be generated by the JVM, wouldn't *synchronization* matter? Yes, the JIT might do *lock elision*, but it's not guaranteed to happen *always*, right? Yes, for uncontended monitors like `synchronized (new someObject()) {` this might have no effect, but in the case of `Matcher` it can, right? – TheLostMind Nov 16 '16 at 13:14
  • @TheLostMind The only synchronization here (if any) will be on the Pattern object while it creates a new Matcher. The matcher itself is definitely single-threaded. But if only one thread ever acquires the monitor, the thin lock will be used, which has next to zero overhead because it works like a lock that the thread acquires forever, until this is revoked by a very expensive lock inflation operation. Inflation happens at a safepoint, which is how they eliminate all cost for the happy case. – Marko Topolnik Nov 16 '16 at 13:18
  • @TheLostMind I recommend Cliff Click! http://www.azulsystems.com/blog/cliff/2010-01-09-biased-locking – Marko Topolnik Nov 16 '16 at 13:22
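
The comparison debated in the comments can be tried out with a small sketch like the one below. It is only an illustration, not code from the thread: the class name `LineChecks`, its method names, and the sample needle are hypothetical. It assumes the search string should be treated as a literal (hence `Pattern.quote`) and that each `Pattern` is compiled once and reused, so only a `Matcher` is created per line. The timing loop in `main` is a naive measurement; a proper harness such as JMH on a sample of the real file would give more trustworthy numbers.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not from the thread): the same three checks done with
// precompiled regex patterns, so they can be compared with the String methods.
final class LineChecks {
    private final Pattern startsWith;
    private final Pattern endsWith;
    private final Pattern contains;

    LineChecks(String needle) {
        // Quote the needle so characters like '.' are matched literally.
        String quoted = Pattern.quote(needle);
        // Compile once; only Matcher objects are created per line.
        this.startsWith = Pattern.compile(quoted);
        // \z is the very end of input, which mirrors String.endsWith
        // ('$' would also match before a trailing line terminator).
        this.endsWith = Pattern.compile(quoted + "\\z");
        this.contains = Pattern.compile(quoted);
    }

    // lookingAt() anchors the match at the start of the line.
    boolean startsWithRegex(String line) { return startsWith.matcher(line).lookingAt(); }

    // find() scans for the needle immediately followed by end of input.
    boolean endsWithRegex(String line) { return endsWith.matcher(line).find(); }

    // find() scans for the needle anywhere in the line.
    boolean containsRegex(String line) { return contains.matcher(line).find(); }

    public static void main(String[] args) {
        String needle = "needle"; // assumed sample value
        LineChecks checks = new LineChecks(needle);

        // Build one 500-character line that ends with the needle.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 500 - needle.length(); i++) {
            sb.append('x');
        }
        String line = sb.append(needle).toString();

        int iterations = 200_000;
        int hits = 0; // printed at the end so the loops are not optimized away

        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (line.endsWith(needle)) hits++;
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (checks.endsWithRegex(line)) hits++;
        }
        long t2 = System.nanoTime();

        System.out.printf("String.endsWith: %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("regex endsWith:  %d ms%n", (t2 - t1) / 1_000_000);
        System.out.println("hits: " + hits);
    }
}
```

Whatever the timings turn out to be on a given JVM, the sketch at least makes the two approaches behave identically, which is the precondition for a fair comparison.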

0 Answers