33

I just found a bug in some code I didn't write and I'm a bit surprised:

Pattern pattern = Pattern.compile("\\d{1,2}.\\d{1,2}.\\d{4}");
Matcher matcher = pattern.matcher(s);

Despite the fact that this code fails badly on input data we get (because it tries to find dates in the 17.01.2011 format and gets back things like 10396/2011 and then crashed because it can't parse the date but that really ain't the point of this question ; ) I wonder:

  • isn't one of the point of Pattern.compile to be a speed optimization (by pre-compiling regexps)?

  • shouldn't all "static" pattern be always compiled into static pattern?

There are so many examples, all around the web, where the same pattern is always recompiled using Pattern.compile that I begin to wonder if I'm seeing things or not.

Isn't (assuming that the string is static and hence not dynamically constructed):

static Pattern pattern = Pattern.compile("\\d{1,2}.\\d{1,2}.\\d{4}");

always preferrable over a non-static pattern reference?

Gugussee
  • 1,673
  • 3
  • 16
  • 26
  • 4
    The bug in the pattern is that `.` matches anything. Use `\.` (or rather `\\.`; the first backslash is for Java) to fix that. – Donal Fellows Feb 08 '11 at 15:56
  • @Donal Fellows: thanks a lot, I know I know, I just wanted to paste the broken code as I read it. To me there are **two** WTFs in this code: first that the pattern compilation ain't static and then secondly that it's kind of a gross *regexp-now-you-have-two-problems* issue :) – Gugussee Feb 08 '11 at 16:10
  • 1
    All the answers that say that compiling statically is better are correct. But there is a little bit of premature optimization here. If you see many examples around the web using Pattern.compile non-statically, it is probably because it simply isn't a bottleneck very often, and may be just a tiny bit easier to read or maintain that way. Always measure before optimizing, otherwise you may find that the time spent merely exploring the issue was greater than all the CPU time your program will ever spend in Pattern.compile put together :-). – Avi Feb 08 '11 at 16:51
  • 1
    @Avi: it isn't easier to read. When it is static you can give a meaningful name to your pattern. Also, speaking of *"premature optimization"* for what is just good Java practice is just weird. The time won by all the programs written by people who will read this thread makes the question and the correct answers worthy. – SyntaxT3rr0r Feb 09 '11 at 00:24
  • That's why I offered it as a comment, not an answer. :-) – Donal Fellows Feb 09 '11 at 01:09
  • http://stackoverflow.com/questions/1720191/java-util-regex-importance-of-pattern-compile – Christophe Roussy Apr 19 '16 at 10:25

5 Answers5

35
  1. Yes, the whole point of pre-compiling a Pattern is to only do it once.
  2. It really depends on how you're going to use it, but in general, pre-compiled patterns stored in static fields should be fine. (Unlike Matchers, which aren't threadsafe and therefore shouldn't really be stored in fields at all, static or not.)

The only caveat with compiling patterns in static initializers is that if the pattern doesn't compile and the static initializer throws an exception, the source of the error can be quite annoying to track down. It's a minor maintainability problem but it might be worth mentioning.

biziclop
  • 48,926
  • 12
  • 77
  • 104
  • using a good IDE surely helps here... IntelliJ IDEA will clearly point out errors in pattern that wouldn't compile (even on incomplete source code). – SyntaxT3rr0r Feb 09 '11 at 00:28
  • @SyntaxT3rr0r That's quite a cool feature. (By the way, I haven't forgotten about your agent-GC question, I've just realised I forgot how to code in C, so it takes a bit longer to come up with a working solution.) – biziclop Feb 09 '11 at 00:43
11

first, the bug in pattern is because dot (.) matches everything. If you want to match dot (.) you have to escape it in regex:

Pattern pattern = Pattern.compile("\\d{1,2}\\.\\d{1,2}\\.\\d{4}");

Second, Pattern.compile() is a heavy method. It is always recommended to initialize static pattern (I mean patterns that are not being changed or not generated on the fly) only once. One of the popular ways to achieve this is to put the Pattern.compile() into static initializer.

You can use other approach. For example using singleton pattern or using framework that creates singleton objects (like Spring).

AlexR
  • 114,158
  • 16
  • 130
  • 208
  • I know that it's because the dot matches everything ;) I think I'll go with the static initializer in this case: using the singleton pattern of Spring to create one instance of a Pattern seems a bit extreme :) – Gugussee Feb 08 '11 at 16:02
  • Sure, I do not suggest you to use Spring only to create the instance of pattern. I just said that there are solutions other than static initialization. I mean that if you are already using spring in your project you can put all patterns to one singleton bean and retrieve them when you need. – AlexR Feb 08 '11 at 16:08
  • 3
    @AlexR How would instantiating the `Pattern` in a static initializer such as `static{}` differ from declaring the `Pattern` as a static field such as `private static final Pattern pattern = Pattern.compile()`? – Kevin Bowersox Feb 11 '14 at 15:26
  • @KevinBowersox, I think they are identical if you use a litteral string pattern definition + static + final, except for the order in which they are evaluated when the class is loaded. – Christophe Roussy Apr 19 '16 at 10:21
  • @ChristopheRoussy, non-static fields are initialized on creating instance of class. So, if member of type `Pattern` is not static its `compile()` method is called for each object. If however the member is static this method is called only once when class is used first time. – AlexR Apr 19 '16 at 17:12
  • @AlexR I never mentioned non-static fields, those are part of the class instance (so yes there is one for each instance), the discussion was about a static variable declaration vs a static block – Christophe Roussy Apr 20 '16 at 08:05
5

Yes, compiling the Pattern on each use is wasteful, and defining it statically would result in better performance. See this SO thread for a similar discussion.

Community
  • 1
  • 1
Kaleb Brasee
  • 51,193
  • 8
  • 108
  • 113
0

Static Patterns would remain in memory as long as the class is loaded.

If you are worried about memory and want a throw-away Pattern that you use once in a while and that can get garbage collected when you are finished with it, then you can use a non-static Pattern.

dogbane
  • 266,786
  • 75
  • 396
  • 414
0

It is a classical time vs. memory trade-off. If you are compiling a Pattern only once, don't stick it in a static field. If you measured that compiling Patterns is slow, pre-compile it and put it in a static field.

Jochen Bedersdorfer
  • 4,093
  • 24
  • 26