11

Are there any (free) regular expression engines for Java, that can compile a regular expression to a DFA, and do group capturing while matching the DFA ?

I've found dk.brics.automaton and jrexx, which both compile to DFA, but neither seems to be able to do group capture. While the other engines I've found seem to compile to NFA.

Sami
  • 3,263
  • 3
  • 29
  • 37
  • 1
    For performance optimization. – Sami Dec 26 '09 at 18:39
  • 3
    I'm asking because usually those performances benefits arise from the inability of DFA engines to backtrack. If that's the case, perhaps you could achieve the same using atomic grouping/possessive quantifiers. Maybe you could post some examples of what you want to achieve? – Tim Pietzcker Dec 26 '09 at 18:43
  • 1
    How about some examples so we can see if your problem can be solved in a different way - since nobody seems to know a Java DFA regex engine with group capture... if it really *is* a problem and not premature optimization :) – Tim Pietzcker Dec 28 '09 at 18:34
  • Thank you for you interest, but there really isn't a problem that needs to be solved in a different way. I was just wanting to see, what performance effect a DFA based engine would have had for my application. – Sami Dec 28 '09 at 19:48
  • There shouldn't be any appreciable difference between a DFA and NFA engine if the NFA doesn't use backtracking. If you find a good linear space-time NFA engine you should use it. Even a naive implementation will still be constant-space/linear-time in the absence of groups. – Recurse Jun 12 '12 at 00:46

5 Answers5

3

try this one (probably not DFA but faster than java.util) http://jregex.sourceforge.net/gstarted-advanced.html#ngroups, or this one: http://userguide.icu-project.org

according to that test: http://tusker.org/regex/regex_benchmark.html, both are fast (we all know that the benchmarks only tests what the creator of the benchmark wanted to test).

When I needed really fast DFA regex I have spawned a process that used grep ;-) (For a 6GB log file it cut my times from 10minutes to a few seconds).

bartosz.r
  • 454
  • 4
  • 10
  • I doubt that it’s faster than java.util.regex. These small libraries come and go, java.util.regex gets optimized year after year. If you’re not using a better algorithm, java.util.regex will beat you eventually. See my answer for a regular expression engine that is quite fundamentally different from java.util.regex, DFA-based, and therefore faster. – nes1983 Dec 18 '13 at 20:28
1

I recently wrote one: tree-regex.

nes1983
  • 15,209
  • 4
  • 44
  • 64
0

For C there is TRE and Google's RE2 libraries. TRE uses DFA, RE2 uses NFA (as far as I understand), both could subgroup matching. But I didn't see such a library for Java.

Dmitry
  • 82
  • 3
  • 1
    RE2 is actually REALLY fast. It is worth pointing to it when people ask for regex and speed. – nes1983 Sep 27 '11 at 20:56
  • 2
    You've got it mixed up. TRE uses NFA, RE2 uses both NFAs and DFAs. Specifically, RE2 uses a DFA if there's at most one capture group, otherwise an NFA. – nes1983 Jul 15 '12 at 18:38
-2

you can try Pat regular expressions library @ http://www.javaregex.com/ .

Jeremy Gwa
  • 2,333
  • 7
  • 25
  • 31
  • From the website it's at least not obvious at all, that this engine would be DFA based nor that it would support group capture. If it is, and does, could you please confirm. – Sami Feb 07 '10 at 04:32
  • That lib (Stevesoft Pat) does support capturing groups, but it's definitely **not** DFA-based. – Alan Moore Feb 08 '10 at 07:24
-2

dk.brics.automaton is DFA does appear to do capturing groups. I expect that feature is new in the two years since this question. Check out class AutomatonMatcher.

See http://www.brics.dk/automaton/doc/dk/brics/automaton/AutomatonMatcher.html#group(int)

Darren Gilroy
  • 2,071
  • 11
  • 6