39

Did anyone come across Java version of Google's regular expression library RE2 or a java library with similar capabilities and good performance? The performance requirement is linear time with regard to the length of regular expression and the input text length.

Clarification

Most regular expression implementation use a backtracking algorithm to match the input text and hence are exponential on some simple regular expressions like (.*).(.*).(.*).(.*). RE2 is a library from google that solves this problem by using an algorithm that varies linearly with input size using the concepts of Automata theory. The questioner wants to know whether there exists libraries for Java that are based on this algorithm.

Caffeinated
  • 11,982
  • 40
  • 122
  • 216
depthofreality
  • 579
  • 6
  • 14
  • 20
    That is of course a real question. It's neither vague, nor incomplete, nor overly broad. – nes1983 Sep 30 '11 at 15:22
  • @nes1983, I don't get it either. – ergosys Dec 30 '11 at 04:47
  • 1
    Here is information about linear-time regular expression matching: http://swtch.com/~rsc/regexp/regexp3.html – Miles Dec 31 '11 at 05:42
  • 3
    Did you hear about [FIRE/J](http://bkarak.wizhut.com/www/programs/fire/fire/fire.html) ? It's great dissertation work, read the [article](http://www.dmst.aueb.gr/dds/pubs/jrnl/2007-SPANDE-FIRE/html/KS07.html). My benchmarks shows that it at least **10x** faster than current JDK implementation. – Nikita Koksharov May 13 '12 at 09:43
  • 6
    This question really shouldn't have been closed... – Matt Wonlaw Sep 08 '12 at 15:39
  • 2
    Can't add as answer since question got closed, but I've found this: https://github.com/logentries/re2-java - seems to use JNI to call the C++ version of RE2, though not sure if it's completed/useable yet. – Peter Boughton Jan 28 '13 at 00:00
  • 2
    "Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it." I forgot about this when I clicked reopen, but it's definitely off topic. – Brigand Sep 19 '14 at 23:50
  • [Google has an internal port of re2 to Java](http://mail-archives.apache.org/mod_mbox/commons-dev/201502.mbox/%3CCA+3tv23GcrDe7773umKnVTVFOHyfYBqcT5+0+Q1Ra9gE_1=9_g@mail.gmail.com%3E) that they may open source in the future. – Miles Feb 12 '15 at 20:02
  • As noted in an answer below, Google have released a Java port of RE2. You should consider accepting that answer. – Gonen I Jun 30 '15 at 11:10

3 Answers3

15

Google today released a pure-Java port of Go's RE2 implementation. You can find it here:

https://github.com/google/re2j

Alan Donovan
  • 504
  • 4
  • 6
5

There is a finite-state automata package for Java here: www.brics.dk/automaton; also see this article. Here is a simple example:

RegExp r = new RegExp("ab(c|d)*");
Automaton a = r.toAutomaton();
String s = "abcccdc";
System.out.println("Match: " + a.run(s)); // prints: true
Divyesh Kanzariya
  • 3,629
  • 3
  • 43
  • 44
denim2x
  • 119
  • 1
  • 3
3

Google search yielded this.

https://github.com/logentries/re2-java

it says it only supports linux 64 bit.

Edit: I believe a better answer is now available, as answered by Alan Donovan, since Google themselves have released a port of RE2 https://github.com/google/re2j

Gonen I
  • 5,576
  • 1
  • 29
  • 60