1

Many websites take a string as user input and let you build a regular expression (regex) from pieces of that string.

But I could not find any Java library that does the same. Is there a Java library that generates a regular expression that exactly matches a string?

String inputString = "ABC345";
String regularExpression = Something.generateRegEx(inputString);

or something like that.

Note: I need to take a string from the user, generate a regular expression from it, and then match that pattern against some data sets to extract similar patterns. I have written a small utility, but it is not reliable yet, so I am looking for a well-tested library.

EDIT :

Please see txt2re.com. I want a Java library that performs the same function.

LightCC
Saurabh

4 Answers

3

Pattern.quote(String) returns a (string) regex that matches the specified string exactly.
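
As a minimal sketch of what this looks like in practice (the class name QuoteExample and the helper exactRegex are made up for illustration; only Pattern.quote and Pattern.matches come from the standard library), the quoted regex matches the input literally, even when the input contains regex metacharacters:

```java
import java.util.regex.Pattern;

public class QuoteExample {
    // Hypothetical helper: returns a regex string that matches `literal`
    // character for character, with metacharacters such as '.' and '*' neutralized.
    static String exactRegex(String literal) {
        return Pattern.quote(literal);
    }

    public static void main(String[] args) {
        String input = "a.b*c";
        String regex = exactRegex(input);

        // The input matches its own quoted regex.
        System.out.println(Pattern.matches(regex, input));    // true
        // Unquoted, "a.b*c" would also match "aXbbc"; quoted, it does not.
        System.out.println(Pattern.matches(regex, "aXbbc"));  // false
    }
}
```

Note that this only gives an exact literal match; it does not generalize the input into a pattern for "similar" strings.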

Louis Wasserman
2

I think txt2re.com has a database of known regular expressions, since the tool annotates its results with semantics such as "date" or "email" for date and email formats. Otherwise it gives an expression that validates only a single string, not a "regular language". Regular languages are described by regular expressions and recognized by finite-state machines. Every finite language is regular, but not every infinite language is. For example, a simple language like:

L = { (a^n)(b^n) | n >= 0 } is not regular (this can be proved with the pumping lemma).

L = {ε, ab, aabb, aaabbb, ...} (not regular)

If you consider the input to be an infinite set of words (natural languages included), regular expressions cannot describe every such language. To generate a regular expression for a language, you would first have to describe it with a regular (Type-3) grammar.

If your language contains only a single word, like this:

L = { your.name@example.com }

then you can write a basic translator that iterates over the characters and checks their types. In pseudocode:

s = size(input)
result = ""
for (i = 0; i < s; i++) {
   if input[i] is numeric
      result += "\d"    // digit class
   else if input[i] is word
      result += "\w"    // word-character class
   ...
}
return result
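
A runnable Java sketch of the same idea follows. The class name ShapeRegex and the methods generalize and classify are hypothetical names chosen for illustration, and the choice to quote every other character literally is an assumption, since the pseudocode leaves those branches open:

```java
import java.util.regex.Pattern;

public class ShapeRegex {
    // Hypothetical helper: replaces each character of the input with a
    // regex token describing its type, one token per character.
    static String generalize(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            out.append(classify(input.charAt(i)));
        }
        return out.toString();
    }

    // Maps one character to a regex token. Digits are checked first because
    // \w would also match them. Anything else is matched literally (an assumption).
    static String classify(char c) {
        if (c >= '0' && c <= '9') {
            return "\\d";
        }
        if (c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) {
            return "\\w";
        }
        return Pattern.quote(String.valueOf(c));
    }

    public static void main(String[] args) {
        String regex = generalize("ABC345");
        System.out.println(regex);                             // \w\w\w\d\d\d
        System.out.println(Pattern.matches(regex, "XYZ789"));  // true
        System.out.println(Pattern.matches(regex, "XY-789"));  // false
    }
}
```

Because \w also matches digits, a pattern built this way is looser than it looks: "AB1234" matches the regex above as well. A stricter letter class such as [A-Za-z] would avoid that, at the cost of departing from the pseudocode.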
Erhan Bagdemir
  • Thanks for your detailed answer. So there is no such library available yet, and to build one, the library would need to include a database of known regular expressions, right? Thanks for the pseudocode; in fact, my current code (a workaround) uses the same logic to generate the regular expression. – Saurabh Jul 31 '12 at 04:42
0

A genetic-algorithm-based Java library such as regex++ (https://github.com/MaLeLabTs/RegexGenerator) can be used for the same purpose.

Aayush
-1

If what you want is to find a regex matching a given String, the question is not well defined, because infinitely many regexes match any given string.

If, on the other hand, you want to build a Pattern object from a regex entered by the user, use the standard Java API (java.util.regex.*) like this:

Pattern p = Pattern.compile(inputString);
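
Since user input may not be a valid regex, a slightly more defensive version might look like the sketch below. The class name UserPattern and the helper tryCompile are made up for illustration; Pattern.compile and PatternSyntaxException are the standard API:

```java
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class UserPattern {
    // Hypothetical helper: compiles a user-supplied regex and returns an
    // empty Optional instead of throwing when the syntax is invalid.
    static Optional<Pattern> tryCompile(String userRegex) {
        try {
            return Optional.of(Pattern.compile(userRegex));
        } catch (PatternSyntaxException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // A valid regex compiles and can be applied to data.
        Optional<Pattern> ok = tryCompile("[A-Z]{3}\\d{3}");
        System.out.println(ok.isPresent() && ok.get().matcher("ABC345").matches()); // true

        // An unclosed character class is rejected.
        System.out.println(tryCompile("[A-Z").isPresent()); // false
    }
}
```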
kgautron