2

I need to generate, at run time, a regular expression that will match a range of numeric values.

For example: At run time I may discover that I need a regular expression matching all the files in the "range" a-261-b.something to a-543-b.something.

I need to generate a regular expression that will match all of this files. Any ideas?

I need it in Java, so if anyone know any Java-specific way to so this, it's also acceptable.

aioobe
  • 413,195
  • 112
  • 811
  • 826
AAaa
  • 3,659
  • 7
  • 35
  • 40

3 Answers3

14

Whether or not regular expressions are well suited for this task is debatable. Most people would probably argue that it's not.

As I understand it however, you have no choice as the API you're using takes a regular expression as argument, so here goes...

Code

public class NumericRangeRegexGenerator {

    private static String baseRange(String num, boolean up, boolean leading1) {

        char c = num.charAt(0);
        char low  = up ? c : leading1 ? '1' : '0';
        char high = up ? '9' : c;

        if (num.length() == 1)
            return charClass(low, high);

        String re = c + "(" + baseRange(num.substring(1), up, false) + ")";

        if (up) low++; else high--;

        if (low <= high)
            re += "|" + charClass(low, high) + nDigits(num.length() - 1);

        return re;
    }

    private static String charClass(char b, char e) {
        return String.format(b==e ? "%c" : e-b>1 ? "[%c-%c]" : "[%c%c]", b, e);
    }

    private static String nDigits(int n) {
        return nDigits(n, n);
    }

    private static String nDigits(int n, int m) {
        return "[0-9]" + String.format(n==m ? n==1 ? "":"{%d}":"{%d,%d}", n, m);
    }

    private static String eqLengths(String from, String to) {

        char fc = from.charAt(0), tc = to.charAt(0);

        if (from.length() == 1 && to.length() == 1)
            return charClass(fc, tc);

        if (fc == tc)
            return fc + "("+rangeRegex(from.substring(1), to.substring(1))+")";

        String re = fc + "(" + baseRange(from.substring(1), true, false) + ")|"
                  + tc + "(" + baseRange(to.substring(1),  false, false) + ")";

        if (++fc <= --tc)
            re += "|" + charClass(fc, tc) + nDigits(from.length() - 1);

        return re;
    }    

    private static String nonEqLengths(String from, String to) {
        String re = baseRange(from,true,false) + "|" + baseRange(to,false,true);
        if (to.length() - from.length() > 1)
            re += "|[1-9]" + nDigits(from.length(), to.length() - 2);
        return re;
    }

    public static String rangeRegex(int n, int m) {
        return rangeRegex("" + n, "" + m);
    }

    public static String rangeRegex(String n, String m) {
        return n.length() == m.length() ? eqLengths(n, m) : nonEqLengths(n, m);
    }

}

Usage

// Generate expression for range 123 - 321
String regexp = NumericRangeRegexGenerator.rangeRegex(123, 321);


Explanation

A brief explanation of the code follows.

Ranges on the shape 0000-abcd and abcd-9999

First we note that matching ranges such as 0000-abcd is fairly easy.

An expression covering for instance 000-527 can be expressed as

  • [0-4] followed by two arbitrary digits, or
  • 5 followed by 00-27 (which is resolved recursively!)

Ranges on the shape 1000-abcd and abcd-9999 are just as easy.

Lower limit, upper limit of different lengths.

If the "from"-number is shorter than the "to"-number it is fairly straight forward.

Assume for instance that the from-number has 3 digits and the to-number has 7 digits. The expression can then be composed as follows:

  • from-999 (as described above),
  • Any 4, 5 or 6 digit number: [1-9][0-9]{3-5}, or
  • 1000000-to (as described above)

Lower limit / upper limit of equal lengths.

This is the trickiest situation (still not that tricky though!)

The solution is, again, best described by an example. Consider the range 273 - 548. The expression can be composed by the following parts:

  • 2 followed by 73-99 (latter part described above),
  • [34] followed by any two digits, or
  • 5 followed by 00-48 (latter part described above)
aioobe
  • 413,195
  • 112
  • 811
  • 826
1

Let me see if I understand this correctly. You have a file named a-NUMBER-b.txt. You need to check that the number is the correct number. Here's how that's done:

To check the number, assuming it is in the right format:

String name = getName();
int myInt = Integer.parseInt(name.split(a + "-")[1].split("-" + b + ".txt")[0]);

To check the format:

name.startsWith(a + "-") && name.endWith("-" + b + ".txt")

Let me know if I answered correctly, Ryan

Ryan Amos
  • 5,422
  • 4
  • 36
  • 56
  • Please note that a != b (or !a.equals(b) if you want to be picky) – Ryan Amos Jun 14 '11 at 19:58
  • 1
    this would be excellent, but i use a file event listener, that listens to files that are created in a dir. this is currently implemented with regex that validates if file x is one of the files that i need to get. so this resolution would force a change in the current implementation.. – AAaa Jun 14 '11 at 20:02
  • `split` is not a static method. `String.split` doesn't make sense. – aioobe Jun 14 '11 at 20:06
  • @dan In the event listener, you should be able to feed in your own instance of the listener to use by providing a class that implements the given listener. – Ryan Amos Jun 14 '11 at 20:07
  • @aioob oops... I meant name.split. Thanks! – Ryan Amos Jun 14 '11 at 20:08
  • well if this is what need to be done, i'll do it. i thought that a regex can be built, as complex as it can be, to solve this.. – AAaa Jun 14 '11 at 20:09
  • 1
    @dan, it can. But it won't be a simple algorithm. – aioobe Jun 14 '11 at 20:12
  • 1
    @dan, see my answer for such algorithm. – aioobe Jun 16 '11 at 09:15
0

Something like this might work, though not sure on the cost of the split.

File[] files = new File("foo").listFiles();
for( int i = 0; i < files.length; i++ )
{
    String fn = files[i].getName();
    String sl[] = fn.split( "-" );
    int num = Integer.parseInt( sl[1] );

    if( num >= min && num <= max )
         //do stuff
}
James DeRagon
  • 364
  • 1
  • 4
  • 11
  • This is susceptible to an injection attack in which a contains a '-'. For example, suppose the file name is "file-name-12345-file.txt". – Ryan Amos Jun 14 '11 at 19:58
  • 1
    I assumed from the question that the a and b part were unknown, and if that was the case having the delimiter as part of the value when parsing something like that just breaks it down anyway. I guess it really depends on what A and B are and how specific you need to be...but it looks like split won't solve the question anyhow. – James DeRagon Jun 14 '11 at 20:07
  • My implementation of split solves this. However, both of our implementations are susceptible to injection in the rejex, such as a \\ ("\\\\"). – Ryan Amos Jun 14 '11 at 20:19