Suppose I have an arbitrary regular expression. How could I calculate the length of the string required for a match?

Examples (regex => minimum length of matchable string):

  1. [0-9]{3},[0-9]{2} => 6
  2. [0-9]{4},[0-9]{2} => 7
  3. [0-9]{2}.[0-9]{3}.[0-9]{3}/[0-9]{4}-[0-9]{2} => 18
  4. [0-9]{3}.[0-9]{3}.[0-9]{3}-[0-9]{2} => 14
  5. [0-9]{2}/[A-Z]{2}/[0-9]{4} => 10

I also need a function that takes a regex and an integer between 1 and the size calculated by the function above (something like position(regex, number)) and returns the type of the character at that position (number, letter or symbol).

Examples:

  • Example 1: Position 3 is a "number"
  • Example 2: Position 5 is a "symbol"
  • Example 5: Position 4 is a "letter"
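
To make the request concrete, here is a minimal sketch of what such a position function could look like. It assumes the patterns contain only literal characters and `[class]{n}` groups, as in the examples above, and it treats an unescaped `.` as a plain separator. The names `expandPattern` and `position` are hypothetical.

```javascript
// Hypothetical helper: expand a pattern made of literal characters and
// [class]{n} groups into an array with one entry per expected character.
function expandPattern(pattern) {
    var token = /\[([^\]]+)\]\{(\d+)\}|([\s\S])/g;
    var slots = [];
    var m;
    while ((m = token.exec(pattern)) !== null) {
        if (m[1] !== undefined) {
            // [class]{n}: repeat the class name n times
            for (var i = 0; i < Number(m[2]); i++) {
                slots.push(m[1]);
            }
        } else {
            // any other character is kept as a literal
            slots.push(m[3]);
        }
    }
    return slots;
}

// Hypothetical position(regex, number): number is 1-based, as in the examples.
function position(pattern, number) {
    var slot = expandPattern(pattern)[number - 1];
    if (slot === undefined) return undefined; // out of range
    if (slot === '0-9') return 'number';
    if (slot === 'A-Z' || slot === 'a-z') return 'letter';
    return 'symbol';
}

console.log(position('[0-9]{3},[0-9]{2}', 3));          // "number"
console.log(position('[0-9]{3},[0-9]{2}', 4));          // "symbol"
console.log(position('[0-9]{2}/[A-Z]{2}/[0-9]{4}', 4)); // "letter"
```

The length of the expanded array is also the required string length for these simple patterns.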

UPDATE

The objective here is to implement this:

function size_of(regex) {
    //
}

function type_of(regex, posicao) {
    //
}

function generate_string(tamanho) {
    //
}

$(document).on('focus', '.valida', function(){
    var regex = $(this).attr('pattern');

    var counter = 0;
    var tam = size_of(regex);
    var str = generate_string(tam);

    $(this).val(str);
    $(this).keypress(function(event){
        var tecla = event.which;

        if(typeof tecla == type_of(regex, counter)){
            str = str + tecla;
            counter++;
        }

        $(this).val(str);
    });
});

UPDATE 2

Some examples that would be useful:

1 -> calculate the length: http://js.do/code/38693 (it just needs to be more generic).

UPDATE 3 - FINAL CODE

The final code for the script above is this:

jsfiddle

http://jsfiddle.net/klebermo/f8U4c/78/

code

function parse(regexString){
    var regex = /((?!\[|\{).(?!\]|\}))|(?:\[([^\]]+)\]\{(\d+)\})/g,
        match,
        model = [];
    while (match = regex.exec(regexString)) {
        if(typeof match[1] == 'undefined'){
            for(var i=0;i<match[3];i++){
                model.push(match[2]);
            }
        }else{
            model.push(match[1]);
        }
    }
    return model;
}

function replaceAt(s, n, t) {
    return s.substring(0, n) + t + s.substring(n + 1);
}

function size_of(regex) {
    var parsedRegexp = parse(regex);
    return parsedRegexp.length;
}

function type_of(regex, posicao) {
    var parsedRegexp = parse(regex);
    var pos = parsedRegexp[posicao];

    if(pos == '0-9')
        return 'number';

    if(pos == 'A-Z' || pos == 'a-z')
        return 'string';

    return pos;
}

function generate_string(regex, tamanho) {
    var str = '';

    for(var i=0; i<tamanho; i++) {
        var type = type_of(regex, i);
        if(type == 'number' || type == 'string')
            str = str + '_';
        else
            str = str + type;
    }

    return str;
}

var counter;
var tam;
var str;
var regex;

$('.valida').each(function(){

    $(this).on('focus', function(e){
        regex = $(this).attr('pattern');

        counter = 0;
        tam = size_of(regex);
        str = generate_string(regex, tam);

        $(this).val(str);
    });

    $(this).on('keypress', function(e){
        e.preventDefault();

        var tecla = e.which;

        if(tecla >= 48 && tecla <= 57)
            var tecla2 = tecla - 48;
        else
            var tecla2 = String.fromCharCode(tecla);

        var result = $("<div>");
        result.append( "tecla = "+tecla+"<br>" );

        var t = type_of(regex, counter);

        if(counter < tam) {
            if(t != 'number' && t != 'string') {
                str = replaceAt(str, counter, t);
                counter++;
            }

            t = type_of(regex, counter);

            if(typeof tecla2 == t) {
                result.append( "tecla2 = "+tecla2+"<br>" );
                str = replaceAt(str, counter, tecla2);
                counter++;
            }
        }

        result.append( "counter = "+counter+"<br>" );
        $("#result").empty().append(result);

        $(this).val(str);
    });

});
Kleber Mota
  • I failed to understand where the length value comes from – Dalorzo Jun 01 '14 at 01:31
  • I want to calculate that based on the pattern determined by the regex. – Kleber Mota Jun 01 '14 at 01:35
  • the symbol is counted too – Kleber Mota Jun 01 '14 at 01:37
  • For example #1 (`[0-9]{3},[0-9]{2}`), any matching string would have to be 6 characters long (e.g. `000,00`). Likewise, for example #5 (`[0-9]{2}/[A-Z]{2}/[0-9]{4}`), any matching string would have to be 10 characters long (e.g. `00/AA/0000`). – Tyler Eich Jun 01 '14 at 01:37
  • Ahh I thought it was referring to the matching numbers now it makes sense :D – Dalorzo Jun 01 '14 at 01:39
  • But what about regexes with .*, ie, ones that can match arbitrarily long strings? What is the expected output in that case? – jithinpt Jun 01 '14 at 01:41
  • So basically you want a regex parser that parses the pattern itself, and generates information about it … What’s the actual use case here? – CBroe Jun 01 '14 at 01:42
  • I think it would be easier to get the length of the string tested using the regex rather than calculate the length of the regex. – Mottie Jun 01 '14 at 01:44
  • It will be great if you share the background of the problem and the desired output. Maybe a bit slower than usual but it is not clear to me what is intended – Dalorzo Jun 01 '14 at 01:45
  • @Dalorzo it's a script for input validation in the views of my project. Some fields in my forms will have a pattern attribute that I want to read and use to validate the input as the user types. – Kleber Mota Jun 01 '14 at 01:45
  • Maybe I understood this incorrectly but if you have the regexes beforehand, wouldn't it be easier to just manually map each regex to the expected string length unless there is an excessive number of regexes to keep track of? – jithinpt Jun 01 '14 at 01:47
  • @jithinpt but I want to validate the user input as he types, and display in the field a "template" for the input (e.g. `__/__/____` or `___.___.___-__`) – Kleber Mota Jun 01 '14 at 01:49
  • What happens with this kind of regex `\s+@\s+.\s{2,3}` ? – Florian F. Jun 01 '14 at 01:52
  • @FlorianF. I only need to handle regexes that match strings of a limited size. In the case you present, the script could skip the validation. – Kleber Mota Jun 01 '14 at 01:55
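
The last two comments imply a pre-check: decide whether a pattern has a fixed length before trying to build a template for it. A rough sketch of such a check follows. The name `hasVariableLength` is hypothetical and the test is deliberately conservative: it flags any `*`, `+`, `?`, `|` or `{n,m}` outside of escapes and character classes, even in the rare cases where the length would still be fixed.

```javascript
// Hypothetical helper: returns true when the pattern contains a construct
// that can make the match length vary, so the caller can skip validation.
function hasVariableLength(pattern) {
    var stripped = pattern
        // escaped characters such as "\+" or "\s" count as one plain character
        .replace(/\\./g, 'x')
        // so does a whole character class, whatever it contains
        .replace(/\[[^\]]*\]/g, 'x')
        // group prefixes "(?:", "(?=", "(?!" contain a "?" that is not a quantifier
        .replace(/\(\?[:=!]/g, '(');
    // Remaining *, +, ?, | or {n,} / {n,m} means the length may vary.
    return /[*+?|]|\{\d*,\d*\}/.test(stripped);
}

console.log(hasVariableLength('[0-9]{3},[0-9]{2}'));  // false
console.log(hasVariableLength('\\s+@\\s+.\\s{2,3}')); // true
```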

3 Answers

I've made a little parser for simple regexes like the ones you're using.

It builds an array with one entry per expected character: either the character class (0-9, A-Z) or the literal character itself.

function parse(regexString){
    var regex = /((?!\[|\{).(?!\]|\}))|(?:\[([^\]]+)\]\{(\d+)\})/g,
        match,
        model = [];
    while (match = regex.exec(regexString)) {
        if(typeof match[1] == 'undefined'){
            for(var i=0;i<match[3];i++){
                model.push(match[2]);
            }
        }else{
            model.push(match[1]);
        }
    }
    return model;
}

And a jsfiddle to demo it.

As for the regex used inside the parse function, a Debuggex diagram explains it better than I could:

((?!\[|\{).(?!\]|\}))|(?:\[([^\]]+)\]\{(\d+)\})

Regular expression visualization

Also, you can get the total number of characters with:

myresult.length;

And the type of the n-th character with:

myresult[n];
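
As a quick illustration of those two accessors, here is the parse function from above run against one of the patterns from the question. The variable name `myresult` follows the snippets above; note that the index is 0-based.

```javascript
// parse() as given in this answer, repeated so the snippet runs on its own.
function parse(regexString){
    var regex = /((?!\[|\{).(?!\]|\}))|(?:\[([^\]]+)\]\{(\d+)\})/g,
        match,
        model = [];
    while (match = regex.exec(regexString)) {
        if(typeof match[1] == 'undefined'){
            for(var i=0;i<match[3];i++){
                model.push(match[2]);
            }
        }else{
            model.push(match[1]);
        }
    }
    return model;
}

var myresult = parse('[0-9]{2}/[A-Z]{2}/[0-9]{4}');

console.log(myresult.length); // 10
console.log(myresult[0]);     // "0-9"
console.log(myresult[2]);     // "/"
console.log(myresult[3]);     // "A-Z"
```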
Florian F.

I believe a generic solution to this problem would involve implementing a function that generates a finite state automaton object corresponding to each regex.

This SO post seems related to the question at hand.

Also check out this link: (C# Code to generate strings that match a regex)
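
Short of building a full automaton, the minimum length on its own can be computed with a small recursive-descent pass over the pattern. The sketch below is not the automaton approach described above. It is a lighter alternative that only covers a subset of regex syntax: literals, `.`, escapes, character classes, groups, `|`, `^`, `$` and the quantifiers `?`, `*`, `+`, `{n}`, `{n,}`, `{n,m}`. Lookarounds, backreferences and zero-width escapes such as `\b` are not handled, and the function name `minMatchLength` is hypothetical.

```javascript
// Sketch: minimum length of a string matched by `pattern` (subset of syntax only).
function minMatchLength(pattern) {
    var pos = 0;

    // alternation := sequence ('|' sequence)*   -> the shortest branch wins
    function parseAlternation() {
        var best = parseSequence();
        while (pattern[pos] === '|') {
            pos++;
            best = Math.min(best, parseSequence());
        }
        return best;
    }

    // sequence := (atom quantifier?)*   -> the lengths add up
    function parseSequence() {
        var total = 0;
        while (pos < pattern.length && pattern[pos] !== '|' && pattern[pos] !== ')') {
            var atomLength = parseAtom();
            total += atomLength * parseMinRepeat();
        }
        return total;
    }

    // atom := group | character class | anchor | escape | single character
    function parseAtom() {
        var ch = pattern[pos];
        if (ch === '(') {
            pos++;
            if (pattern.slice(pos, pos + 2) === '?:') pos += 2; // non-capturing group
            var inner = parseAlternation();
            pos++; // skip ")"
            return inner;
        }
        if (ch === '[') {
            pos++;
            while (pos < pattern.length && pattern[pos] !== ']') {
                pos += pattern[pos] === '\\' ? 2 : 1; // skip escaped "]" inside the class
            }
            pos++; // skip "]"
            return 1;
        }
        if (ch === '^' || ch === '$') {
            pos++;
            return 0; // anchors consume no characters
        }
        pos += ch === '\\' ? 2 : 1; // an escape such as "\s" still matches one character
        return 1;
    }

    // Minimum repetition count of the (optional) quantifier after an atom.
    function parseMinRepeat() {
        var min = 1;
        var ch = pattern[pos];
        if (ch === '?' || ch === '*') {
            min = 0;
            pos++;
        } else if (ch === '+') {
            pos++;
        } else if (ch === '{') {
            var m = /^\{(\d+)(?:,\d*)?\}/.exec(pattern.slice(pos));
            if (!m) return 1; // not a quantifier; "{" is read as a literal next
            min = Number(m[1]);
            pos += m[0].length;
        } else {
            return 1;
        }
        if (pattern[pos] === '?') pos++; // lazy modifier, e.g. "+?"
        return min;
    }

    return parseAlternation();
}

console.log(minMatchLength('[0-9]{3},[0-9]{2}'));          // 6
console.log(minMatchLength('[0-9]{2}/[A-Z]{2}/[0-9]{4}')); // 10
console.log(minMatchLength('\\s+@\\s+.\\s{2,3}'));         // 6
```

For patterns with unbounded quantifiers this still returns a number, but it is only a lower bound, not the exact length of every match.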

Community
jithinpt
  • I guess generating a random string will help to accomplish what I want. But in the link you pointed me to, all the examples are in PHP or Perl (which I know almost nothing about). Is there any way to do that in jQuery or JavaScript? – Kleber Mota Jun 01 '14 at 02:00
  • Could you post a list of all the regexes you need to match on? If there is a common pattern among them, that can be used to write a solution specifically for your use case – jithinpt Jun 01 '14 at 02:07
  • I don't have a common pattern; the script should be as generic as possible. To implement this, I just need to know how to do what I ask in the question (length of the string and type of character). Once I do this, I can post the final code here for evaluation. – Kleber Mota Jun 01 '14 at 02:10
  • this example would be a good starting point, it just needs to be more generic: http://js.do/code/38693 – Kleber Mota Jun 01 '14 at 02:11

In your specific case, you could try this code (demo):

var basicRegexLength = function(regex){
    var i;
    regex = regex.replace(/(\[0-9\]|\[A-Z\])/gi, '');
    for (i = 1; i < 10; i++) {
        regex = regex.replace( new RegExp('\\{' + i + '\\}', 'g'), Array(i+1).join('.') );
    }
    return regex.length;
};
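
For reference, this is how the function above behaves on the patterns from the question. The function is repeated so the snippet runs on its own. As written, it only recognises `[0-9]` and `[A-Z]` (case-insensitively) and repetition counts from `{1}` to `{9}`.

```javascript
// basicRegexLength() as given in this answer.
var basicRegexLength = function(regex){
    var i;
    // drop the character classes, leaving only the {n} counts and separators
    regex = regex.replace(/(\[0-9\]|\[A-Z\])/gi, '');
    // turn each {n} into n placeholder characters
    for (i = 1; i < 10; i++) {
        regex = regex.replace( new RegExp('\\{' + i + '\\}', 'g'), Array(i+1).join('.') );
    }
    return regex.length;
};

console.log(basicRegexLength('[0-9]{3},[0-9]{2}'));                            // 6
console.log(basicRegexLength('[0-9]{2}.[0-9]{3}.[0-9]{3}/[0-9]{4}-[0-9]{2}')); // 18
console.log(basicRegexLength('[0-9]{2}/[A-Z]{2}/[0-9]{4}'));                   // 10
```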
Mottie