I am interested in generating regular expressions using machine learning, evolutionary algorithms, or a combination of the two. My approach requires randomly constructing candidate regular expression strings, which these algorithms then evaluate.

Does anyone know of a context-free grammar that describes how regular expressions are allowed to be structured? I am looking for a set of rules that, if followed, combines the items below into a syntactically valid expression.

For example, using these sub-components:

basic_elements = {
    # Raw strings keep every backslash literal; a plain "\b" would be a
    # backspace character and a plain "\u" would be a syntax error.
    "Character Escapes": [r"\a", r"\b", r"\t", r"\r", r"\v", r"\f", r"\n", r"\e", r"\ ", r"\c", r"\u"],

    "Character Classes": ["[group]", "[^ group]", "[first - last]", r"\p{name}", r"\w", r"\s", r"\S", r"\d", r"\D"],

    "Anchors": ["^", "$", r"\A", r"\Z", r"\z", r"\G", r"\b", r"\B"],

    "Grouping Constructs": ["(subexpression)", "(?< name > subexpression)",
                            "(?< name1 - name2 > subexpression)",
                            "(?: subexpression )", "(?imnsx-imnsx: subexpression )", "(?= subexpression )",
                            "(?! subexpression )", "(?<= subexpression )", "(?<! subexpression )",
                            "(?> subexpression )"],

    "Quantifiers": ["*", "+", "?", "{n, }", "{n, m}", "*?", "+?", "??", "{ n }?", "{ n , }?", "{ n , m }?"],

    "Backreference Constructs": [r"\number", r"\k< name >"],

    "Alternation Constructs": ["|", "(?( expression ) yes | no )", "(?( name ) yes | no )"],

    "Substitutions": ["$", "${name}", "$$", "$&", "$", "$`", "$'", "$+", "$_", "", "", "", ""],

    "Regular Expression Options": ["i", "m", "n", "s", "x"],

    "Miscellaneous Constructs": ["(?imnsx-imnsx)", "(?# comment )", "#"]
}
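
To make the question more concrete, here is a minimal sketch of the kind of grammar-driven generator I have in mind. The `GRAMMAR` table and the `generate` function are hypothetical names of my own, and the grammar is a toy that covers only a small subset of the constructs listed above (restricted to syntax that Python's `re` module accepts). It is an assumption about what such rules could look like, not a complete or authoritative regex grammar.

```python
import random
import re

# Hypothetical toy grammar (an assumption, not a full regex grammar).
# Keys are nonterminals; each production is a list of symbols.
# The first production of every rule is non-recursive, so always choosing
# it is guaranteed to terminate.
GRAMMAR = {
    "regex": [["branch"], ["branch", "|", "regex"]],
    "branch": [["piece"], ["piece", "branch"]],
    "piece": [["atom"], ["atom", "quantifier"]],
    "atom": [["char"], ["class"], ["(", "regex", ")"], ["(?:", "regex", ")"]],
    "quantifier": [["*"], ["+"], ["?"], ["{2,3}"], ["*?"], ["+?"]],
    "class": [[r"\w"], [r"\d"], [r"\s"], ["[a-z]"], ["[^0-9]"]],
    "char": [["a"], ["b"], ["c"], [r"\t"], [r"\."]],
}


def generate(symbol="regex", depth=0, max_depth=6, rng=random):
    """Randomly expand `symbol` into a regex string."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: emit it unchanged
    productions = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, fall back to the non-recursive production.
        production = productions[0]
    else:
        production = rng.choice(productions)
    return "".join(generate(s, depth + 1, max_depth, rng) for s in production)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make runs repeatable
    for _ in range(5):
        pattern = generate(rng=rng)
        re.compile(pattern)  # raises re.error if the pattern is malformed
        print(pattern)
```

Because a quantifier can only be attached through the `piece` rule, the generator never emits two stacked quantifiers or a quantifier with nothing to repeat, which is the sort of structural guarantee I would like to have for the full set of constructs.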

Thanks in advance

Emma
tmele54