2

I would like to generate a string matching my regexes using Python 3. For this I am using a handy library called rstr.

My regexes:

  • ^[abc]+.
  • [a-z]+

My task:

I must find a generic way to create a string that matches both of my regexes.

What I cannot do:

I cannot modify either regex or join them in any way. The brute-force approach below I consider an ineffective solution, especially in the case of incompatible regexes:

import re
import rstr

regex1 = re.compile(r'^[abc]+.')
regex2 = re.compile(r'[a-z]+')

# Brute force: generate candidates from regex1 until one also fully matches regex2.
for index in range(1000):
    generated_string = rstr.xeger(regex1)
    if regex2.fullmatch(generated_string):
        break
else:
    # The loop finished without a break, so no candidate matched regex2.
    raise Exception('Regexes are probably incompatible.')

print('String matching both regexes is: {}'.format(generated_string))

Is there any workaround or any magical library that can handle this? Any insights appreciated.

Questions which are seemingly similar, but not helpful in any way:

The asker already has the string, which he just wants to check against multiple regexes in the most elegant way. In my case we need to generate a string in a smart way so that it matches the regexes.

Fusion
  • Does this help: https://stackoverflow.com/questions/8888567/match-a-line-with-multiple-regex-using-python – SanRyu Dec 04 '18 at 11:02
  • I have seen that. It does not help, because the asker already has the string, which he just wants to check against multiple regexes. In my case we need to generate a string in a smart way so that it matches the regexes. – Fusion Dec 04 '18 at 11:05
  • I don't understand why you can't `join [the regexes] in any way`. Why won't `combined = '(?={}){}'.format(regex1, regex2)` do? – L3viathan Dec 04 '18 at 11:18

3 Answers

1

If you want a really generic way, you can't really use a brute-force approach.

What you are looking for is to create some kind of representation of the regexp (as rstr does by calling sre_parse.py) and then call an SMT solver to satisfy both criteria.

For Haskell there is https://github.com/audreyt/regex-genex which uses the Yices SMT solver to do just that, but I doubt there is anything like this for Python. If I were you, I'd bite the bullet and call it as an external program from your Python program.
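To illustrate the idea of working on a representation of the regexes instead of on random strings, here is a minimal sketch in pure Python. It is not an SMT-based solution and it does not use regex-genex or sre_parse: it swaps in a simpler technique, a breadth-first search over the product of two NFAs. The NFAs for `[abc]+.` and `[a-z]+` are written by hand, the `NFA` type and the function names are made up for this example, and `.` is assumed to range only over lowercase letters and digits.

```python
from collections import deque
from typing import NamedTuple
import string

# Assumed alphabet for this sketch: '.' is limited to these characters.
ALPHABET = string.ascii_lowercase + string.digits


class NFA(NamedTuple):
    start: int
    accepting: frozenset
    transitions: tuple  # entries are (source_state, allowed_characters, target_state)


def step(nfa, states, ch):
    """Return the set of states reachable from `states` by reading `ch`."""
    return frozenset(
        target
        for source, allowed, target in nfa.transitions
        if source in states and ch in allowed
    )


def shortest_common_string(nfa_a, nfa_b, alphabet):
    """Breadth-first search over pairs of state sets; None if no common string exists."""
    start = (frozenset({nfa_a.start}), frozenset({nfa_b.start}))
    queue = deque([(start, '')])
    seen = {start}
    while queue:
        (states_a, states_b), text = queue.popleft()
        if states_a & nfa_a.accepting and states_b & nfa_b.accepting:
            return text
        for ch in alphabet:
            nxt = (step(nfa_a, states_a, ch), step(nfa_b, states_b, ch))
            # Skip dead ends (one automaton has no live state) and visited pairs.
            if not nxt[0] or not nxt[1] or nxt in seen:
                continue
            seen.add(nxt)
            queue.append((nxt, text + ch))
    return None


# Hand-written NFA for [abc]+. (one or more of a, b, c, then any one character).
NFA_ABC_THEN_ANY = NFA(0, frozenset({2}), ((0, 'abc', 1), (1, 'abc', 1), (1, ALPHABET, 2)))
# Hand-written NFA for [a-z]+
NFA_LOWER = NFA(0, frozenset({1}), (
    (0, string.ascii_lowercase, 1),
    (1, string.ascii_lowercase, 1),
))

if __name__ == '__main__':
    print(shortest_common_string(NFA_ABC_THEN_ANY, NFA_LOWER, ALPHABET))
```

Because the search explores a finite set of state pairs, it also terminates with `None` when the two patterns have no string in common, which random generation cannot prove.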

MacHala
0

I don't know if there is something that can fulfill your needs much more smoothly. But I would do something like this (as you've already done):

  1. Create a regex object with the re.compile() function.
  2. Generate a string based on the 1st regex.
  3. Pass the string you've got to the 2nd regex object using its search() method.
  4. If that passes, you're done: the string passed both regexes.

Maybe you can create a function and pass both regexes as parameters and test "2 by 2" using the same logic.

And then, if you have 8 regexes to match, just do:

call(regex1, regex2)
call(regex2, regex3)
call(regex3, regex4)
...
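The numbered steps above can be wrapped in one function that accepts any number of regexes, roughly like this. It is only a sketch: `find_common_string` and `random_candidate` are names invented for this example, the random generator is a stand-in for `rstr.xeger(regex1)` so the snippet runs without extra packages, and it uses `fullmatch()` as in the question rather than `search()`.

```python
import random
import re

_rng = random.Random(0)  # seeded only to keep the stand-in generator repeatable


def find_common_string(generate, patterns, attempts=1000):
    """Return the first generated candidate that fully matches every pattern.

    `generate` is any zero-argument callable producing candidate strings
    (for example `lambda: rstr.xeger(regex1)`). Returns None if every
    attempt fails, which suggests the patterns may be incompatible.
    """
    compiled = [re.compile(pattern) for pattern in patterns]
    for _ in range(attempts):
        candidate = generate()
        if all(regex.fullmatch(candidate) for regex in compiled):
            return candidate
    return None


def random_candidate():
    # Hypothetical stand-in for rstr.xeger(regex1): two to four characters
    # from a small mixed alphabet, so some candidates fail each regex.
    return ''.join(_rng.choice('abcxyzAB12') for _ in range(_rng.randint(2, 4)))


if __name__ == '__main__':
    print(find_common_string(random_candidate, [r'^[abc]+.', r'[a-z]+']))
```

Checking one candidate against all patterns at once avoids the case where a string passes each pair test separately but no single string passes every regex.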
SanRyu
0

I solved this using a slightly different approach. Notice that the second regex is basically insurance that only lowercase letters are generated in our new string.

I used Google's Python package sre_yield, which allows limiting the charset. The package is also available on PyPI. My code:

import sre_yield
import string

sre_yield.AllStrings(r'^[abc]+.', charset=string.ascii_lowercase)[0]
# returns `aa`
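
Why restricting the charset is enough here can be checked with the standard library alone. The sketch below does not use sre_yield: it enumerates short lowercase strings by brute force (the helper name `lowercase_strings` and the length limit of 3 are choices made for this example) and confirms that every one matching the first regex also matches the second.

```python
import itertools
import re
import string

regex1 = re.compile(r'^[abc]+.')
regex2 = re.compile(r'[a-z]+')


def lowercase_strings(max_length):
    """Yield all lowercase strings up to max_length, shortest first, alphabetically."""
    for length in range(1, max_length + 1):
        for chars in itertools.product(string.ascii_lowercase, repeat=length):
            yield ''.join(chars)


# Every lowercase string that fully matches regex1 ...
matches_first = [s for s in lowercase_strings(3) if regex1.fullmatch(s)]
# ... also fully matches regex2, because it contains only lowercase letters.
all_match_second = all(regex2.fullmatch(s) for s in matches_first)

if __name__ == '__main__':
    print(matches_first[0], all_match_second)
```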
Fusion