2

I have the following problem in python, which I hope you can assist with.

The input is 2 regular expressions, and I have to check if their concatenation can have values. For example if one says take strings with length greater than 10 and the other says at most 5, than no value can ever pass both expressions.

Is there something in python to solve this issue?

Thanks, Max.

Alan Moore
  • 73,866
  • 12
  • 100
  • 156
Max Shifrin
  • 809
  • 9
  • 24

3 Answers3

0

Is there something in python to solve this issue?

There is nothing in Python that solves this directly.

That said, you can simulate a logical-and operation for two regexes by using lookahead assertions. There is a good explanation with examples at Regular Expressions: Is there an AND operator?

This will combine the regexes but won't show directly whether some string exists that satisfies the combined regex.

Community
  • 1
  • 1
Raymond Hettinger
  • 216,523
  • 63
  • 388
  • 485
0

Getting this brute force algorithm from here: Generating a list of values a regex COULD match in Python

def all_matching_strings(alphabet, max_length, regex1, regex2):
"""Find the list of all strings over 'alphabet' of length up to 'max_length' that match 'regex'"""

if max_length == 0: return 

L = len(alphabet)
for N in range(1, max_length+1):
    indices = [0]*N
    for z in xrange(L**N):
        r = ''.join(alphabet[i] for i in indices)
        if regex1.match(r) and regex2.match(r):                
           yield(r)

        i = 0
        indices[i] += 1
        while (i<N) and (indices[i]==L):
            indices[i] = 0
            i += 1
            if i<N: indices[i] += 1

return

example usage, for your situation (two regexes)... you'd need to add all possible symbols/whitespaces/etc to that alphabet also...:

alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890'
import re
regex1 = re.compile(regex1_str)
regex2 = re.compile(regex1_str)
for r in all_matching_strings(alphabet, 5, regex1, regex2): 
    print r

That said, the runtime on this is super-crazy and you'll want to do whatever you can to speed it up. One suggestion on the answer I swiped the algorithm from was to filter the alphabet to only have characters that are "possible" for the regex. So if you scan your regex and you only see [1-3] and [a-eA-E], with no ".", "\w", "\s", etc, then you can reduce the size of the alphabet to 13 length. Lots of other little tricks you could implement as well.

Community
  • 1
  • 1
sdanzig
  • 4,510
  • 1
  • 23
  • 27
  • This doesn't seem to be to usefull because of runtime exponential. Obviously I can't have a supercomputer run this every time I need to valudate the rejexes – Max Shifrin Nov 11 '13 at 12:03
0

I highly doubt that something like this is implemented and even that there is a way to efficiently compute it.

One approximative way that comes to my mind now, that detects the most obvious conflicts would be to generate a random string conforming to each of the regexes, and then check if the concatenation of the regexes matches the concatenation of the generated strings.

Something like:

import re, rstr
s1 = rstr.xeger(r1)
s2 = rstr.xeger(r2)
print re.match(r1 + r2, s1 + s2)

Although I can't really think of a way for this to fail. In my opinion, for your example, where r1 matches strings with more than 10 chars, r2 matches strings shorter than 5 chars, then the sum of the two would yield strings with the first part longer than 10 and a tail of less than 5.

Iulius Curt
  • 4,984
  • 4
  • 31
  • 55
  • I can't seem to think of an example of the top of my head either, but this doesn't seem like a good way to prove that something works. In my example the check I need is supposed to fail for r1+ r2 and there is no string which satisfies it – Max Shifrin Nov 11 '13 at 12:07
  • I am doubting that the regular expressions have the representational power to create assertions that could clash on concatenation. – Iulius Curt Nov 11 '13 at 12:50