I'm trying to match words containing letters that are a constant alphabetical distance apart. For example, I'd like to find all words matching A.*B, B.*C, C.*D, etc. I'm currently using the 're' package in Python 3.

Is there a way to do this without spelling out every pair of letters at distance 1, i.e. without writing (A.*B|B.*C|C.*D|...|Y.*Z)?

I'm looking for a robust solution (if one exists) that also works in more complex cases requiring many groups.

Shay

1 Answer

The ordinal of 'A' is 65 and 'B' is 66, etc. (the lowercase letters start at 97 with 'a'). You can build the alternation programmatically with a comprehension instead of typing it out. For example:

distance = 1
# Pair each uppercase letter with the one `distance` places later; the range stops early so the second letter never goes past 'Z'
regex = '(' + '|'.join("{}.*{}".format(chr(x), chr(x+distance)) for x in range(ord('A'), ord('Z')+1-distance)) + ')'
print(regex)
# prints: (A.*B|B.*C|C.*D|D.*E|E.*F|F.*G|G.*H|H.*I|I.*J|J.*K|K.*L|L.*M|M.*N|N.*O|O.*P|P.*Q|Q.*R|R.*S|S.*T|T.*U|U.*V|V.*W|W.*X|X.*Y|Y.*Z)

distance = 2
regex = '(' + '|'.join("{}.*{}".format(chr(x), chr(x+distance)) for x in range(ord('A'), ord('Z')+1-distance)) + ')'
print(regex)
# prints: (A.*C|B.*D|C.*E|D.*F|E.*G|F.*H|G.*I|H.*J|I.*K|J.*L|K.*M|L.*N|M.*O|N.*P|O.*Q|P.*R|Q.*S|R.*T|S.*U|T.*V|U.*W|V.*X|W.*Y|X.*Z)
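
As a rough sketch of how the generated pattern might then be used, the snippet below wraps the same construction in a helper and filters a word list with `re.search`. The helper name `build_pair_regex` and the sample words are made up for illustration, and it assumes uppercase ASCII input.

```python
import re

def build_pair_regex(distance):
    """Hypothetical helper: alternation of X.*Y for every uppercase pair `distance` apart."""
    pairs = (
        "{}.*{}".format(chr(x), chr(x + distance))
        for x in range(ord('A'), ord('Z') + 1 - distance)
    )
    return '(' + '|'.join(pairs) + ')'

# Compile once and reuse the pattern for every word.
pattern = re.compile(build_pair_regex(1))

# Illustrative sample data, not from the question.
words = ["ABLE", "STOP", "ZEBRA", "TOE"]
matching = [w for w in words if pattern.search(w)]
print(matching)  # ['ABLE', 'STOP']
```

"ABLE" matches through A.*B and "STOP" through S.*T; the other two words contain no letter followed later by its alphabetical successor.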
Arjun
  • Thanks! This approach isn't quite what I'm looking for, since it's not robust: I would need to write a new list comprehension for each variant of the problem, such as A.*B.*C. Maybe the general case (i.e., any number of distances) could be solved using 'eval()', but I'm wondering whether a simpler solution exists. – Shay Nov 12 '18 at 19:55
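
One possible way to cover chained forms such as A.*B.*C without `eval()` is to pass the offsets in as data and let a single function build every alternative. This is only a sketch under some assumptions: the name `build_chain_regex` and its parameters are hypothetical, offsets are taken to be positive distances measured from the first letter, and only uppercase ASCII letters are handled.

```python
import re

def build_chain_regex(offsets, first='A', last='Z'):
    """Hypothetical helper. `offsets` are positive distances from the first
    letter, so (1,) yields A.*B and (1, 2) yields A.*B.*C."""
    span = max(offsets)  # the farthest letter decides where the start letter must stop
    alternatives = []
    for start in range(ord(first), ord(last) + 1 - span):
        letters = [chr(start)] + [chr(start + off) for off in offsets]
        alternatives.append('.*'.join(letters))
    return '(' + '|'.join(alternatives) + ')'

# (1,) reproduces the distance-1 pattern from the answer above.
pair_regex = build_chain_regex((1,))

# (1, 2) gives three-letter chains: A.*B.*C|B.*C.*D|...|X.*Y.*Z
chain_regex = build_chain_regex((1, 2))
print(bool(re.search(chain_regex, "ALBATROSC")))  # True, via A.*B.*C
print(bool(re.search(chain_regex, "CBA")))        # False, the letters are in the wrong order
```

Because the offsets are ordinary arguments, a new variant of the problem needs a different tuple rather than a new comprehension.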