I need to match all uppercase letters in a string, but without duplicates of the same letter. In Python I've been using:

from re import compile

regex = compile('[A-Z]')
variables = regex.findall('(B or P) and (P or not Q)')

This returns ['B', 'P', 'P', 'Q'], but I need ['B', 'P', 'Q'].

Thanks in advance!

amxiao
  • Do you want to not match strings with duplicates, or you just want to filter duplicates from your results? If the latter, use a `set`. – Patrick Haugh Sep 24 '18 at 03:30

2 Answers

You can use negative lookahead with a backreference to avoid matching duplicates:

import re
re.findall(r'([A-Z])(?!.*\1.*$)', '(B or P) and (P or not Q)')

This returns:

['B', 'P', 'Q']
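
One thing worth noting about the lookahead approach: it keeps the last occurrence of each letter, so the result follows last-occurrence order rather than first-seen order. The sketch below uses a made-up input string (not from the question) to show the difference, with `dict.fromkeys` as the first-seen comparison:

```python
import re

# Same pattern as above: match a capital only if the same letter
# does not appear again later in the string.
pattern = re.compile(r'([A-Z])(?!.*\1.*$)')

text = 'P and B and P'  # hypothetical input, chosen to expose the ordering

# The first P is skipped because another P follows it; B and the final P match.
last_kept = pattern.findall(text)
print(last_kept)   # ['B', 'P']

# For comparison: collect every capital, then keep the first occurrence of each.
first_kept = list(dict.fromkeys(re.findall(r'[A-Z]', text)))
print(first_kept)  # ['P', 'B']
```

For the string in the question both orders happen to coincide, which is why the result looks identical there.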
blhsing

If order matters, deduplicate the `variables` list from the question while keeping first-seen order:

print(sorted(set(variables),key=variables.index))

Or if you have the more_itertools package:

from more_itertools import unique_everseen as u
print(list(u(variables)))  # unique_everseen returns a lazy iterator, so wrap it in list()

Or, on Python 3.7+ (plain dicts preserve insertion order; in CPython 3.6 this is an implementation detail):

print(list(dict.fromkeys(variables)))

Or, on older versions, use OrderedDict:

from collections import OrderedDict
print(list(OrderedDict.fromkeys(variables)))

All of these produce:

['B', 'P', 'Q']
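
Putting the pieces together, here is a minimal self-contained sketch that runs the question's `[A-Z]` pattern and then deduplicates in first-seen order. The helper name `unique_capitals` is hypothetical, and it assumes Python 3.7+ for ordered dicts:

```python
import re

def unique_capitals(text):
    """Return each uppercase ASCII letter in text once, in first-seen order."""
    letters = re.findall(r'[A-Z]', text)
    # dict keys are unique and keep insertion order (Python 3.7+),
    # so this drops repeats while preserving the first occurrence of each.
    return list(dict.fromkeys(letters))

variables = unique_capitals('(B or P) and (P or not Q)')
print(variables)  # ['B', 'P', 'Q']
```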
U13-Forward