
I am new to regex and want to know how to write a pattern that matches letters together with special characters, followed by a run of three or more capital letters.

Suppose I have a string like this:

my_string = 'Syrians/NORP, Turkish/NORP, Turkish/NORP, Turkish/NORP, the last 2 , 3 years/DATE, Turkey/LOC'

What I have tried:

my_new_string = re.findall(r'[\w+\,]+/[A-Z]{4}', my_string)
# result
['Syrians/NORP', 'Turkish/NORP', 'Turkish/NORP', 'Turkish/NORP', 'years/DATE']

Expected result:

['Syrians/NORP', 'Turkish/NORP', 'Turkish/NORP', 'Turkish/NORP', 'the last 2 , 3 years/DATE', 'Turkey/LOC']

I also struggled to write a pattern that matches three or more capital letters.

Can you propose a good solution? Thanks in advance!

Schrodinger

1 Answer

>>> re.findall(r'\w[\w, ]+/[A-Z]{3,4}', my_string)
['Syrians/NORP', 'Turkish/NORP', 'Turkish/NORP', 'Turkish/NORP', 'the last 2 , 3 years/DATE', 'Turkey/LOC']

Just add a space to your character class (the '+' after \w is not needed inside the brackets, where it only matches a literal plus sign), and use a range of 3 to 4 so that "LOC" matches (or whatever range you need). Start the pattern with \w so that a match cannot begin with a space or a comma. Note that \w also matches _, but that is not a problem here.
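
Since the question asks for tags of "3 letters up", here is a minimal sketch of the same pattern with an open-ended quantifier, {3,}, in place of {3,4}. The name TAGGED is only illustrative, and the snippet assumes tags consist solely of the ASCII capitals A to Z.

```python
import re

my_string = ('Syrians/NORP, Turkish/NORP, Turkish/NORP, Turkish/NORP, '
             'the last 2 , 3 years/DATE, Turkey/LOC')

# \w         first character must be a word character (no leading space or comma)
# [\w, ]+    then one or more word characters, commas, or spaces
# /          the literal slash before the tag
# [A-Z]{3,}  a tag of three or more capital letters (no upper bound)
TAGGED = re.compile(r'\w[\w, ]+/[A-Z]{3,}')  # hypothetical name, for illustration

matches = TAGGED.findall(my_string)
print(matches)
```

On the sample string this should give the same list as the {3,4} version, because every tag there is three or four letters long. One thing to keep in mind: \w followed by [\w, ]+ needs at least two characters before the slash, so a single-character entity such as 'X/LOC' would not match.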

Jean-François Fabre