My requirement in simple plain English

Match if the keyword inside a string is preceded and/or followed by a non-alphanumeric character, or if the string is exactly the keyword.

Keyword: china

'CHINA', #match
'CHINA ROM', #match
'CHINA WAREHOUSE', #match
'CHINA-WAREHOUSE', #match
'CHINA-ROM', #match
'dsa china', #match
'CHINALOCAL', #No
'CHINAOO' #No

As per my current knowledge of Regexes, I can do something like

import re

string = 'dsa china'  # the text being tested; the keyword is 'china'
if string.lower() == 'china' \
    or re.match(r"china[^a-zA-Z0-9]", string, flags=re.IGNORECASE) \
    or re.search(r"[^a-zA-Z0-9]china$", string, flags=re.IGNORECASE) \
    or re.search(r"[^a-zA-Z0-9]china[^a-zA-Z0-9]", string, flags=re.IGNORECASE):
    print("matched")

Is there a single regular expression that can perform all the checks I want?

Umair Ayub

2 Answers

For your example data, you might use

^(?:[A-Za-z]+ )*china(?:[ -][A-Za-z]+)*$
  • ^ Start of string
  • (?:[A-Za-z]+ )* Repeat 0+ times matching 1+ times A-Za-z followed by a space
  • china Match literally
  • (?:[ -][A-Za-z]+)* Repeat 0+ times matching a space or -, then 1+ times A-Za-z
  • $ End of string
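
As a rough illustration, the pattern above could be applied in Python along these lines. The `re.IGNORECASE` flag is an assumption here: the literal `china` in the pattern is lowercase, so without some case-insensitive flag it would not match `CHINA`. The names `PATTERN`, `samples`, and `is_match` are made up for this sketch.

```python
import re

# Pattern from the answer; re.IGNORECASE is assumed so that the lowercase
# literal "china" also matches "CHINA".
PATTERN = re.compile(r"^(?:[A-Za-z]+ )*china(?:[ -][A-Za-z]+)*$", re.IGNORECASE)

# Sample strings taken from the question.
samples = [
    "CHINA",
    "CHINA ROM",
    "CHINA WAREHOUSE",
    "CHINA-WAREHOUSE",
    "CHINA-ROM",
    "dsa china",
    "CHINALOCAL",
    "CHINAOO",
]


def is_match(text):
    """Return True if the whole string fits the anchored pattern."""
    return PATTERN.search(text) is not None


for s in samples:
    print("{: <20} {}".format(s, "match" if is_match(s) else "no match"))
```

Because the pattern is anchored with ^ and $ and only allows letters, spaces, and hyphens around the keyword, it is tailored to the example data rather than to arbitrary input.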

The fourth bird
data = [
"'CHINA'",
"'CHINA ROM'",
"'CHINA WAREHOUSE'",
"'CHINA-WAREHOUSE'",
"'CHINA-ROM'",
"'dsa china'",
"'CHINALOCAL'",
"'CHINAOO'",
]

import re

for d in data:
    if re.findall(r'[^a-z]china[^a-z]', d, flags=re.I):
        print('{: <20} match!'.format(d))
    else:
        print('{: <20} not match!'.format(d))

Prints:

'CHINA'              match!
'CHINA ROM'          match!
'CHINA WAREHOUSE'    match!
'CHINA-WAREHOUSE'    match!
'CHINA-ROM'          match!
'dsa china'          match!
'CHINALOCAL'         not match!
'CHINAOO'            not match!

EDIT: As Wiktor said in the comments, re.findall(r'\bchina\b', d, flags=re.I) might be what you want!
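
Two details may be worth keeping in mind. The sample strings in the code above include literal quote characters, so `[^a-z]china[^a-z]` always has a character on each side to consume; on a bare string such as `CHINA` that pattern would find nothing. Also, `\b` treats the underscore as a word character, so `\bchina\b` does not match inside `CHINA_ROM`. If the requirement is strictly "not preceded or followed by an alphanumeric character", one possible sketch uses lookarounds instead. The function name `contains_keyword` and the extra `CHINA_ROM` test string are assumptions for illustration, not part of the original question.

```python
import re


def contains_keyword(text, keyword="china"):
    """Return True if keyword occurs with no ASCII letter or digit on either side."""
    # Negative lookbehind/lookahead assert the absence of an alphanumeric
    # neighbour without consuming a character, so the keyword may also sit
    # at the very start or end of the string.
    pattern = r"(?<![A-Za-z0-9])" + re.escape(keyword) + r"(?![A-Za-z0-9])"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


samples = [
    "CHINA",
    "CHINA ROM",
    "CHINA WAREHOUSE",
    "CHINA-WAREHOUSE",
    "CHINA-ROM",
    "dsa china",
    "CHINALOCAL",
    "CHINAOO",
    "CHINA_ROM",  # hypothetical extra case: underscore is not alphanumeric
]

for s in samples:
    print("{: <20} {}".format(s, "match" if contains_keyword(s) else "no match"))
```

This keeps everything in a single expression and works on strings without surrounding quotes.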

Andrej Kesely