Match if keyword inside a string starts/ends or both with non-alphanumeric words

Question

My requirement in plain English:

Match if the keyword inside a string is preceded by, followed by, or surrounded by non-alphanumeric characters, or if the string is exactly the keyword.

Keyword: china

'CHINA',            # match
'CHINA ROM',        # match
'CHINA WAREHOUSE',  # match
'CHINA-WAREHOUSE',  # match
'CHINA-ROM',        # match
'dsa china',        # match
'CHINALOCAL',       # no match
'CHINAOO',          # no match

With my current knowledge of regexes, I can do something like:

import re

keyword = 'china'
string = 'CHINA-ROM'  # the string being tested

# re.search scans the whole string; re.match would only try position 0
if string.lower() == keyword \
    or re.search(r"china[^a-zA-Z0-9]", string, flags=re.IGNORECASE) \
    or re.search(r"[^a-zA-Z0-9]china", string, flags=re.IGNORECASE) \
    or re.search(r"[^a-zA-Z0-9]china[^a-zA-Z0-9]", string, flags=re.IGNORECASE):
    print("matched")

Is there a single regular expression that can perform all the checks I want?

Do you mean words that do not contain a digit? Only `[a-zA-Z]`? — The fourth bird, Jul 07 '19 at 11:33
Maybe it's a duplicate: https://stackoverflow.com/questions/29996079/match-a-whole-word-in-a-string-using-dynamic-regex — Ghassen, Jul 07 '19 at 11:33
@WiktorStribiżew could you write an answer with a few details? — Umair Ayub, Jul 07 '19 at 11:46
[I have written it](https://stackoverflow.com/a/29996092/3832970), no need to duplicate SO content. — Wiktor Stribiżew, Jul 07 '19 at 11:47
@WiktorStribiżew then I should delete my question, but I cannot, because SO doesn't allow me to delete a question that has answers — Umair Ayub, Jul 07 '19 at 11:48

score 1 · Answer 1 · answered Jul 07 '19 at 11:37

For your example data, you might use the following pattern with the case-insensitive flag enabled:

^(?:[A-Za-z]+ )*china(?:[ -][A-Za-z]+)*$

^ Start of string
(?:[A-Za-z]+ )* Repeat 0+ times matching 1+ times A-Za-z followed by a space
china Match literally
(?:[ -][A-Za-z]+)* Repeat 0+ times matching a space or -, then 1+ times A-Za-z
$ End of string
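
To see how this pattern behaves on the sample strings from the question, here is a minimal Python sketch. The `re.IGNORECASE` flag and the names `PATTERN`, `samples`, and `matched` are assumptions added for illustration; the pattern itself is the one given above. Note that it validates the whole string, so it only accepts letters, spaces, and hyphens around the keyword.

```python
import re

# Pattern from the answer above. IGNORECASE is an assumption: the pattern
# spells "china" in lower case, while most sample strings are upper case.
PATTERN = re.compile(r"^(?:[A-Za-z]+ )*china(?:[ -][A-Za-z]+)*$", re.IGNORECASE)

samples = [
    "CHINA", "CHINA ROM", "CHINA WAREHOUSE", "CHINA-WAREHOUSE",
    "CHINA-ROM", "dsa china", "CHINALOCAL", "CHINAOO",
]

# Keep only the strings that the whole-string pattern accepts.
matched = [s for s in samples if PATTERN.search(s)]
print(matched)
```

With these inputs, the first six strings are kept and 'CHINALOCAL' and 'CHINAOO' are rejected.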


Andrej Kesely · Answer 2 · 2019-07-07T11:41:31.940

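import re

# Each sample keeps its surrounding single quotes, so a standalone "china"
# always has a non-letter character on both sides.
data = [
"'CHINA'",
"'CHINA ROM'",
"'CHINA WAREHOUSE'",
"'CHINA-WAREHOUSE'",
"'CHINA-ROM'",
"'dsa china'",
"'CHINALOCAL'",
"'CHINAOO'",
]

for d in data:
    # With re.I, [^a-z] rejects letters of either case next to "china"
    if re.findall(r'[^a-z]china[^a-z]', d, flags=re.I):
        print('{: <20} match!'.format(d))
    else:
        print('{: <20} not match!'.format(d))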

Prints:

'CHINA'              match!
'CHINA ROM'          match!
'CHINA WAREHOUSE'    match!
'CHINA-WAREHOUSE'    match!
'CHINA-ROM'          match!
'dsa china'          match!
'CHINALOCAL'         not match!
'CHINAOO'            not match!

EDIT: As Wiktor said in the comments re.findall(r'\bchina\b', d, flags=re.I) might be what you want!
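
One caveat with `\b`: it treats the underscore as a word character, so `\bchina\b` does not match inside 'china_rom'. If the requirement is strictly "no ASCII letter or digit on either side", a lookaround-based variant may fit better. The sketch below is only an illustration; `contains_keyword` is a hypothetical helper name, not something from the answers, and `re.escape` is used on the assumption that the keyword may be supplied dynamically.

```python
import re

def contains_keyword(text, keyword):
    """Return True if keyword occurs in text with no ASCII letter or digit
    directly before or after it (hypothetical helper for illustration)."""
    # re.escape keeps any regex metacharacters in the keyword literal.
    pattern = r"(?<![A-Za-z0-9])" + re.escape(keyword) + r"(?![A-Za-z0-9])"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

samples = [
    "CHINA", "CHINA ROM", "CHINA WAREHOUSE", "CHINA-WAREHOUSE",
    "CHINA-ROM", "dsa china", "CHINALOCAL", "CHINAOO",
]

for s in samples:
    print("{: <20} {}".format(s, contains_keyword(s, "china")))
```

Unlike the quoted-string trick, this works on the raw strings, because the lookarounds also succeed at the start and end of the string.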

Ah, nice trick to enclose the words in single quotes :P — Umair Ayub, Jul 07 '19 at 11:40
