
I have a list of hostnames, most of which contain a region code:

  1. AP- : APAC
  2. EM- : EMEA
  3. AM- : Americas

The actual list has around 1000 hostnames, and the idea is to filter out the hostnames that have no region code in them. I can do this with string manipulation, but I am wondering how I could write an efficient regex to pick out the hostnames with no region code (like the last 4 items in the list).

import re

host_name = ["XXX_Guangzhou_AP-CN-BEI-7517","XXX_Jakarta_AP-ID-JAK-0001","XXX_TaiPei_AP-TW-TPE-0002","XXX_Dubai_EM-AE-DUB-1012",
"XXX_Viladecans_EM-ES-VIL-1002","XXX_Ringsted_EM-DK-RIN-0001","XXX_Bogota_AM-CO-BOG-1033","XXX_Hamburg_EM-DE-HAM-1004",
"XXX_Bangkok_TH127","XXX_Bangkok_TH124","XXX_Eagan_6231","XXX_Martinez_AR218"]

hostRegex = re.compile(r"[^(AP\-|EM\-|AM\-)]")
mo = list(filter(hostRegex.findall,host_name))
print(mo)

1 Answer


You can use

hostRegex = re.compile(r"_(A[PM]|EM)-")
mo = list(filter(lambda x: not hostRegex.search(x),host_name))

The _(A[PM]|EM)- regex matches _, then AP, AM or EM, and then a - char.

The filter(lambda x: not hostRegex.search(x),host_name) part returns all items in the host_name list that have no match.
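For comparison, here is a small sketch of why the pattern in the question keeps every hostname. Inside [...], the parentheses, | and the letters are just individual characters, so [^(AP\-|EM\-|AM\-)] matches any single character that is not one of ( ) | A P E M -, and every hostname contains such a character. The names char_class, region, sample and no_region are made up for this illustration, and the list comprehension is only an equivalent alternative to filter with a lambda:

```python
import re

# Pattern from the question: a negated character class, so it matches
# any ONE character that is not one of: ( ) | A P E M -
char_class = re.compile(r"[^(AP\-|EM\-|AM\-)]")

# Pattern from the answer: "_", then AP, AM or EM, then "-"
region = re.compile(r"_(A[PM]|EM)-")

# Two hostnames taken from the list in the question
sample = ["XXX_Dubai_EM-AE-DUB-1012", "XXX_Eagan_6231"]

for name in sample:
    # The character class finds a match in both names (the leading "X"),
    # while the grouped alternation only matches the name with a region code.
    print(name, bool(char_class.search(name)), bool(region.search(name)))

# List-comprehension equivalent of filter(lambda x: not ..., host_name)
no_region = [name for name in sample if not region.search(name)]
print(no_region)
```
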

See the Python demo:

import re

host_name = [
    "XXX_Guangzhou_AP-CN-BEI-7517","XXX_Jakarta_AP-ID-JAK-0001","XXX_TaiPei_AP-TW-TPE-0002",
    "XXX_Dubai_EM-AE-DUB-1012","XXX_Viladecans_EM-ES-VIL-1002","XXX_Ringsted_EM-DK-RIN-0001",
    "XXX_Bogota_AM-CO-BOG-1033","XXX_Hamburg_EM-DE-HAM-1004", "XXX_Bangkok_TH127","XXX_Bangkok_TH124",
    "XXX_Eagan_6231","XXX_Martinez_AR218"]

hostRegex = re.compile(r"_(A[PM]|EM)-")
mo = list(filter(lambda x: not hostRegex.search(x),host_name))
print(mo)

Output:

['XXX_Bangkok_TH127', 'XXX_Bangkok_TH124', 'XXX_Eagan_6231', 'XXX_Martinez_AR218']
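If the region itself is also of interest, the capturing group in the same regex can be reused to bucket hostnames. This is only a sketch: the REGIONS mapping is copied from the codes listed in the question, while group_by_region, the "unknown" label and the shortened hosts list are hypothetical names chosen for this example:

```python
import re
from collections import defaultdict

# Region codes as listed in the question
REGIONS = {"AP": "APAC", "EM": "EMEA", "AM": "Americas"}

region_re = re.compile(r"_(A[PM]|EM)-")

def group_by_region(names):
    """Bucket hostnames by region; names without a code go under 'unknown'."""
    groups = defaultdict(list)
    for name in names:
        m = region_re.search(name)
        # group(1) holds the two-letter code captured by (A[PM]|EM)
        key = REGIONS[m.group(1)] if m else "unknown"
        groups[key].append(name)
    return dict(groups)

hosts = [
    "XXX_Jakarta_AP-ID-JAK-0001",
    "XXX_Dubai_EM-AE-DUB-1012",
    "XXX_Bogota_AM-CO-BOG-1033",
    "XXX_Eagan_6231",
]

grouped = group_by_region(hosts)
print(grouped["unknown"])
```

The "unknown" bucket then holds the same hostnames the filter call returns, so one pass over the list gives both results.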
Wiktor Stribiżew