
Actually, I have two lists: one with terms (100k+) and the other with brand names (10k+). I want to fetch only those terms that have a brand name associated with them. For example:

terms=['chocolates','nestle chocolates','bar','cadbury bar','refrigerator','samsung refrigerator','era clothing','grilling machine']

brands=['imperial brand','gems','era','cadbury','samsung','nestle','grill']

My code:

matching = [t for t in terms if any(bt in t for bt in brands)]

Expected output:

['nestle chocolates','cadbury bar','samsung refrigerator', 'era clothing']

My output:

['nestle chocolates',
 'cadbury bar',
 'refrigerator',
 'samsung refrigerator',
 'era clothing',
 'grilling machine']

I don't want terms like "refrigerator" or "grilling machine" to be part of my list, since they don't contain any brand names. They show up only because "era" occurs inside "refrigerator" and "grill" occurs inside "grilling".
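Python's `in` on two strings is a substring test, not a whole-word test, which is why `'era' in 'refrigerator'` is `True`. A small diagnostic sketch, using the example lists above, that shows every brand that triggers each match:

```python
terms = ['chocolates', 'nestle chocolates', 'bar', 'cadbury bar', 'refrigerator',
         'samsung refrigerator', 'era clothing', 'grilling machine']
brands = ['imperial brand', 'gems', 'era', 'cadbury', 'samsung', 'nestle', 'grill']

# For each term, collect every brand that occurs in it as a plain substring
# (exactly what `bt in t` tests), keeping only terms with at least one hit.
matches = {t: [b for b in brands if b in t] for t in terms}
matches = {t: found for t, found in matches.items() if found}

for term, found in matches.items():
    print(f'{term!r}: {found}')
# 'nestle chocolates': ['nestle']
# 'cadbury bar': ['cadbury']
# 'refrigerator': ['era']
# 'samsung refrigerator': ['era', 'samsung']
# 'era clothing': ['era']
# 'grilling machine': ['grill']
```

Note that "samsung refrigerator" is hit by both "era" and "samsung", so a fix has to reject the "era" hit inside a word without dropping that term.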

Can someone please help me achieve this?

It_is_Chris
  • So you want to check for spaces around the brand name then? `if any(' ' + bt in t for bt in brands) or any(bt + ' ' in t for bt in brands)` – Peter Jun 01 '21 at 16:12
  • Or even `if any((' ' + bt in t or bt + ' ' in t) for bt in brands)` to avoid traversing the brand list twice – Altareos Jun 01 '21 at 16:14
  • How about using regex? `[t for t in terms if any(re.match(rf'\b{re.escape(bt)}\b', t) for bt in brands)]`. Note that `re.match` only matches at the beginning of the string. To match the word anywhere in the string use `re.search`. – Ali Samji Jun 01 '21 at 16:21
  • `print([x for x in terms if x.split()[0] in brands])` will give you the expected output. It checks whether the first word of each term is in `brands` and, if so, adds the term to the list. –  Jun 01 '21 at 16:23
  • @AliSamji, thanks for the help, this completely serves my purpose. Just one question: can't we simply use `re.search(r'\b(bt)\b', t)` instead of using `re.escape`? And what is the actual purpose of using an f-string and `re.escape`? Thanks again in advance! – Priyam Singh Jun 10 '21 at 14:13
  • Using `re.escape` is safer. In the example you posted, I don't see any benefit from it. However, if any of the brand names contained a regex special character (such as `.` or `+`), you would need to escape it so that it is matched literally rather than interpreted as regex syntax. – Ali Samji Jun 10 '21 at 20:31
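Pulling the comment thread together: a minimal sketch, assuming the example lists from the question, of the word-boundary approach using `re.search` (so the brand may appear anywhere in the term), plus two small checks for the follow-up question about the f-string and `re.escape`. The brand `'e.on'` is a hypothetical name chosen only to show what escaping prevents. The last part, joining all brands into one compiled pattern, is a variant not suggested in the thread; it scans each term once instead of once per brand and may be faster on 100k terms × 10k brands, but that is untested here.

```python
import re

terms = ['chocolates', 'nestle chocolates', 'bar', 'cadbury bar', 'refrigerator',
         'samsung refrigerator', 'era clothing', 'grilling machine']
brands = ['imperial brand', 'gems', 'era', 'cadbury', 'samsung', 'nestle', 'grill']

# Word-boundary search per brand, as suggested in the comments.
# re.search (not re.match) lets the brand appear anywhere in the term;
# re.escape makes any regex metacharacters in a brand name match literally.
matching = [t for t in terms
            if any(re.search(rf'\b{re.escape(b)}\b', t) for b in brands)]
print(matching)
# ['nestle chocolates', 'cadbury bar', 'samsung refrigerator', 'era clothing']

# Without the f-string, the pattern contains the literal letters "bt",
# not the value of the variable bt, so it never finds the brand.
bt = 'cadbury'
print(re.search(r'\b(bt)\b', 'cadbury bar'))  # None

# Why re.escape matters: in the hypothetical brand 'e.on', an unescaped '.'
# matches any character, so the raw pattern wrongly matches 'eton mess'.
print(bool(re.search(r'\be.on\b', 'eton mess')))                  # True
print(bool(re.search(rf'\b{re.escape("e.on")}\b', 'eton mess')))  # False

# Variant (not from the thread): one compiled alternation of all escaped
# brands, so each term is searched once rather than once per brand.
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, brands)) + r')\b')
matching_fast = [t for t in terms if pattern.search(t)]
print(matching_fast == matching)  # True
```

Both versions are case-sensitive; passing `re.IGNORECASE` would be needed if terms and brands differ in capitalization.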

0 Answers