get matched item from generator expression

Question

I have written an if condition with a generator expression.

self.keyword_list = ['Buzz', 'Heard on the street', 'familiar with the development', 'familiar with the matter', 'Sources' ,'source', 'Anonymous', 'anonymity', 'Rumour', 'Scam', 'Fraud', 'In talks', 'Likely to', 'Cancel', 'May', 'Plans to', 'Raids' ,'raid', 'search', 'Delisting', 'delist', 'Block', 'Exit', 'Cheating', 'Scouts', 'scouting', 'Default', 'defaulted', 'defaulter', 'Calls off', 'Lease out', 'Pick up', 'delay', 'arrest', 'arrested', 'inks', 'in race', 'enters race', 'mull', 'consider', 'final stage', 'final deal', 'eye', 'eyes', 'probe', 'vie for', 'detects', 'allege', 'alleges', 'alleged', 'fabricated', 'inspection', 'inspected', 'to monetise', 'cancellation', 'control', 'pact', 'warning', 'IT scanner', 'Speculative', 'Divest', 'Buzz', 'Heard on the street', 'familiar with the development', 'familiar with the matter', 'Sources', 'source', 'Anonymous', 'anonymity', 'Rumour', 'Scam', 'Fraud', 'In talks', 'Likely to', 'Cancel', 'May', 'Plans to ', 'Raids', 'raid', 'search', 'Delisting', 'delist', 'Block', 'Exit', 'Cheating', 'Scouts','scouting', 'Default', 'defaulted', 'defaulter', 'Calls off', 'Lease out', 'Pick up', 'delay', 'arrest', 'arrested', 'inks', 'in race', 'enters race', 'mull', 'consider', 'final stage', 'final deal', 'eye', 'eyes', 'probe', 'vie for', 'detects', 'allege', 'alleges', 'alleged', 'fabricated', 'inspection', 'inspected', 'monetise', 'cancellation', 'control', 'pact', 'warning', 'IT scanner', 'Speculative', 'Divest']
if any(re.search(item.lower(), record['title'].lower()+' '+record['description'].lower()) for item in self.keyword_list):
    # for which value of item did the condition become true?
    # print item does not work here
    print record

If the condition is true, I want to print the keyword that matched. How do I get it?

It is a generator expression. There are [no tuple comprehensions in Python](http://stackoverflow.com/questions/16940293/why-is-there-no-tuple-comprehension-in-python). Also, you don't need to use `else: pass`, that block is entirely optional and can just be omitted. — Martijn Pieters, Sep 10 '15 at 09:34

Martijn Pieters · Accepted Answer · 2015-09-10T10:08:03.577


Don't use any(). Instead, turn your generator expression into a filter (move the test to the end), then use next() to get the first match; first_match then holds the keyword that matched, or None if nothing did:

matches = (item for item in self.keyword_list if re.search(item.lower(), record['title'].lower() + ' ' + record['description'].lower()))
first_match = next(matches, None)
if first_match is not None:
    print record
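A minimal, self-contained sketch of the next() approach, written in Python 3 syntax (the answer's own code is Python 2). The sample `keyword_list` and `record` values are made up for illustration; only the technique comes from the answer.

```python
import re

# Hypothetical sample data; the names mirror the question.
keyword_list = ['Buzz', 'Sources', 'Rumour', 'In talks']
record = {'title': 'Company X in talks with rival',
          'description': 'Sources say a deal is close'}

# Build and lowercase the searched text once instead of once per keyword.
text = (record['title'] + ' ' + record['description']).lower()

# Filtering generator: yields only the keywords that occur in the text.
matches = (item for item in keyword_list if re.search(item.lower(), text))

# next() pulls the first match lazily; None is the default when nothing matches.
first_match = next(matches, None)
if first_match is not None:
    print(first_match)
```

Because the generator is lazy, keywords after the first hit are not tested unless you keep iterating over `matches`.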

Or you could just use a for loop and break out after the first match:

for item in self.keyword_list:
    if re.search(item.lower(), record['title'].lower() + ' ' + record['description'].lower()):
        print record
        break
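If the matched keyword is needed elsewhere, the loop can be wrapped in a small function so that an early return takes the place of break. This is only a sketch in Python 3 syntax; `find_first_keyword` is a hypothetical helper name, not something from the question.

```python
import re

def find_first_keyword(keywords, text):
    """Return the first keyword found in text, or None if none match."""
    lowered = text.lower()
    for item in keywords:
        if re.search(item.lower(), lowered):
            return item  # early return plays the role of break
    return None

# Hypothetical usage with made-up data.
result = find_first_keyword(['Rumour', 'Scam', 'Fraud'],
                            'Regulator flags fraud at firm')
print(result)
```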

You could further clean any of these variants up by building the text to search just once, and using the re.IGNORECASE flag so you don't have to lowercase everything. Note that the keyword is the pattern and the record text is what gets searched:

text = '{} {}'.format(record['title'], record['description'])
matches = (item for item in self.keyword_list if re.search(item, text, flags=re.IGNORECASE))
first_match = next(matches, None)
if first_match is not None:
    print record

or

text = '{} {}'.format(record['title'], record['description'])
for item in self.keyword_list:
    if re.search(item, text, flags=re.IGNORECASE):
        print record
        break
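As a further sketch of the pre-compilation idea, the keywords themselves can be compiled once into a single case-insensitive pattern and searched against the record text. This is a swapped-in variant rather than the answer's own code, and it makes two assumptions: the keywords are meant as literal text (hence `re.escape`), and the leftmost hit in the text is wanted rather than the first keyword in list order. The data is made up, and the syntax is Python 3.

```python
import re

# Hypothetical sample data.
keyword_list = ['Buzz', 'Sources', 'Rumour', 'In talks', 'IT scanner']
record = {'title': 'Company X IN TALKS with rival',
          'description': 'Sources say a deal is close'}

# Compile once; re.escape makes each keyword match as literal text.
pattern = re.compile('|'.join(re.escape(k) for k in keyword_list),
                     flags=re.IGNORECASE)

text = '{} {}'.format(record['title'], record['description'])
match = pattern.search(text)
if match is not None:
    # group(0) is the matched text as it appears in the record,
    # which may differ in case from the keyword in the list.
    print(match.group(0))
```

The compiled pattern can be reused for every record, so the per-record cost is a single search instead of one search per keyword.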

edited Sep 10 '15 at 10:08

answered Sep 10 '15 at 09:36

@Martijn: thanks a lot, struggling with `AttributeError: 'module' object has no attribute 'IGNORE'` – cyclic Sep 10 '15 at 09:57
I added `flags= re.IGNORECASE` which then gave `UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 97: ordinal not in range(128)` – cyclic Sep 10 '15 at 10:03
@cyclic: sorry, I mis-remembered the flag. If you are using Unicode objects then use `u'{} {}'.format()` instead (so use a unicode literal for the formatting string), and use `flags=re.IGNORECASE | re.UNICODE`. – Martijn Pieters Sep 10 '15 at 10:13
@Martijn: thanks but still same error. Is this correct? `re.compile('{} {}'.format(record['title'].encode('utf-8').strip(), record['description'].encode('utf-8').strip()),flags=re.IGNORECASE | re.UNICODE)` – cyclic Sep 10 '15 at 10:18
@cyclic: your items in your keyword list are Unicode too, presumably? I would *not* encode to UTF-8, because now you are matching UTF-8 bytes, not characters. That can lead to really weird results and you lose the ability to match case insensitive for anything but plain ASCII characters. – Martijn Pieters Sep 10 '15 at 10:24
@Martijn: no it's not unicode, just plain text. I have updated the keyword list – cyclic Sep 10 '15 at 10:25
@cyclic: it is hard to debug your issue without a full traceback. Can you create a pastie or gist with that perhaps? – Martijn Pieters Sep 10 '15 at 10:26
@cyclic: also note that those Unicode issues are *entirely separate* from your current question. Your base question has been answered already, this is an unrelated issue. :-) – Martijn Pieters Sep 10 '15 at 10:27
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/89264/discussion-between-cyclic-and-martijn-pieters). – cyclic Sep 10 '15 at 10:29
@cyclic: and last but not least: don't confuse a `UnicodeEncodeError` with a `UnicodeDecodeError`. You got an encoding error at first, you are certain you are not getting a *decode* error now, right? Oh, and encoding to UTF-8 makes the `re.UNICODE` flag useless, you are now using *bytes*, not Unicode, so if you insist on encoding to UTF-8 then the flag can go. – Martijn Pieters Sep 10 '15 at 10:29
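
The point about matching encoded bytes can be illustrated with a small sketch. It uses Python 3 syntax, where text (`str`) and `bytes` are separate types, while the thread itself concerns Python 2; the keyword and text are invented for the example.

```python
import re

keyword = '\u00e9lan'            # 'élan' with a lowercase e-acute
text = 'Market shows \u00c9LAN'  # same word with an uppercase E-acute

# Matching on text: case-insensitive matching works for the accented letter.
text_match = re.search(keyword, text, flags=re.IGNORECASE)

# Matching on UTF-8 bytes: case folding only applies to ASCII letters,
# so the encoded accented characters no longer compare equal.
bytes_match = re.search(keyword.encode('utf-8'), text.encode('utf-8'),
                        flags=re.IGNORECASE)

print(text_match is not None)   # True
print(bytes_match is not None)  # False
```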
