This is quite hard to follow (in my opinion), so I was wondering if anyone can suggest something more pythonic with shorter lines?
Generally pythonic doesn't mean shorter lines. Pythonic code could should be easy to read and follow (at least with a bit of background). So if you find that hard to read you could factor it into a different function:
# I'm not sure if the function name is a good fit, it's just a suggestion.
def contains_at_least_one(data, words):
for word in words:
if word in data:
return True
return False
def is_important(data_row):
if data_row.get('important', None):
return True
add_data = get_additional_data(data_row).lower()
if contains_at_least_one(add_data, NEGATIVE_WORDS):
return False
if contains_at_least_one(add_data, POSITIVE_WORDS):
return True
return False
Could I for example merge the two for loops?
Not really. Because the NEGATIVE_WORDS
loop should take precedence (at least in your code) over the POSITIVE_WORDS
loop. Except you meant factoring it in a function. Then see first.
If I merge two for loops, would it consumes more time?
I'm not sure what you meant by "merging" the loops but if you want it shorter you could use any
in the approach above. It's equivalent to the for
-loop and shorter - but, according to my and StefanPochmans benchmarks, slower:
def contains_at_least_one(data, words):
return any(word in data for word in words)
def is_important(data_row):
if data_row.get('important', None):
return True
add_data = get_additional_data(data_row).lower()
if contains_at_least_one(add_data, NEGATIVE_WORDS):
return False
if contains_at_least_one(add_data, POSITIVE_WORDS):
return True
return False
You could even reduce the number of lines by using and
for return
. I would not recommend it because such constructs don't improve the readability, but that's your decision and it's one way to "shorten" code:
def is_important(data_row):
if data_row.get('important', None):
return True
add_data = get_additional_data(data_row).lower()
return (not contains_at_least_one(add_data, NEGATIVE_WORDS) and
contains_at_least_one(add_data, POSITIVE_WORDS))
A bit far fetched but maybe you could even use set
s to speed this up. This would require that you only look for whole word matches (not partial matches, not multi-word matches):
def contains_at_least_one(data, words):
return data.intersection(words)
def is_important(data_row):
if data_row.get('important', None):
return True
add_data = set(get_additional_data(data_row).lower().split()) # set and split!
return not contains_at_least_one(add_data, NEGATIVE_WORDS) and contains_at_least_one(add_data, POSITIVE_WORDS)
See also the regex suggestions in tobias_k's answer if you don't want punctuation to break your matching. However the set approach is only meant as "small suggestion" - I doubt it could be applied in your case. But you need to judge that.