
I am doing an online Python course and got stuck on this exercise (sorry for the auto-translation; my English is not good):

Consider the task of checking spam in an e-mail or filtering prohibited words in a message.

Let the is_spam_words function accept a string (parameter text), check whether it contains prohibited words from a list (parameter spam_words), and return the result of the check: True if the text contains at least one word from the list, and False if no prohibited words are found in the text.

The words in the text parameter can be in any case, which means that the is_spam_words function must be case-insensitive when searching for prohibited words, so all text should be converted to lowercase. For simplicity, let's assume that there is only one forbidden word in the line.

Add a third parameter, space_around, which defaults to False. It controls whether the function searches only for stand-alone words. A word is considered to stand alone if there is a space to its left or it is at the beginning of the text, and there is a space or a period to its right.

For example, we are looking for the word "rain" in the text. For the text "rainbow", the calls and results are as follows:

print(is_spam_words("rainbow", ["rain"])) # True
print(is_spam_words("rainbow", ["rain"], True)) # False

That is, in the second case, the word does not stand alone; it is part of another word.

In the following example, the function returns True in both cases:

print(is_spam_words("rain outside.", ["rain"])) # True
print(is_spam_words("rain outside.", ["rain"], True)) # True

My code is:

def is_spam_words(text, spam_words, space_around=False):
    if space_around:
        text = text.lower()
        for char in spam_words:
            char.lower()
        if text.find(spam_words) != -1 or text.startswith(spam_words):
            return True
        else:
            return False
    return False

The problem is:

The function returned an incorrect result when called with two parameters: False. It must be:

is_spam_words('rain outside.', ['rain']) == True

I have tried changing the main check (for example, `if spam_words in text is True`, or `if (" " + spam_word + " ") in (" " + text + " ")`), but I still do not understand why it is not working. I expect the result to be True if a spam word is found in the text, and False if it is not.

wjandrea
Lin
  • If `space_around = False` (which it is by default), then what happens? The main part of the function never even runs, it just goes straight to `return False`. Check out [How to debug small programs](//ericlippert.com/2014/03/05/how-to-debug-small-programs/) by Eric Lippert. See also [mre], which has more related tips. BTW, welcome back to Stack Overflow! Check out the [tour] and [How to ask a good question](/help/how-to-ask). – wjandrea Aug 24 '23 at 20:42
  • FWIW, my first thought for how to solve this was to use regex, but I guess you haven't gotten there yet. That might even be the next module if they're like, "That was difficult, now here's the easier way". – wjandrea Aug 24 '23 at 20:45
  • BTW, `char.lower()` doesn't do anything on its own. See [Why doesn't calling a string method (such as .replace or .strip) modify (mutate) the string?](/q/9189172/4518341) – wjandrea Aug 24 '23 at 20:48
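
A small illustration of the points in the comments above, as a sketch rather than a fix for the full exercise. It shows that `str.lower()` returns a new string instead of changing the original, and that `str.find` expects a string, so passing the whole `spam_words` list raises a `TypeError`. The variable names `still_upper`, `lowered`, and `find_error` are made up for this demonstration.

```python
word = "RAIN"
word.lower()              # builds "rain", but the result is discarded
still_upper = word        # the original string is untouched: "RAIN"

word = word.lower()       # rebinding the name keeps the lowercase result

# Lowercasing every entry of a list means building a new list
spam_words = ["Rain", "SPAM"]
lowered = [w.lower() for w in spam_words]

# str.find takes a string, not a list, so this call raises TypeError
try:
    "rain outside.".find(spam_words)
    find_error = None
except TypeError as exc:
    find_error = exc

print(still_upper, word, lowered, type(find_error).__name__)
```

This suggests checking each word of the list individually inside the loop, using the lowercased copy.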

1 Answer

  1. Convert everything to lowercase to ensure case-insensitivity.
  2. If space_around is False, simply check if the spam word is in the text.
  3. If space_around is True, check three cases: the word is at the start of the text and is followed by a space or a period; the word is in the middle and is surrounded by spaces or ends with a period; or the word is at the end of the text and is preceded by a space.

Here's the solution based on the above approach:
def is_spam_words(text, spam_words, space_around=False):
    # Convert the text and spam words to lowercase
    text = text.lower()
    spam_words = [word.lower() for word in spam_words]

    for word in spam_words:
        if not space_around:
            if word in text:
                return True
        else:
            if text.startswith(word + " ") or text.endswith(" " + word) or text.endswith(" " + word + "."):
                return True
            if (" " + word + " ") in text:
                return True
            if (" " + word + ".") in text:
                return True

    return False

# Testing your examples
print(is_spam_words("rainbow", ["rain"]))  # True
print(is_spam_words("rainbow", ["rain"], True))  # False
print(is_spam_words("rain outside.", ["rain"]))  # True
print(is_spam_words("rain outside.", ["rain"], True))  # True
print(is_spam_words('Moloch god is terrible.', ['shit']))  # False
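
As an alternative to chaining `startswith`/`endswith` checks, the same rule can be sketched with a regular expression, an approach a comment on the question also mentions. This is only one possible reading of the exercise: it assumes a stand-alone word must be preceded by a space or the start of the text and followed by a space or a period, exactly as the task states. The name `is_spam_words_regex` is hypothetical, and `re.IGNORECASE` is used here in place of lowercasing.

```python
import re


def is_spam_words_regex(text, spam_words, space_around=False):
    """Regex-based sketch of the exercise (hypothetical helper name)."""
    for word in spam_words:
        # re.escape keeps characters such as "." or "+" in a word literal
        pattern = re.escape(word)
        if space_around:
            # (?<![^ ])  -> not preceded by a non-space character,
            #               i.e. start of text or a space on the left
            # (?=[ .])   -> followed by a space or a period
            pattern = r"(?<![^ ])" + pattern + r"(?=[ .])"
        if re.search(pattern, text, re.IGNORECASE):
            return True
    return False


print(is_spam_words_regex("rainbow", ["rain"]))              # True
print(is_spam_words_regex("rainbow", ["rain"], True))        # False
print(is_spam_words_regex("rain outside.", ["rain"], True))  # True
```

Under this reading, a word at the very end of the text with nothing after it does not count as stand-alone, which differs from the `endswith(" " + word)` check above.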
Amrit Baveja
  • If I interpret correctly, this code will fail a testcase `is_spam_words('foo.', ['foo'], True)`. `text.endswith(" " + word + ".")` is redundant. Listcomp is inefficient (you can use generator instead). If `not space_around`, you could `return any(word in text for word in spam_words)` to save on in-python looping. – STerliakov Aug 24 '23 at 20:48
  • `text.endswith(" " + word + ".")` is redundant with `(" " + word + ".") in text` – wjandrea Aug 24 '23 at 21:03
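
One way the suggestions in these comments could be combined is sketched below; it is not the answer author's code. It uses `any()` with generator expressions and prepends a single space to the text, so a word at the very start is handled by the same check as a word in the middle. The name `is_spam_words_padded` is hypothetical, and the sketch assumes the exercise's stated rule (space or start of text on the left, space or period on the right).

```python
def is_spam_words_padded(text, spam_words, space_around=False):
    """Sketch using any() and a left-padded text (hypothetical helper name)."""
    text = text.lower()
    # Generator: words are lowercased lazily, one at a time
    words = (word.lower() for word in spam_words)

    if not space_around:
        return any(word in text for word in words)

    # A leading space lets " word" also match at the start of the text
    padded = " " + text
    return any(
        (" " + word + " ") in padded or (" " + word + ".") in padded
        for word in words
    )


print(is_spam_words_padded("foo.", ["foo"], True))           # True
print(is_spam_words_padded("rainbow", ["rain"], True))       # False
print(is_spam_words_padded("rain outside.", ["rain"], True)) # True
```

With this padding, the test case `is_spam_words('foo.', ['foo'], True)` mentioned above returns True without a separate `startswith` branch.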