For the assignment, we have to remove stopwords manually and return the sentence without them. However, we also have to remove periods, commas, question marks, and other punctuation, and I can't figure out how to do that: if a punctuation mark is attached to a word, it doesn't get removed. For example, prep_text('how was the game?') should return 'how was game', with no question mark and no stopwords. Here is my code (the stopword list is included in it):

my_stopwords =  ['is', 'it', 'the', 'if', '.', 'Is', 'It', 'The', 'If']

def prep_text(sentence):
    words = sentence.split(" ")
    words_filtered= [word for word in words if not word in my_stopwords]
    return (" ").join(words_filtered)
Tom Ron
  • There are so many NLP tutorials online; if you google how to prep data for NLP, you will surely find entire articles on how to do this. – Chris Oct 10 '21 at 18:33
  • He wanted us to do it manually, and that still doesn't really explain how I'd remove things such as question marks. – Abhi Khanna Oct 10 '21 at 18:46
  • Does this answer your question? [Best way to strip punctuation from a string](https://stackoverflow.com/questions/265960/best-way-to-strip-punctuation-from-a-string) – Chris Oct 10 '21 at 18:48
  • I wish it did. But it doesn't really clarify how I would need to fix my code. – Abhi Khanna Oct 10 '21 at 18:55

1 Answer

To help you out: just separate the tasks. Remove punctuation from the string before splitting it into words, then filter out the stopwords.
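To see why the version in the question misses attached punctuation, here is a small sketch using the sample sentence and stopword list from the question. It only shows what splitting on spaces produces; it is not a fix by itself.

```python
sentence = 'how was the game?'
my_stopwords = ['is', 'it', 'the', 'if', '.', 'Is', 'It', 'The', 'If']

# Splitting on spaces keeps punctuation glued to the neighbouring word.
words = sentence.split(" ")
print(words)  # ['how', 'was', 'the', 'game?']

# The filter compares whole tokens, so 'game?' is never equal to a list entry,
# and adding '?' to the list would not help either.
print('game?' in my_stopwords)  # False
```

Because the list comprehension tests each whole token, a punctuation entry such as '.' in the stopword list only matches a token that is exactly '.', which is why the marks have to be handled before the split.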

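# Raw string so that the backslash is kept literally and does not start an escape sequence
my_punctuation_marks = r'''!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~'''
my_stopwords = ['is', 'it', 'the', 'if']

def prep_text(sentence):
    # Replace every punctuation mark with a space, so marks attached to a word are detached
    for ele in my_punctuation_marks:
        sentence = sentence.replace(ele, " ")
    # split() without an argument drops the empty strings caused by repeated spaces
    words = sentence.split()
    # Compare in lower case so that 'The' and 'the' are both removed
    words_filtered = [word for word in words if word.lower() not in my_stopwords]
    return " ".join(words_filtered)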
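If the assignment allows the standard library, the same two-step idea can also be sketched with `string.punctuation` and `str.translate` instead of a hand-written list of marks. This is only an alternative sketch, not the code above: the names `MY_STOPWORDS`, `PUNCT_TO_SPACE` and `prep_text_translate` are made up for illustration, and the stopword list is assumed to be the lower-case one used in this answer.

```python
import string

# Hypothetical names for illustration; the stopwords are the ones used in this answer.
MY_STOPWORDS = {'is', 'it', 'the', 'if'}

# Translation table that maps every ASCII punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def prep_text_translate(sentence):
    # Step 1: detach punctuation from words by turning it into spaces.
    cleaned = sentence.translate(PUNCT_TO_SPACE)
    # Step 2: split on any whitespace, then drop stopwords case-insensitively.
    words = cleaned.split()
    return " ".join(w for w in words if w.lower() not in MY_STOPWORDS)

print(prep_text_translate('how was the game?'))       # how was game
print(prep_text_translate('Is it raining, or not?'))  # raining or not
```

Note that both versions turn an apostrophe into a space, so a word like "can't" becomes "can t"; whether that is acceptable depends on what the assignment expects.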
RJ Adriaansen