Cutting off a sentence after trigger words

Question

I have the following class with methods:

class Trigger():

    def getRidOfTrashPerSentence(self, line, stopwords):
        countWord = 0
        words = line.split()
        for word in words:
            if countWord == 0:
                if word in stopwords:
                    sep = word
                    lineNew = line.split(sep, 1)[0]
                    countWord = countWord + 1
                    return(lineNew)

    stopwords = ['regards', 'Regards']

    def getRidOfTrash(self, aTranscript):
        result = [self.getRidOfTrashPerSentence(line, self.stopwords) for  line in aTranscript]
        return(result)

What I would like to achieve is to cut off the 'trash' in a sentence after certain trigger words such as ['regards', 'Regards'].

So when I pass in a block like this:

aTranScript = [ "That's fine, regards Henk", "Allright great"]

I am looking for an output like this:

aTranScript = [ "That's fine, regards", "Allright great"]

However when I do this:

newFile = Trigger()
newContent = newFile.getRidOfTrash(aTranScript)

I only get "That's fine".

Any thoughts on how I can get both strings, with the trigger word kept in the first one?

How about appending the separator back after the split? Here is a similar question - http://stackoverflow.com/questions/7866128/python-split-without-removing-the-delimiter – Vinay, Feb 14 '17 at 08:53
I don't understand what you mean, Vinay. Could you elaborate on this? – Henk Straten, Feb 14 '17 at 08:58
You can do this - `lineNew = line.split(sep, 1)[0]` `lineNew += sep` – Vinay, Feb 14 '17 at 09:01
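
Put together, the suggestion in these comments might look like the sketch below. It is written as a standalone function rather than a method, and the name cut_after_trigger is made up for illustration. It also returns the line unchanged when no trigger word is found, since the method in the question returns None in that case.

```python
def cut_after_trigger(line, stopwords):
    """Return line truncated just after the first trigger word found in it."""
    for word in line.split():
        if word in stopwords:
            # str.split() drops the separator, so append it back afterwards.
            # Caveat: this cuts at the first substring match of the word.
            return line.split(word, 1)[0] + word
    # No trigger word found: keep the whole line instead of returning None
    return line


transcript = ["That's fine, regards Henk", "Allright great"]
print([cut_after_trigger(line, ['regards', 'Regards']) for line in transcript])
# ["That's fine, regards", 'Allright great']
```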

Ika8 · Answer 1 · 2017-02-14T11:21:21.567

2

This is an easy solution:

yourString = 'Hello thats fine, regards Henk'
yourString.split(', regards')[0]

This code will return: 'Hello thats fine'

If you want, you can concatenate ', regards' back onto the end:

yourString.split(', regards')[0]+', regards'
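
One caveat with the concatenation above: it appends ', regards' even when the string never contained it. A minimal sketch that avoids this uses str.partition, which returns the separator only if it was found. The function name cut_after and its default separator are assumptions for illustration.

```python
def cut_after(text, sep=', regards'):
    """Keep everything up to and including sep; leave text alone if sep is absent."""
    head, found, _tail = text.partition(sep)
    # found is '' when sep does not occur, so nothing extra is appended
    return head + found


print(cut_after('Hello thats fine, regards Henk'))
# Hello thats fine, regards
print(cut_after('Allright great'))
# Allright great
```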

edited Feb 14 '17 at 11:21

answered Feb 14 '17 at 08:52

Ika8


@EricDuminil you are right, change ', regards' for ' Henk' ;) – Ika8 Feb 14 '17 at 09:12
How would you adapt it to multiple trigger words? – Eric Duminil Feb 14 '17 at 14:09
You have a list of stopwords: iterate over the list, and when len(listAfterSplit) > 1, you know which word you split on. – Ika8 Feb 14 '17 at 15:00
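
The loop described in the last comment might look roughly like this. It is only a sketch: cut_after_any is a hypothetical name, and it acts on the first stopword in list order that occurs in the text, which is not necessarily the one appearing earliest in the sentence.

```python
def cut_after_any(text, stopwords):
    """Truncate text after the first stopword (in list order) that it contains."""
    for stop in stopwords:
        parts = text.split(stop, 1)
        if len(parts) > 1:
            # More than one part means this stopword was found in the text
            return parts[0] + stop
    # None of the stopwords occurred
    return text


print(cut_after_any("That's fine, regards Henk", ['regards', 'Regards']))
# That's fine, regards
print(cut_after_any("Allright great", ['regards', 'Regards']))
# Allright great
```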

score 1 · Answer 2 · answered Feb 14 '17 at 09:26

A regex makes the replacement easier. As a bonus, the match can be made case-insensitive, so you don't have to list both 'regards' and 'Regards':

import re

stop_words = ['regards', 'cheers']

def remove_text_after_stopwords(text, stop_words):
    # Capture the first stop word, then match everything up to the end of the line
    pattern = "(%s).*$" % '|'.join(re.escape(word) for word in stop_words)
    remove_trash = re.compile(pattern, re.IGNORECASE)
    # Keep only the captured stop word (raw string so \g is passed through to re)
    return remove_trash.sub(r'\g<1>', text)

print(remove_text_after_stopwords("That's fine, regards, Henk", stop_words))
# That's fine, regards
print(remove_text_after_stopwords("Good, cheers! Paul", stop_words))
# Good, cheers
print(remove_text_after_stopwords("No stop word here", stop_words))
# No stop word here

If you have a list of strings, you can just use a list comprehension to apply this method over every string.
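
For example, applied to the transcript from the question, that could look like the following sketch. The function is repeated so the snippet runs on its own, and the names transcript and cleaned are chosen only for illustration.

```python
import re


def remove_text_after_stopwords(text, stop_words):
    # Capture the first stop word and drop everything after it on the line
    pattern = re.compile(
        "(%s).*$" % "|".join(re.escape(word) for word in stop_words),
        re.IGNORECASE,
    )
    return pattern.sub(r"\g<1>", text)


transcript = ["That's fine, regards Henk", "Allright great"]
cleaned = [remove_text_after_stopwords(line, ['regards', 'cheers'])
           for line in transcript]
print(cleaned)
# ["That's fine, regards", 'Allright great']
```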

score 0 · Answer 3 · answered Feb 14 '17 at 09:07

You can scan the words of the line, keep each one, and stop as soon as you have kept a stopword:

class Trigger():

    stopwords = ['regards', 'Regards']

    def getRidOfTrashPerSentence(self, line):
        new_words = []
        for word in line.split():
            new_words.append(word)
            if word in self.stopwords:
                break  # drop everything after the first stopword
        return " ".join(new_words)  # reconstruct line

    def getRidOfTrash(self, aTranscript):
        result = [self.getRidOfTrashPerSentence(line) for line in aTranscript]
        return(result)

aTranScript = [ "That's fine, regards Henk", "Allright great"]
newFile = Trigger()
newContent = newFile.getRidOfTrash(aTranScript)
print(newContent)
