Using a keyword to print a sentence in Python

Question

Hello, I am writing a Python program that reads through a given .txt file and looks for keywords. Once the program has found a keyword (for example 'data'), I would like it to print out the entire sentence the word belongs to.

I have read in my input file and used the split() method to get rid of spaces, tabs and newlines and put all the words into a list.

Here is the code I have thus far.

with open("file.txt", "r") as text_file:
    lines = text_file.read().split()
keyword = 'data'

for token in lines:
    if token == keyword:
        # I have found my keyword. What methods can I use to
        # print out the words before and after the keyword?
        # I have a feeling I want to use '.' as a marker for sentences.
        # print(sentence)  # should print the entire sentence
        pass

file.txt reads as follows:

Welcome to SOF! This website securely stores data for the user.

Desired output:

This website securely stores data for the user.

you can if you store/loop with the index using `enumerate` and get previous & next index. But the bigger problem is to separate the _sentences_ first — Jean-François Fabre, Apr 06 '19 at 21:02
If the token occurs twice in a sentence, should you print it more than once? — Will Lacey, Apr 06 '19 at 21:11
@MelvinYellow Yes the word is guaranteed to be found in the text file — Jackson Contreras, Apr 06 '19 at 21:15
@Jean-FrançoisFabre Thanks for the enumerate method! that make iterating easier, as for separating sentences I will use a period ('.') as a marker. I just need to figure out how to detect the period in the array, since it is attached to a word. — Jackson Contreras, Apr 06 '19 at 21:18
just use `word.endswith(".")` for instance. Or regex to detect punctuation — Jean-François Fabre, Apr 06 '19 at 21:24
You don't need to read the entire file into memory at once. Just iterate over the file itself: `for line in text_file: tokens = line.strip().split(); ...`. — chepner, Apr 06 '19 at 21:31
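
The suggestions in the comments above (tracking the index with enumerate and detecting sentence ends with endswith) can be sketched roughly as follows. The function name sentence_around and its parameters are hypothetical, and the sketch assumes that sentences end with ., ! or ? and that only the first occurrence is wanted.

```python
def sentence_around(tokens, keyword, enders=(".", "!", "?")):
    """Return the sentence containing keyword, or None if it is absent."""
    for i, token in enumerate(tokens):
        # Strip trailing punctuation so that "data." still matches "data".
        if token.rstrip(".!?,;:") != keyword:
            continue
        # Walk left until the previous token ends a sentence.
        start = i
        while start > 0 and not tokens[start - 1].endswith(enders):
            start -= 1
        # Walk right until a token ends the sentence (or the text runs out).
        end = i
        while end < len(tokens) - 1 and not tokens[end].endswith(enders):
            end += 1
        return " ".join(tokens[start:end + 1])
    return None


text = "Welcome to SOF! This website securely stores data for the user."
print(sentence_around(text.split(), "data"))
```

Because the tokens are rejoined with single spaces, the original spacing and line breaks of the file are not preserved.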

BrainDead · Accepted Answer · 2019-04-07T12:48:07.837

We can split the text on the characters that end a sentence (the code below calls them line-ending characters), then loop through the resulting pieces and print those that contain our keyword.

To split text on multiple characters (for example, a sentence can end with !, ? or .), we can use a regex:

import re

keyword = "data"
line_end_chars = "!", "?", "."
example = "Welcome to SOF! This website securely stores data for the user?"
regexPattern = '|'.join(map(re.escape, line_end_chars))
line_list = re.split(regexPattern, example)

# line_list looks like this:
# ['Welcome to SOF', ' This website securely stores data for the user', '']

# Now we just need to see which lines have our keyword
for line in line_list:
    if keyword in line:
        print(line)

But keep in mind that `if keyword in line` matches a sequence of characters, not necessarily a whole word. For example, `'data' in 'datamine'` is True. If you only want to match whole words, you ought to use regular expressions: source explanation with example
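
A minimal sketch of whole-word matching, assuming the \b word-boundary anchor is an acceptable definition of a "word". The names whole_word, lines and matches are illustrative only.

```python
import re

keyword = "data"
# \b marks a word boundary; re.escape guards against special characters in the keyword.
whole_word = re.compile(r"\b" + re.escape(keyword) + r"\b")

lines = ["This website stores data for the user", "We visited the datamine"]
# Keep only the lines in which the keyword appears as a separate word.
matches = [line for line in lines if whole_word.search(line)]
print(matches)
```

Only the first line is kept, because in 'datamine' there is no word boundary after 'data'.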

Source for regex delimiters

careful as `if keyword in line` also works for substrings, not whole words. — Jean-François Fabre, Apr 07 '19 at 07:11

score 2 · Answer 2 · answered Apr 06 '19 at 23:05

My approach is similar to Alberto Poljak's, but a little more explicit.

The motivation is to realise that splitting on words is unnecessary - Python's in operator will happily find a word in a sentence. What is necessary is the splitting of sentences. Unfortunately, sentences can end with ., ? or ! and Python's split function does not allow multiple separators. So we have to get a little complicated and use re.

re requires us to put a | between each delimiter and escape some of them, because both . and ? have special meanings by default. Alberto's solution used re itself to do all this, which is definitely the way to go. But if you're new to re, my hard-coded version might be clearer.
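
As a small sketch of the two ways of building the pattern described above, the following compares a pattern generated with re.escape against a hand-written one. The sample text and variable names are made up for illustration.

```python
import re

text = "Is it safe? Yes! It stores data."

# Pattern built by escaping each delimiter and joining the results with "|".
delimiters = [".", "!", "?"]
generated = "|".join(map(re.escape, delimiters))

# Hand-written equivalent: "." and "?" need a backslash, "!" does not.
hard_coded = r"\.|!|\?"

generated_parts = re.split(generated, text)
hard_coded_parts = re.split(hard_coded, text)
print(generated_parts)
print(generated_parts == hard_coded_parts)
```

Both patterns split the text at the same places, so the choice between them is mostly about readability.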

The other addition I made was to put each sentence's trailing delimiter back on the sentence it belongs to. To do this I wrapped the delimiters in (), which captures them in the output. I then used zip to put them back on the sentence they came from. The 0::2 and 1::2 slices will take every even index (the sentences) and concatenate them with every odd index (the delimiters). Uncomment the print statement to see what's happening.

import re

lines = "Welcome to SOF! This website securely stores data for the user. Another sentence."
keyword = "data"

sentences = re.split(r'(\.|!|\?)', lines)

sentences_terminated = [a + b for a,b in zip(sentences[0::2], sentences[1::2])]

# print(sentences_terminated)

for sentence in sentences_terminated:
    if keyword in sentence:
        print(sentence)
        break

Output:

 This website securely stores data for the user.
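
One caveat worth illustrating: zip stops at the shorter of the two slices, so a final fragment with no closing delimiter is silently dropped. The sketch below shows the intermediate list and one possible workaround using itertools.zip_longest. The sample text and the names parts, paired and complete are assumptions, not part of the answer above.

```python
import re
from itertools import zip_longest

text = "Welcome to SOF! This stores data. No final mark"

# The capturing group keeps each delimiter as its own list item.
parts = re.split(r"(\.|!|\?)", text)
print(parts)

# zip stops at the shorter slice, so the unterminated tail is lost here.
paired = [a + b for a, b in zip(parts[0::2], parts[1::2])]
print(paired)

# zip_longest pads the missing delimiter with "", keeping the tail.
complete = [a + b for a, b in zip_longest(parts[0::2], parts[1::2], fillvalue="")]
# Strip surrounding whitespace and discard empty leftovers.
complete = [s.strip() for s in complete if s.strip()]
print(complete)
```

The final strip also removes the leading space that appears in the output above.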

Upvoted because you explained parts of my answer better than me while I just pasted the source. — BrainDead, Apr 06 '19 at 23:29

score 1 · Answer 3 · answered Apr 06 '19 at 22:00

This solution uses a fairly simple regex to find your keyword in a sentence, with optional words before and after it, followed by one final character. Note that the trailing . in the pattern is unescaped, so it matches any character, not only a period. It copes with spaces and needs only a single call to re.search().

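import re

# The with block closes the file automatically
with open("file.txt", "r") as text_file:
    text = text_file.read()

keyword = 'data'

# Raw strings keep the backslashes intact; re.escape protects special characters in the keyword
match = re.search(r"\s?(\w+\s)*" + re.escape(keyword) + r"\s?(\w+\s?)*.", text)
# Assumes the keyword is present; re.search returns None otherwise
print(match.group().strip())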
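
The code above assumes the keyword is always found. As a hedged alternative, here is a sketch that returns None when there is no match. It swaps in a different pattern (runs of non-terminator characters around a whole-word keyword) rather than the regex from this answer, and the helper name find_sentence is hypothetical.

```python
import re


def find_sentence(text, keyword):
    """Return the first sentence containing keyword as a whole word, else None."""
    # [^.!?]* matches any run of characters that are not sentence terminators.
    pattern = r"[^.!?]*\b" + re.escape(keyword) + r"\b[^.!?]*[.!?]?"
    match = re.search(pattern, text)
    return match.group().strip() if match else None


text = "Welcome to SOF! This website securely stores data for the user."
print(find_sentence(text, "data"))
print(find_sentence(text, "missing"))
```

This treats every ., ! or ? as a sentence end, so abbreviations such as "e.g." would cut a sentence short.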

score 0 · Answer 4 · answered Apr 06 '19 at 21:33

Another Solution:

def check_for_stop_punctuation(token):
    # True if the token contains sentence-ending punctuation
    return any(mark in token for mark in ('.', '?', '!'))

with open("file.txt", "r") as text_file:
    lines = text_file.read().split()
keyword = 'data'

sentence = []

i = 0
while i < len(lines):
    token = lines[i]
    sentence.append(token)
    # Strip punctuation so that 'data.' still matches 'data'
    if token.strip('.?!') == keyword:
        found_stop_punctuation = check_for_stop_punctuation(token)
        # Collect the rest of the sentence, stopping at the end of the text
        while not found_stop_punctuation and i + 1 < len(lines):
            i += 1
            token = lines[i]
            sentence.append(token)
            found_stop_punctuation = check_for_stop_punctuation(token)
        # Join the tokens so a sentence is printed rather than a list
        print(' '.join(sentence))
        sentence = []
    elif check_for_stop_punctuation(token):
        # A sentence without the keyword has ended; start over
        sentence = []
    i += 1
