0

How can I find sentences in a text file that start and end with particular words, for example sentences that start with 'The' and end with 'u'? I have tried:

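def main():
    f = open('D:\\file.txt')
    print 'The lines that starts with The and Ends with u'
    for line in f:
        for j in line.split('.'):
            j = j.strip()  # drop the newline and the space after each '.'
            # the length check avoids an IndexError on short or empty fragments
            if len(j) > 4 and j[0]=='T' and j[1]=='h' and j[2]=='e' and j[3]==' ' and j[-1]=='u':
                print j
    f.close()

if __name__ == '__main__':
    main()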

Instead of comparing character by character, can we do this by comparing whole words?

Emmanu

3 Answers

0

You could use startswith and endswith to do the comparison with strings:

print 'The lines that starts with The and Ends with u'

with open('test.txt') as f:
    for line in f:
        line = line.strip()
        if line.startswith('The') and line.endswith('u'):
            print line
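
For a comparison on whole words rather than characters, a minimal sketch might split each sentence into words and compare the first and last ones. The helper name `matches_words` and the sample text are hypothetical, splitting on '.' is the same naive approach used in the question, and `print` is written in its function form:

```python
def matches_words(sentence, first='The', last='u'):
    """Return True if the sentence's first and last words equal the given words."""
    words = sentence.split()
    # Guard against empty fragments produced by splitting on '.'
    return bool(words) and words[0] == first and words[-1] == last


# Hypothetical sample text; in practice this would be read from the file
text = 'The letter I like is u. Then came the rain. The answer is you.'
found = [s.strip() for s in text.split('.') if matches_words(s)]
print(found)
```

Unlike `startswith('The')` and `endswith('u')`, this does not match a sentence that begins with 'Then' or ends with 'you'.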
niemmi
  • Sentences normally end with some kind of punctuation, not a letter :) – jim Apr 17 '16 at 08:21
  • True, but given that there's no example input and the given code prints a message about lines, one has to make some assumptions. Anyway, the main point stands: using `startswith` and `endswith` will solve the problem as long as the sentences/lines are properly read and constructed. – niemmi Apr 17 '16 at 08:26
0

There are a few easy ways to read "sentences" and a couple of more sophisticated ones that work a lot better (e.g. NLTK). Pick your favourite from this question: Python split text on sentences

Once you are extracting "sentences" instead of lines, you can pretty much use your own code for the comparison.
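
As an illustration of the "easy" end of that range, here is a rough sketch that splits after '.', '!' or '?' followed by whitespace and then compares the first and last words. The function names and the sample text are made up for this example, and the splitter is deliberately naive: abbreviations such as "e.g." will be split incorrectly, which is where a library like NLTK does better.

```python
import re


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]


def starts_and_ends_with(sentence, first, last):
    """Compare first and last words, ignoring trailing sentence punctuation."""
    words = sentence.rstrip('.!?').split()
    return bool(words) and words[0] == first and words[-1] == last


# Hypothetical sample text
sample = 'The film was seen by u. The end! Is that the one for u?'
for s in split_sentences(sample):
    if starts_and_ends_with(s, 'The', 'u'):
        print(s)
```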

jim
0
import re

x = 'The lines that starts with The and Ends with u'

print(re.findall(r'\A(The)\s(\w+\s)+u\Z',x))
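
Note that when a pattern contains capturing groups, `re.findall` returns the captured groups rather than the whole matched sentence. A possible variant, given here only as a sketch, uses a non-capturing group and `match` to test each sentence. The `sentences` list is hypothetical sample data, and the pattern assumes words consist only of `\w` characters, so a sentence containing commas or other punctuation would not match:

```python
import re

# 'The\b' requires 'The' as a whole word at the start; (?:\s+\w+)* matches any
# number of intermediate words without capturing them; '\s+u\Z' requires the
# final word to be exactly 'u' at the very end of the string
pattern = re.compile(r'The\b(?:\s+\w+)*\s+u\Z')

# Hypothetical sample sentences
sentences = ['The lines that starts with The and Ends with u',
             'Then it ends with u',
             'The menu']

matching = [s for s in sentences if pattern.match(s)]
print(matching)
```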
zondo
getflag
  • It is good to explain what this does. Answers like yours are automatically put among Low Quality Posts and people have to decide if they are to be deleted. – Vladimir F Героям слава Apr 17 '16 at 11:20
  • Ok! I didn't know that. Thank you for letting me know. Explanation: I tried to express the requirement as a regex, so I imported the re module. In the expression, \A anchors the match at the start of the string, and \s matches the whitespace that follows the first word. (\w+\s) matches a run of word characters followed by whitespace, and the + after the group means one or more such words. Finally, in u\Z, the \Z matches only at the end of the string, so the string has to end with 'u'. – getflag Apr 17 '16 at 16:18