I have a problem similar to the one discussed in "Python extract sentence containing word", but I do not want a period inside a numeric string to end the sentence.

For example:

The apt subtitle for the binoculars will be 9015.18.1190, CTS, which provides for binoculars. The rate of duty on this will be free.

When I tried this:

import re
txt="The apt subtitle for the binoculars will be 9015.18.1190, CTS, which provides for binoculars. The rate of duty on this will be free."
define_words = 'apt subtitle'
print (re.findall(r"([^.]*?%s[^.]*\.)" % define_words,txt))

Actual output:

The apt subtitle for the binoculars will be 9015.

However the expected output is:

The apt subtitle for the binoculars will be 9015.18.1190, CTS, which provides for binoculars.

Can someone help me achieve the expected output?

AB6
  • If there is just one sentence (`txt`) you need to handle, you can simply use `txt.split('The rate of duty on this will be free.')[0]`. However, this will not give you a systematic solution if there are many sentences to handle. – Andersson Dec 12 '16 at 09:19
  • You can do a split on the sentences and then find whether the desired word exists in the sentence and then just print that sentence. – Rohan Amrute Dec 12 '16 at 09:43
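
The split-then-filter idea from the comment above could look roughly like the sketch below. It assumes that sentences are separated by whitespace following `.`, `!` or `?`, which is why the periods inside `9015.18.1190` do not trigger a split. The function name `sentences_containing` is made up for illustration.

```python
import re

def sentences_containing(text, phrase):
    """Return the sentences of `text` that contain `phrase`."""
    # Split only where sentence punctuation is followed by whitespace,
    # so the periods inside "9015.18.1190" do not end a sentence.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if phrase in s]

txt = ("The apt subtitle for the binoculars will be 9015.18.1190, CTS, "
       "which provides for binoculars. The rate of duty on this will be free.")
print(sentences_containing(txt, "apt subtitle"))
# ['The apt subtitle for the binoculars will be 9015.18.1190, CTS, which provides for binoculars.']
```

This keeps the regex small, but it would still split after abbreviations such as "e.g. " because they are followed by a space.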

1 Answer

Use a negative lookahead to assert that the match ends with a `.` that is not followed by a digit.

This works for your example input, but it might need some tweaking to be generic enough to handle more cases.

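import re

txt = "The apt subtitle for the binoculars will be 9015.18.1190, CTS, which provides for binoculars. The rate of duty on this will be free."
define_words = 'apt subtitle'
# (?!\d) rejects a period that is followed by a digit, so "9015." does not end the match.
# re.escape keeps any regex metacharacters in the search phrase literal.
print(re.findall(r"([^.]*?%s.*?\.)(?!\d)" % re.escape(define_words), txt))
# ['The apt subtitle for the binoculars will be 9015.18.1190, CTS, which provides for binoculars.']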
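
One possible way to make this more generic is sketched below; it is not part of the original answer. The pattern above still uses `[^.]*?` before the search phrase, so a number with a period that appears earlier in the sentence would cut the match short. The sketch applies the same "period followed by a digit" rule on both sides of the phrase. The helper name `extract_sentences` is hypothetical, and the code assumes every sentence ends with a period.

```python
import re

def extract_sentences(text, phrase):
    """Return sentences in `text` that contain `phrase` (hypothetical helper)."""
    # A "sentence character" is anything except a period, or a period that is
    # immediately followed by a digit (as in "9015.18.1190").
    sentence_char = r"(?:[^.]|\.(?=\d))"
    pattern = rf"{sentence_char}*?{re.escape(phrase)}{sentence_char}*\.(?!\d)"
    # Matches after the first sentence start with the separating space, so strip.
    return [m.strip() for m in re.findall(pattern, text)]

txt = ("The apt subtitle for the binoculars will be 9015.18.1190, CTS, "
       "which provides for binoculars. The rate of duty on this will be free.")
print(extract_sentences(txt, "apt subtitle"))
# ['The apt subtitle for the binoculars will be 9015.18.1190, CTS, which provides for binoculars.']
```

Because the pattern contains only non-capturing groups, `re.findall` returns the full matched sentences.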
Skycc
  • Hey, you are right. I just learned that the problem is not caused by the numbers: there are parentheses and double quotes in the sentence. I'm able to remove the parentheses using `txt=str(txt).strip('()')`, but I'm not able to remove the double quotes. However, these double quotes are not present in all the sentences. How can I handle them? – AB6 Dec 16 '16 at 07:19
  • Where is the double quote? Can you show an example and the expected output? You can use `txt = txt.strip('()"')` to strip off the round brackets and double quotes. – Skycc Dec 16 '16 at 07:21
  • Hi @Skycc, liked the solution. I have a similar kinda problem. Would you please check `https://stackoverflow.com/questions/71430999/extract-a-sentence-based-on-specific-phrase` – Roy Mar 11 '22 at 19:56
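
On the `strip('()"')` suggestion in the comments above: `str.strip` only removes the listed characters from the two ends of the string, so it is harmless for sentences that have no quotes, but it leaves quotes and parentheses in the middle untouched. The sketch below illustrates this with made-up sample strings (the asker never posted the real input) and shows `str.translate` as one option if every occurrence should go.

```python
# Hypothetical inputs; the real sentences were not shown in the thread.
wrapped = '("The apt subtitle for the binoculars will be 9015.18.1190, CTS.")'
inner = 'The "apt" subtitle (see note) applies.'

# strip() trims the given characters from both ends only.
print(wrapped.strip('()"'))
# The apt subtitle for the binoculars will be 9015.18.1190, CTS.

# Nothing to trim at the ends here, so the string is returned unchanged.
print(inner.strip('()"'))
# The "apt" subtitle (see note) applies.

# To drop these characters everywhere, delete them with translate().
cleaned = inner.translate(str.maketrans('', '', '()"'))
print(cleaned)
# The apt subtitle see note applies.
```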