I've looked at the thread "Regex to find all sentences of text?" but can't get its approach to work for my exact scenario. Here's the code and the text I'm working with:
import regex as re
sentence = re.compile(r"[A-Z].*?[.!?] ", re.MULTILINE | re.DOTALL)
phrase = """For necessary expenses of the Office of Inspector
General, including employment pursuant to the Inspector
General Act of 1978 (Public Law 95–452; 5 U.S.C. App.),
$99,912,000, including such sums as may be necessary for
contracting and other arrangements with public agencies
and private persons pursuant to section 6(a)(9) of the Inspector General Act of 1978 (Public Law 95–452; 5
U.S.C. App.), and including not to exceed $125,000 for
certain confidential operational expenses, including the
payment of informants, to be expended under the direction
of the Inspector General pursuant to the Inspector General Act of 1978 (Public Law 95–452; 5 U.S.C. App.) and
section 1337 of the Agriculture and Food Act of 1981. For necessary expenses of the Office of the General
23 Counsel, $45,390,000."""
phrase = phrase.replace("\n", " ")  # join wrapped lines with a space so words don't run together
sentence.findall(phrase)
# outputs:
['For necessary expenses of the Office of Inspector General, including employment pursuant to the Inspector General Act of 1978 (Public Law 95–452; 5 U.S.C. ',
'App.), $99,912,000, including such sums as may be necessary for contracting and other arrangements with public agencies and private persons pursuant to section 6(a)(9) of the Inspector General Act of 1978 (Public Law 95–452; 5 U.S.C. ',
'App.), and including not to exceed $125,000 for certain confidential operational expenses, including the payment of informants, to be expended under the direction of the Inspector General pursuant to the Inspector General Act of 1978 (Public Law 95–452; 5 U.S.C. ',
'App.) and section 1337 of the Agriculture and Food Act of 1981. ']
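There are only two actual sentences in this text, but the pattern splits at every "U.S.C. " because it treats any period followed by a space as a sentence end. The first sentence is: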
For necessary expenses of the Office of Inspector General, including employment pursuant to the Inspector General Act of 1978 (Public Law 95–452; 5 U.S.C. App.), $99,912,000, including such sums as may be necessary for contracting and other arrangements with public agencies and private persons pursuant to section 6(a)(9) of the Inspector General Act of 1978 (Public Law 95–452; 5 U.S.C. App.), and including not to exceed $125,000 for certain confidential operational expenses, including the payment of informants, to be expended under the direction of the Inspector General pursuant to the Inspector General Act of 1978 (Public Law 95–452; 5 U.S.C. App.) and section 1337 of the Agriculture and Food Act of 1981.
And the second is:
For necessary expenses of the Office of the General 23 Counsel, $45,390,000.
Is there a way, through regex or other means, to extract these full sentences? The end goal is to extract every full sentence and then search each one for certain things, in case that affects the solution.
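One direction that might work, shown here as a rough sketch rather than a definitive solution: instead of matching sentences, split at whitespace that follows `.`, `!` or `?` and precedes a capital letter, but skip the split when the text just before it is a known abbreviation. The sketch uses the standard-library `re` module rather than `regex`. The `split_sentences` helper and the `ABBREVIATIONS` list are hypothetical names introduced for illustration, and the list would need to be extended for a real corpus.

```python
import re

# Hypothetical abbreviation list (an assumption); extend it for your own corpus.
ABBREVIATIONS = ["U.S.C.", "App.", "No.", "Sec."]


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split text into sentences, ignoring periods that end known abbreviations."""
    # Collapse newlines and runs of whitespace into single spaces.
    text = " ".join(text.split())
    # The stdlib re module requires fixed-width lookbehinds, so build one
    # negative lookbehind per abbreviation instead of a single alternation.
    not_abbrev = "".join(f"(?<!{re.escape(a)})" for a in abbreviations)
    # A boundary is whitespace preceded by ., ! or ? (but not by an
    # abbreviation) and followed by a capital letter.
    boundary = re.compile(r"(?<=[.!?])" + not_abbrev + r"\s+(?=[A-Z])")
    return boundary.split(text)
```

Calling `split_sentences(phrase)` on the text above should return the two sentences listed, since "U.S.C. App." is skipped and the only remaining boundary is after "1981.". The obvious limitation is that a sentence which genuinely ends in one of the listed abbreviations will not be split there.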