The example text:
34A-6-87.1 Disposal of tire waste--Collection or processing sites--Penalties for violations.
34A-6-87.1. Disposal of tire waste--Collection or processing sites--Penalties for violations. Any person hauling or transporting any waste tire as defined in subdivision 34A-6-61(25), originating from a wholesaler or retailer shall ensure the proper disposal of the waste tire at a department approved waste tire collection or processing site, or that it is used in some other manner approved by the department. The board may promulgate rules, pursuant to chapter 1-26, setting forth the requirements and procedures for department approval of waste tire collection, processing sites, or other approved uses for waste tires. Any waste tire hauler or transporter who intentionally disposes of any waste tire in a manner inconsistent with the provisions of this section is subject to a civil action by the State of South Dakota in circuit court for the recovery of a civil penalty of not more than ten thousand dollars per day per violation, or for costs to clean up sites not approved, or both. The violator is also subject to the criminal penalties provided for in § 34A-6-87.
Source:
SL 1998, ch 202, § 1. Source:

34A-6-8-2A Transferred.
34A-6-88. Transferred to § 46A-1-83.1.

34A-6-8-2A Transferred.
34A-6-88. Transferred to § 46A-1-83.1.

34A-6-88, Transferred.
34A-6-88. Transferred to § 46A-1-83.1.

34A-6-88 to 23-34-1A Transferred.
34A-6-88. Transferred to § 46A-1-83.1.

34A-1-28 Repealed.
34A-1-28. Repealed by SL 1986, ch 295, § 7.

34A-1-28 Repealed.
34A-1-28. Repealed by SL 1986, ch 295, § 7.

34A-6-89-34 Scale device required--Records--Report--Contents--Permit for longer capacity disposal.
34A-6-89. Scale device required--Records--Report--Contents--Permit for longer capacity disposal. Any solid waste facility permitted to dispose of solid waste in excess of one hundred thousand tons per year shall be equipped with a scale device, approved by the Department of Public Safety, and shall weigh and maintain records of the total amount of solid waste disposed of at the facility. On or before the fifteenth of each month, the facility shall submit to the department a report upon such forms as may be prescribed by the department in rules promulgated pursuant to chapter 1-26. The report shall state the total amount of solid waste disposed of at the facility in the preceding month. The forms shall contain a sworn certification by the owner or operator that the information contained in the monthly report is true and correct based upon his own best information, knowledge, and belief. No facility may dispose of solid waste in excess of one hundred fifty thousand tons per year without a permit authorizing the capacity of the facility to dispose of solid waste in such quantities as provided in § 34A-6-1.16.
Source:
SL 1992, ch 254, § 50Q; SL 2004, ch 17, § 231. Source:
I'm assuming you want to break that text into blocks, one per referenced statute.
If so, you can simplify your regex considerably:
r'^(\d+\w+-\d+-\d+(?:[,.\-0-9A-Z]+)?[ \t]+.*?(?=\n\n|\n+\Z|\Z))'
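^ asserts position at the start of a line
1st capturing group (\d+\w+-\d+-\d+(?:[,.\-0-9A-Z]+)?[ \t]+.*?(?=\n\n|\n+\Z|\Z))
\d+ matches one or more digits [0-9]
\w+ matches one or more word characters [a-zA-Z0-9_]
- matches the character - literally
\d+ matches one or more digits [0-9]
- matches the character - literally
\d+ matches one or more digits [0-9]
(?:[,.\-0-9A-Z]+)? optional non-capturing group: one or more commas, periods, hyphens, digits, or uppercase letters (such as the .1 in 34A-6-87.1)
[ \t]+ matches one or more spaces or tabs
.*? matches any characters, as few as possible (lazy)
(?=\n\n|\n+\Z|\Z) positive lookahead: asserts that one of the following alternatives comes next, without consuming it
1st alternative: \n\n (a blank line)
2nd alternative: \n+\Z (trailing newlines at the end of the string)
3rd alternative: \Z (the end of the string)
g modifier: global. Return all matches, not just the first (Python has no g flag; use re.finditer or re.findall instead)
m modifier: multi-line (re.M in Python). Causes ^ and $ to match at the beginning/end of each line, not only at the beginning/end of the string
s modifier: single line (re.S in Python). Dot also matches newline characters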
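To see how the pieces listed above fit together, here is the same pattern rewritten with Python's re.X (verbose) flag, with each token commented. This is only a readability sketch: the sample string is a shortened excerpt of the example text, and the check merely confirms that the compact and verbose spellings find the same blocks on it.

```python
import re

# The compact pattern (with [ \t]+ after the section number)
COMPACT = r'^(\d+\w+-\d+-\d+(?:[,.\-0-9A-Z]+)?[ \t]+.*?(?=\n\n|\n+\Z|\Z))'

# The same pattern in verbose mode: whitespace outside character classes is
# ignored and "#" starts a comment, so each token can be annotated.
VERBOSE = r'''
^                     # start of a line (requires re.M)
(                     # group 1: the whole block
  \d+ \w+             # e.g. "34" followed by "A"
  - \d+               # "-6"
  - \d+               # "-87"
  (?:[,.\-0-9A-Z]+)?  # optional suffix such as ".1", "-2A" or ","
  [ \t]+              # spaces/tabs (whitespace inside a class stays literal)
  .*?                 # rest of the block, lazily (crosses lines with re.S)
  (?=\n\n|\n+\Z|\Z)   # stop before a blank line or the end of the text
)
'''

# Shortened excerpt of the example text: two blocks separated by a blank line
sample = (
    "34A-1-28 Repealed.\n"
    "34A-1-28. Repealed by SL 1986, ch 295, § 7.\n"
    "\n"
    "34A-6-88, Transferred.\n"
    "34A-6-88. Transferred to § 46A-1-83.1.\n"
)

compact_blocks = re.findall(COMPACT, sample, re.S | re.M)
verbose_blocks = re.findall(VERBOSE, sample, re.S | re.M | re.X)
print(compact_blocks == verbose_blocks)
```

The verbose form is handy if the pattern grows, since every token keeps its explanation next to it.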
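Note:
- the ^ anchor combined with re.S | re.M, so every match has to start at the beginning of a line
- the positive lookahead (?=\n\n|\n+\Z|\Z) moved to the end, so the lazy .*? stops just before the next blank line (or the end of the text)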
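The effect of the two flags is easiest to see by running the pattern on a small excerpt with and without them. The blocks() helper and the excerpt below are mine, purely for illustration:

```python
import re

PATTERN = r'^(\d+\w+-\d+-\d+(?:[,.\-0-9A-Z]+)?[ \t]+.*?(?=\n\n|\n+\Z|\Z))'

# Cut-down excerpt of the example text: two blocks separated by a blank line
sample = (
    "34A-1-28 Repealed.\n"
    "34A-1-28. Repealed by SL 1986, ch 295, § 7.\n"
    "\n"
    "34A-6-88, Transferred.\n"
    "34A-6-88. Transferred to § 46A-1-83.1.\n"
)

def blocks(flags):
    """Return group 1 of every match of PATTERN in sample under the given flags."""
    return [m.group(1) for m in re.finditer(PATTERN, sample, flags)]

both = blocks(re.S | re.M)     # ^ at every line start, . crosses newlines
dotall_only = blocks(re.S)     # ^ only at the very start of the string
multiline_only = blocks(re.M)  # . stops at newlines, so only one-line matches

for name, found in [("re.S | re.M", both), ("re.S", dotall_only), ("re.M", multiline_only)]:
    print(name, "->", len(found), "match(es)")
    for b in found:
        print("   ", repr(b))
```

With both flags you get the two full blocks. With re.S alone only the first block is found, because ^ no longer matches after a newline. With re.M alone, .*? cannot cross a line break, so only the last line of each block (the one directly followed by a blank line or the end of the text) matches.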
Once you have the individual blocks, you can further parse those blocks to find what you need. As a simple example:
import re

statutes = {}
# txt holds the example text shown above
pat = re.compile(r'^(\d+\w+-\d+-\d+(?:[,.\-0-9A-Z]+)?[ \t]+.*?(?=\n\n|\n+\Z|\Z))', re.S | re.M)
for block in pat.finditer(txt):
    # Look for a status keyword on the block's first line only (no re.S here, so . stops at \n)
    m = re.search(r'^.*(Superseded|Repealed|Transferred|Obsolete|Reserved|Rejected|Omitted|Not|Executed)', block.group(1))
    if m:
        statutes.setdefault(m.group(1), []).append(block.group(1))
    else:
        statutes.setdefault('Enacted', []).append(block.group(1))
for status in sorted(statutes):
    print('{} ============\n{}\n'.format(status, '\n\n'.join(statutes[status])))
This sorts the blocks of the example text by the status of each statute (Enacted, Repealed, Transferred, etc.), like so:
Enacted ============
34A-6-87.1 Disposal of tire waste--Collection or processing sites--Penalties for violations.
34A-6-87.1. Disposal of tire waste--Collection or processing sites--Penalties for violations. Any person hauling or transporting any waste tire as defined in subdivision 34A-6-61(25), originating from a wholesaler or retailer shall ensure the proper disposal of the waste tire at a department approved waste tire collection or processing site, or that it is used in some other manner approved by the department. The board may promulgate rules, pursuant to chapter 1-26, setting forth the requirements and procedures for department approval of waste tire collection, processing sites, or other approved uses for waste tires. Any waste tire hauler or transporter who intentionally disposes of any waste tire in a manner inconsistent with the provisions of this section is subject to a civil action by the State of South Dakota in circuit court for the recovery of a civil penalty of not more than ten thousand dollars per day per violation, or for costs to clean up sites not approved, or both. The violator is also subject to the criminal penalties provided for in § 34A-6-87.
Source:
SL 1998, ch 202, § 1. Source:

34A-6-89-34 Scale device required--Records--Report--Contents--Permit for longer capacity disposal.
34A-6-89. Scale device required--Records--Report--Contents--Permit for longer capacity disposal. Any solid waste facility permitted to dispose of solid waste in excess of one hundred thousand tons per year shall be equipped with a scale device, approved by the Department of Public Safety, and shall weigh and maintain records of the total amount of solid waste disposed of at the facility. On or before the fifteenth of each month, the facility shall submit to the department a report upon such forms as may be prescribed by the department in rules promulgated pursuant to chapter 1-26. The report shall state the total amount of solid waste disposed of at the facility in the preceding month. The forms shall contain a sworn certification by the owner or operator that the information contained in the monthly report is true and correct based upon his own best information, knowledge, and belief. No facility may dispose of solid waste in excess of one hundred fifty thousand tons per year without a permit authorizing the capacity of the facility to dispose of solid waste in such quantities as provided in § 34A-6-1.16.
Source:
SL 1992, ch 254, § 50Q; SL 2004, ch 17, § 231. Source:

Repealed ============
34A-1-28 Repealed.
34A-1-28. Repealed by SL 1986, ch 295, § 7.

34A-1-28 Repealed.
34A-1-28. Repealed by SL 1986, ch 295, § 7.

Transferred ============
34A-6-8-2A Transferred.
34A-6-88. Transferred to § 46A-1-83.1.

34A-6-8-2A Transferred.
34A-6-88. Transferred to § 46A-1-83.1.

34A-6-88, Transferred.
34A-6-88. Transferred to § 46A-1-83.1.

34A-6-88 to 23-34-1A Transferred.
34A-6-88. Transferred to § 46A-1-83.1.
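One caveat with the status regex: the keywords are matched as plain substrings, so a first line containing, say, "Notice" would be filed under Not. If that matters for the full file, a stricter variant is to wrap the keywords in \b word boundaries. This is my own sketch, not tested against the real statute text; status_of() is a hypothetical helper, and the "Notice of hearing" title is made up for the demonstration:

```python
import re

# The status keywords, wrapped in \b so only whole words match.
# Still case-sensitive, like the pattern in the loop above.
STATUS_RE = re.compile(
    r'\b(Superseded|Repealed|Transferred|Obsolete|Reserved|Rejected|Omitted|Not|Executed)\b'
)

def status_of(block):
    """Return the last status keyword on the block's first line, or 'Enacted'."""
    first_line = block.split('\n', 1)[0]
    found = STATUS_RE.findall(first_line)
    return found[-1] if found else 'Enacted'

print(status_of("34A-1-28 Repealed.\n34A-1-28. Repealed by SL 1986, ch 295, § 7."))
print(status_of("34A-6-88 to 23-34-1A Transferred.\n34A-6-88. Transferred to § 46A-1-83.1."))
# Hypothetical title line, not from the example text:
print(status_of("34A-6-99 Notice of hearing."))
```

Taking the last keyword mirrors the greedy ^.* in the original pattern, which also settles on the last keyword of the first line.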
As an example of how SIMPLE this CAN be, at least with the example text, you can skip the block regex entirely and just use Python's str.split method on '\n\n' (blank lines) to get the same result:
import re

statutes = {}
for block in txt.split('\n\n'):
    m = re.search(r'^.*(Superseded|Repealed|Transferred|Obsolete|Reserved|Rejected|Omitted|Not|Executed)', block)
    if m:
        statutes.setdefault(m.group(1), []).append(block)
    else:
        statutes.setdefault('Enacted', []).append(block)
# ...then print the groups as in the previous example
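If the real file is less tidy than the example (several blank lines in a row, or "blank" lines that contain spaces), a plain split('\n\n') can leave stray newlines, empty strings, or merged blocks. A slightly more tolerant variant, assuming that any run of whitespace-only lines separates two blocks, is to split with re.split instead. split_blocks() is a hypothetical helper name:

```python
import re

def split_blocks(text):
    """Split text on runs of blank or whitespace-only lines, dropping empty pieces."""
    return [b for b in re.split(r'\n\s*\n', text.strip()) if b]

# Excerpt with two blank lines after the first block and a line holding
# only spaces after the second one
sample = (
    "34A-1-28 Repealed.\n"
    "34A-1-28. Repealed by SL 1986, ch 295, § 7.\n"
    "\n"
    "\n"
    "34A-6-88, Transferred.\n"
    "34A-6-88. Transferred to § 46A-1-83.1.\n"
    "  \n"
    "34A-6-88 to 23-34-1A Transferred.\n"
    "34A-6-88. Transferred to § 46A-1-83.1.\n"
)

print(len(sample.split('\n\n')))  # plain split: 2 pieces, blocks 2 and 3 merged
for block in split_blocks(sample):
    print(repr(block))
```

The resulting list can be fed to the same classification loop in place of txt.split('\n\n').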