I have a .txt file, converted from a PDF, that contains a long list of items. The items are numbered using the following convention:
[A-Z]{1,2}\d{1,2}\.\d{1,2}\.\d{1,2}
This pattern matches item numbers ranging from:
A1.1.1
to
ZZ99.99.99
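One quick way to sanity-check the numbering pattern is to test it against both ends of that range and a few near misses. This sketch assumes Python's `re` module, which the question does not name; the sample strings are made up.

```python
import re

# Item-number pattern from the question: one or two capital letters,
# then three dot-separated groups of one or two digits each.
ITEM_NUMBER = re.compile(r"[A-Z]{1,2}\d{1,2}\.\d{1,2}\.\d{1,2}")

samples = ["A1.1.1", "ZZ99.99.99", "AAA1.1.1", "A100.1.1", "a1.1.1"]

# fullmatch succeeds only if the pattern covers the entire string.
results = {s: bool(ITEM_NUMBER.fullmatch(s)) for s in samples}
for sample, ok in results.items():
    print(sample, ok)
```

Because `fullmatch` anchors both ends, `AAA1.1.1` is rejected; `re.search` would still find `AA1.1.1` inside it, which matters when scanning free text.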
This part works fine. The problem is that I want to capture the item number in group 1 and everything between it and the next item number (the item description) in group 2.
I also need the results returned as a list or other iterable so that the captured contents can eventually be exported to an Excel spreadsheet.
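Once the (item number, description) pairs are in a list of tuples, one option that needs only Python's standard library is to write them to a CSV file, which Excel opens directly. This is a sketch under assumptions: the `items` list is placeholder data standing in for the regex results, and `items_to_csv` is a hypothetical helper. Writing a native .xlsx file would need a third-party library such as openpyxl or pandas instead.

```python
import csv
import io

# Placeholder data standing in for the (item number, description) pairs
# the regex produces; the values here are made up for illustration.
items = [
    ("A1.1.1", "First item.\nIt has two lines."),
    ("A1.1.2", "Second item."),
]

def items_to_csv(rows, fileobj):
    """Write a header row plus one row per item to an open text file."""
    writer = csv.writer(fileobj)
    writer.writerow(["Item number", "Description"])
    # Descriptions containing newlines are quoted automatically.
    writer.writerows(rows)

# Write to an in-memory buffer here; for a real file use
# open("items.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
items_to_csv(items, buffer)
print(buffer.getvalue())
```

Multi-line descriptions stay in a single cell because the csv module quotes any field that contains a line break.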
This is my current regex:
^([A-Z]{1,2}\d{1,2}\.\d{1,2}\.\d{1,2}\s)([\w\W]*?)(?:\n)
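In the regex above, the lazy `[\w\W]*?` is followed by a required `\n`, so each match ends at the first newline after the item number and group 2 holds at most one line of the description. A minimal sketch of one fix, assuming Python's `re` module: keep the lazy match, but end it with a lookahead for the next item number at the start of a line (or the end of the text), and use `re.DOTALL` so `.` also crosses newlines. Unlike the original, this version leaves the whitespace after the number out of group 1. The names `NUM`, `ITEM_RE` and the sample text are illustrative.

```python
import re

# Item number pattern, reused in both the capture and the lookahead.
NUM = r"[A-Z]{1,2}\d{1,2}\.\d{1,2}\.\d{1,2}"

ITEM_RE = re.compile(
    rf"^({NUM})\s+"          # group 1: item number at a line start, then whitespace
    rf"(.*?)"                # group 2: description, matched lazily
    rf"\s*(?=^{NUM}\s|\Z)",  # stop before the next line-initial item number or end of text
    re.MULTILINE | re.DOTALL,  # ^ matches after each newline; . also matches newlines
)

text = """A1.1.1 First item, line one.
line two of the first item.

A second paragraph of the first item.
A1.1.2 Second item.
B12.3.45 Third item
spans two lines.
"""

# findall returns a list of (group 1, group 2) tuples.
items = ITEM_RE.findall(text)
for number, description in items:
    print(number, "->", repr(description))
```

The lookahead only stops where an item number begins a line, so a number quoted mid-sentence stays inside the description; a description line that itself starts with something shaped like an item number would still be split there.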
Follow this link to find a sample of what I have and the issues I am facing:
Can anyone help me capture everything between consecutive item numbers, no matter how many paragraphs a description spans?
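A different technique that avoids the lookahead altogether is to split the whole text on the item numbers with `re.split`. Because the pattern contains a capturing group, the numbers are kept in the result list, alternating with the text between them. This is a sketch assuming Python; `split_items` and the sample text are hypothetical.

```python
import re

# Capturing group, so re.split keeps each item number in the output list.
# The number must start a line (re.MULTILINE) so numbers that appear
# mid-sentence are not treated as split points.
SPLIT_RE = re.compile(r"^([A-Z]{1,2}\d{1,2}\.\d{1,2}\.\d{1,2})\s", re.MULTILINE)

def split_items(text):
    """Return a list of (item number, description) pairs."""
    parts = SPLIT_RE.split(text)
    # parts[0] is any text before the first item number (discarded); after
    # that the list alternates: number, description, number, description, ...
    numbers = parts[1::2]
    descriptions = [d.strip() for d in parts[2::2]]
    return list(zip(numbers, descriptions))

text = """A1.1.1 First item.

Second paragraph of the first item.
A1.1.2 Second item.
"""
print(split_items(text))
```

`strip()` trims only the whitespace around each description, so blank lines between paragraphs inside a description are preserved.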
Any input would be greatly appreciated, thanks!