I am trying to parse text from a document using a regex. The document uses different section-numbering styles, e.g. section 1.2 and section (1). The regex below parses sections numbered with a decimal point but fails for sections numbered in parentheses.
Any suggestions for handling content that starts with a parenthesised number such as (4)?
For example:
import re
RAW_Data = '(4) The Governor-General may arrange\n with the Chief Minister of the Australian Capital Territory for the variation or revocation of an \n\narrangement in force under subsection (3). \nNorthern Territory \n (5) The Governor-General may make arrangements with the \nAdministrator of the Northern \nTerritory with respect to the'
f = re.findall(r'(^\d+\.[\d\.]*)(.*?)(?=^\d+\.[\d\.]*)', RAW_Data,re.DOTALL|re.M|re.S)
for z in f:
    z = ''.join(z).strip().replace('\n', '')
    print(z)
Expected output:
(4) The Governor-General may arrange with the Chief Minister of the Australian Capital Territory for the variation or revocation of an arrangement in force under subsection
(3) Northern Territory
(5) The Governor-General may make arrangements with the Administrator of the Northern Territory with respect to the
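
One possible direction, as a sketch rather than a definitive fix: let the marker at the start of a line be either a parenthesised number or the decimal form, and end the lookahead at the next marker or at the end of the text, so the last section is not dropped. The names MARKER, SECTION and split_sections are made up for illustration. This assumes a marker only counts when it begins a line (optionally after spaces or tabs), so the "(3)" in the middle of "under subsection (3)." stays inside section (4) instead of starting a section of its own.

```python
import re

# Hypothetical marker: "(4)" style or "1.2" style, as in the question.
MARKER = r'(?:\(\d+\)|\d+\.[\d.]*)'

# A section starts with a marker at the beginning of a line (leading spaces or
# tabs allowed) and runs lazily up to the next such marker or the end of text.
SECTION = re.compile(
    r'^[ \t]*(' + MARKER + r')(.*?)(?=^[ \t]*' + MARKER + r'|\Z)',
    re.M | re.S,
)

def split_sections(text):
    """Return one whitespace-normalised string per numbered section."""
    sections = []
    for marker, body in SECTION.findall(text):
        # str.split() with no arguments collapses newlines and repeated spaces.
        sections.append(' '.join((marker + body).split()))
    return sections

sample = (
    '(4) The Governor-General may arrange\n with the Chief Minister of the '
    'Australian Capital Territory for the variation or revocation of an \n\n'
    'arrangement in force under subsection (3). \nNorthern Territory \n'
    ' (5) The Governor-General may make arrangements with the \n'
    'Administrator of the Northern \nTerritory with respect to the'
)

for section in split_sections(sample):
    print(section)
```

With the sample text this yields two sections, one starting with (4) and one starting with (5). If a mid-line reference such as "(3)" should also start a new section, the line-start requirement would have to be relaxed, at the cost of splitting on every cross-reference.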