I am using re to filter a block of text down to the information I need. I can now print the two pieces of information I need from each line of the text with match.group().
match.group(1) is a number and match.group(4) is a string. For each line (iteration through the for loop) I need match.group(1) to be added to one column in a dataframe and match.group(4) to be added to another column.
Here is the code (the print statement at the bottom needs to be replaced with the code to add each element to the dataframe):
import re
import pandas as pd

finalText = re.search(r'19\s+domestic and stock rights(.*?)20\s+native title rights', rawText, flags=re.S | re.I).group(
# Group 1 is the volume (a number); group 4 is the location name.
pattern = re.compile(r'(\d+)( ML/year )(in the |the )([\w \/\(\)]+)')
df = pd.DataFrame()
for line in finalText.splitlines():
    # Iterate over every match of the compiled pattern on this line.
    for match in pattern.finditer(line):
        print(match.group(1) + "," + match.group(4))
match.group(1) is a number and match.group(4) is a location, so an example of the dataframe would be:
Water Usage  Town
55           York
718          Holst
7            Poke
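
One possible way to get there, as a rough sketch rather than a definitive answer: collect each match as a dict in a list and build the dataframe once after the loop, instead of adding to it row by row. The `sample_text` string below is a made-up stand-in for `finalText`, and the column names are taken from the example table above; converting the number with `int()` and stripping the town name are assumptions about what is wanted.

```python
import re
import pandas as pd

# Same pattern as in the question, written as a raw string.
pattern = re.compile(r'(\d+)( ML/year )(in the |the )([\w \/\(\)]+)')

# Hypothetical sample input standing in for finalText.
sample_text = (
    "55 ML/year in the York\n"
    "718 ML/year the Holst\n"
    "7 ML/year in the Poke"
)

rows = []
for line in sample_text.splitlines():
    for match in pattern.finditer(line):
        rows.append({
            "Water Usage": int(match.group(1)),   # assumed: store as integer
            "Town": match.group(4).strip(),       # assumed: drop stray spaces
        })

# Build the dataframe in one go from the collected rows.
df = pd.DataFrame(rows, columns=["Water Usage", "Town"])
print(df)
```

Building the list first and constructing the dataframe once tends to be simpler and faster than growing a dataframe inside the loop, since each row-wise addition would copy the existing data.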