I am using re to filter down a bunch of text to information I need. I am now able to print the two pieces of information I need from each line in the text with match.group().

match.group(1) is a number and match.group(4) is a string. For each line (iteration through the for loop) I need match.group(1) to be added to one column in a dataframe and match.group(4) to be added to another column.

Here is the code (the print statement at the bottom needs to be replaced with the code to add each element to the dataframe):

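import re
import pandas as pd

# rawText holds the full source text; keep only the section between the two headings
finalText = re.search(r'19\s+domestic and stock rights(.*?)20\s+native title rights', rawText, flags=re.S | re.I).group(1)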

# group(1) is the volume in ML/year, group(4) is the location name
pattern = re.compile(r'(\d+)( ML/year )(in the |the )([\w \/\(\)]+)')

df = pd.DataFrame()

for line in finalText.splitlines():
    matches = re.finditer(pattern, line)

    for matchNum, match in enumerate(matches, start=1):
        print(match.group(1) + "," + match.group(4))

Here match.group(1) is a number and match.group(4) is a location, so an example of the resulting dataframe would be:

Water Usage    Town
55             York
718            Holst
7              Poke
Dhar_

2 Answers

If you want to add the matches to a new DataFrame, first initialize it outside the loop:

new_df = pd.DataFrame(columns=['match1','match4'])

Then, inside the loop, append each match as a new row:

row = [match.group(1), match.group(4)]
new_df.loc[len(new_df)] = row

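If you want to add to the existing DataFrame instead, replace new_df with df in the last two lines of code. Note that df must already have its two columns defined; assigning a row with .loc to a DataFrame that has no columns raises an error.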
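To make this concrete, here is a minimal, self-contained sketch of the row-by-row approach. The `sample_text` lines are made up for illustration and stand in for the question's `finalText`, and the column names are borrowed from the question's example output rather than the `match1`/`match4` names used above.

```python
import re
import pandas as pd

# Hypothetical sample lines standing in for finalText from the question
sample_text = (
    "55 ML/year in the York\n"
    "718 ML/year the Holst\n"
    "7 ML/year in the Poke"
)

pattern = re.compile(r'(\d+)( ML/year )(in the |the )([\w \/\(\)]+)')

# Define the columns up front so rows can be appended with .loc
new_df = pd.DataFrame(columns=['Water Usage', 'Town'])

for line in sample_text.splitlines():
    for match in pattern.finditer(line):
        # len(new_df) is the next free integer label, so this appends one row
        new_df.loc[len(new_df)] = [match.group(1), match.group(4)]

print(new_df)
```

Appending with `.loc` like this is fine for a handful of rows, but each assignment enlarges the DataFrame, so it tends to get slow on large inputs.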

gtomer

Create a list of tuples and pass it to the DataFrame constructor:

out = []
for line in finalText.splitlines():
    matches = re.finditer(pattern, line)

    for matchNum, match in enumerate(matches, start=1):
        out.append((match.group(1), match.group(4)))
        
df = pd.DataFrame(out, columns=['Water Usage','Town'])
print(df)
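One thing to keep in mind: the regex groups are strings, so the Water Usage column comes out as text. If numeric values are needed, a minimal sketch could build the list with a comprehension and convert the column afterwards. The `sample_text` lines here are hypothetical and stand in for `finalText`.

```python
import re
import pandas as pd

# Hypothetical sample lines standing in for finalText from the question
sample_text = (
    "55 ML/year in the York\n"
    "718 ML/year the Holst\n"
    "7 ML/year in the Poke"
)

pattern = re.compile(r'(\d+)( ML/year )(in the |the )([\w \/\(\)]+)')

# Collect one (usage, town) tuple per match across all lines
out = [
    (match.group(1), match.group(4))
    for line in sample_text.splitlines()
    for match in pattern.finditer(line)
]

df = pd.DataFrame(out, columns=['Water Usage', 'Town'])

# The captured digits are strings; convert them to integers
df['Water Usage'] = df['Water Usage'].astype(int)
print(df)
```

After the conversion, numeric operations such as `df['Water Usage'].sum()` work as expected.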
jezrael