How to put match.group() text into a pandas dataframe?

Question

I am using re to filter a large block of text down to the information I need. I can now print the two pieces of information I need from each line of the text with match.group().

match.group(1) is a number and match.group(4) is a string. For each line (iteration through the for loop) I need match.group(1) to be added to one column in a dataframe and match.group(4) to be added to another column.

Here is the code (the print statement at the bottom needs to be replaced with the code to add each element to the dataframe):

finalText = re.search(r'19\s+domestic and stock rights(.*?)20\s+native title rights', rawText, flags=re.S | re.I).group(

pattern = re.compile(r'(\d+)( ML/year )(in the |the )([\w \/\(\)]+)')

df = pd.DataFrame()

for line in finalText.splitlines():
    matches = re.finditer(pattern, line)

    for matchNum, match in enumerate(matches, start=1):
        print(match.group(1) + "," + match.group(4))

match.group(1) is a number and match.group(4) is a location, so an example of the dataframe would be:

Water Usage    Town
55             York
718            Holst
7              Poke

Can you add some sample data for the DataFrame? Or what is `rawText`? — jezrael, Sep 10 '20 at 06:05

gtomer · Answer 1 · 2020-09-10T06:14:26.837

1

If you want to add the matches to a new DataFrame, first create it outside the loop:

new_df = pd.DataFrame(columns=['match1','match4'])

and inside the loop:

row = [match.group(1), match.group(4)]
new_df.loc[len(new_df)] = row

To add to the existing DataFrame instead, replace new_df with df in the last two lines of code (df needs its columns defined first, as new_df does here).
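
A minimal runnable sketch of this row-by-row approach, assuming two made-up sample lines in place of the real text (the question does not show rawText). Defining the columns up front appears to be what avoids the "cannot set a frame with no defined columns" error reported in the comments.

```python
import re
import pandas as pd

# Hypothetical sample lines; the real text is not shown in the question.
sample_lines = [
    "55 ML/year in the York",
    "718 ML/year the Holst",
]

pattern = re.compile(r'(\d+)( ML/year )(in the |the )([\w \/\(\)]+)')

# Create the frame with its columns defined before the loop.
new_df = pd.DataFrame(columns=['match1', 'match4'])

for line in sample_lines:
    for match in pattern.finditer(line):
        row = [match.group(1), match.group(4)]
        # len(new_df) is the next free integer label, so this appends a row.
        new_df.loc[len(new_df)] = row

print(new_df)
```

Growing a DataFrame one row at a time copies data on each assignment, so for large inputs the list-based approach discussed in the comments is likely the better choice.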

edited Sep 10 '20 at 06:14

answered Sep 10 '20 at 06:07


I get the error "cannot set a frame with no defined columns" – Dhar_ Sep 10 '20 at 06:11
I think jezrael's solution answers your question – gtomer Sep 10 '20 at 06:13
@Dhar_ - anyhow, I have fixed my answer as well – gtomer Sep 10 '20 at 06:14
@gtomer - One idea - updating an empty DataFrame is not recommended, check [this](https://stackoverflow.com/a/24871316/2901002) – jezrael Sep 10 '20 at 06:18

score 0 · Accepted Answer · answered Sep 10 '20 at 06:11

Create a list of tuples and pass it to the DataFrame constructor:

out = []
for line in finalText.splitlines():
    matches = re.finditer(pattern, line)

    for matchNum, match in enumerate(matches, start=1):
        out.append((match.group(1), match.group(4)))
        
df = pd.DataFrame(out, columns=['Water Usage','Town'])
print(df)
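
For reference, here is a minimal self-contained sketch of the same idea. The sample lines are made up (the question does not show rawText), and converting group 1 with int() is an optional assumption so that the Water Usage column holds numbers rather than strings.

```python
import re
import pandas as pd

# Hypothetical stand-in for finalText; the real text is not shown.
final_text = """\
55 ML/year in the York
718 ML/year the Holst
7 ML/year in the Poke
"""

pattern = re.compile(r'(\d+)( ML/year )(in the |the )([\w \/\(\)]+)')

rows = []
for line in final_text.splitlines():
    for match in pattern.finditer(line):
        # Group 1 is the amount, group 4 is the location name.
        rows.append((int(match.group(1)), match.group(4)))

# Build the DataFrame once, after all rows have been collected.
df = pd.DataFrame(rows, columns=['Water Usage', 'Town'])
print(df)
```

Collecting plain tuples first and constructing the DataFrame at the end keeps the loop cheap, since nothing in pandas is touched until all matches are known.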
