How Do I Find a Specific Expression within a Dataframe Column?

Question

I have a dataframe with a column called "Description." I want to scan through all the text in this column, and identify those rows that have a description that contains a number that is at least 3 digits long.

Here's where I'm at:

import re 
df['StrDesc'] = df['Description'].str.split()
y=re.findall('[0-9]{3}',str(df['StrDesc'])
print(y)

I took my text column and converted it to a string. Do I need to then run a for loop to iterate through each row before using the final regex?

Am I going about this the best way?

My error is "unexpected EOF while parsing."

You're missing a closing parenthesis at the end of your 3rd line. – PMende Aug 07 '18 at 17:04

jezrael · Accepted Answer · 2018-08-07T17:20:19.730

Use str.findall; split is not necessary:

y = df['Description'].str.findall('[0-9]{3}')
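
As a rough illustration of what the simple pattern does, here is a small sketch on made-up sample strings (the Series `s` and its values are assumptions, not the asker's data). Because `[0-9]{3}` matches any three consecutive digits, longer numbers are cut into 3-digit pieces:

```python
import pandas as pd

# Hypothetical sample data, only for illustration
s = pd.Series(['order 5867', 'id 12', 'ref 345 and 1234567'])

# Every non-overlapping run of exactly three digits is returned,
# even when it is part of a longer number
simple = s.str.findall('[0-9]{3}')
print(simple.tolist())
```

Here `5867` yields `586` and `1234567` yields `123` and `456`, which may or may not be what you want.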

But after some testing, a general solution that matches only standalone 3-digit runs is a bit more complicated:

import pandas as pd

df = pd.DataFrame({'Description':['354 64 133 5867 4 te345',
                                  'rt34 3tyr 456',
                                  '23 gh346h rt 9404']})

print(df)
               Description
0  354 64 133 5867 4 te345
1            rt34 3tyr 456
2        23 gh346h rt 9404

y = df['Description'].str.findall(r'(?:(?<!\d)\d{3}(?!\d))')
print(y)
0    [354, 133, 345]
1              [456]
2              [346]
Name: Description, dtype: object
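
If the goal is to identify the rows whose description contains a number of at least 3 digits, as the question puts it, a boolean mask may be simpler than collecting the matches. The following is a minimal sketch, assuming the same sample data plus one extra row without any long number (that row, and the names `mask`, `matching_rows` and `long_numbers`, are added here for illustration). It uses `str.contains` to flag rows and `\d{3,}` to collect whole runs of three or more digits:

```python
import pandas as pd

df = pd.DataFrame({'Description': ['354 64 133 5867 4 te345',
                                   'rt34 3tyr 456',
                                   '23 gh346h rt 9404',
                                   'no 12 here']})  # last row added for illustration

# True where the text contains any run of three or more digits
# (three consecutive digits are enough to prove such a run exists)
mask = df['Description'].str.contains(r'\d{3}')
matching_rows = df[mask]

# Collect the complete runs of three or more digits in each row
long_numbers = df['Description'].str.findall(r'\d{3,}')

print(matching_rows)
print(long_numbers.tolist())
```

With this data the mask keeps the first three rows and drops the last one, and numbers such as `5867` and `9404` are returned whole instead of being skipped.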

edited Aug 07 '18 at 17:20

answered Aug 07 '18 at 17:06

jezrael

Using pandas' built-in str manipulations and regex is definitely the way to go. – PMende Aug 07 '18 at 17:09
@PMende - absolutely agree ;) – jezrael Aug 07 '18 at 17:09
Thank you, all! I was thinking it was more complicated than it was. I used jezrael's single line of code. – Amanda Aug 07 '18 at 18:20
