
I have a dataframe with a column called "Description". I want to scan all the text in this column and identify the rows whose description contains a number at least 3 digits long.

Here's where I'm at:

import re 
df['StrDesc'] = df['Description'].str.split()
y=re.findall('[0-9]{3}',str(df['StrDesc'])
print(y)

I took my text column and converted it to a string. Do I need to then run a for loop to iterate through each row before using the final regex?

Am I going about this the best way?

My error is "unexpected EOF while parsing."
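The "unexpected EOF while parsing" message is a syntax error: the re.findall( line opens two parentheses but closes only one. Even with that fixed, str(df['StrDesc']) turns the whole column into a single display string. That string includes the index labels and may be truncated by pandas' display settings, so the matches can no longer be traced back to rows. Below is a minimal sketch of a per-row version using only the standard `re` module. It assumes `df` has a text column named `Description`; the sample strings are illustrative only.

```python
import re

import pandas as pd

# Illustrative data; replace with your own DataFrame.
df = pd.DataFrame({'Description': ['354 64 133 5867 4 te345',
                                   'rt34 3tyr 456',
                                   '23 gh346h rt 9404']})

# apply() calls the function once per row, so no explicit for loop is needed
# and each list of matches stays aligned with the row it came from.
y = df['Description'].apply(lambda text: re.findall(r'[0-9]{3}', text))
print(y)
```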

Asked by Amanda, edited by Kevin Panko

1 Answer

Use str.findall. Neither split nor an explicit for loop is necessary, because the .str methods apply the pattern to every row:

y = df['Description'].str.findall('[0-9]{3}')
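`str.findall` returns a list of matches for every row. To turn that into the row selection the question asks for, one option is to keep the rows whose list is non-empty. This is a sketch only; the third sample row is a made-up counter-example with no 3-digit run.

```python
import pandas as pd

# Illustrative data; the last row is hypothetical and has no 3-digit run.
df = pd.DataFrame({'Description': ['354 64 133 5867 4 te345',
                                   'rt34 3tyr 456',
                                   'only 12 and 7 here']})

y = df['Description'].str.findall('[0-9]{3}')

# .str.len() also works on list values: it gives the number of matches per row.
has_match = y.str.len() > 0
print(df[has_match])
```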

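However, [0-9]{3} also matches the first three digits of longer numbers (e.g. 586 from 5867), so a general solution is a bit more complicated. Lookarounds restrict the matches to runs of exactly three digits. Tested on some sample data: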

import pandas as pd

df = pd.DataFrame({'Description':['354 64 133 5867 4 te345',
                                  'rt34 3tyr 456',
                                  '23 gh346h rt 9404']})

print(df)
               Description
0  354 64 133 5867 4 te345
1            rt34 3tyr 456
2        23 gh346h rt 9404

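# Raw string so that \d reaches the regex engine unchanged.
# (?<!\d) and (?!\d) reject matches that touch further digits.
y = df['Description'].str.findall(r'(?:(?<!\d)\d{3}(?!\d))')
print(y)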
0    [354, 133, 345]
1              [456]
2              [346]
Name: Description, dtype: object
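The lookaround pattern matches runs of exactly three digits, so 5867 and 9404 are left out. If "at least 3 digits long", as worded in the question, should also include longer numbers, a greedy `\d{3,}` is one possible variant. This is a sketch on the same sample frame, not a replacement for the pattern above.

```python
import pandas as pd

df = pd.DataFrame({'Description': ['354 64 133 5867 4 te345',
                                   'rt34 3tyr 456',
                                   '23 gh346h rt 9404']})

# \d{3,} greedily consumes the whole digit run, so 4-digit numbers
# come back intact instead of being cut to their first three digits.
at_least_3 = df['Description'].str.findall(r'\d{3,}')
print(at_least_3)
```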
Answered by jezrael