
I have read an entire column from an Excel sheet into a DataFrame. Each cell in that column contains a bunch of words mixed with numbers (like phone numbers). How do I loop over the DataFrame and extract the numbers that match a specific regex pattern?

I have tried the following code:

for i in (df): 
   df.str.contains('(4[0-9]{12}([0-9]{3})|[25][1-7][0-9]{14}|6(011|5[0-9][0-9])[0-9]{12}|3[47][0-9]{13}')

I know my regex is wrong, but I am getting the following error.

Edit: I have updated my regex. The cells contain data like this:

" Hello, I am trying to order something ... my card number is 45621.... ." I want to take out the card number and put it in a file.

Traceback (most recent call last):
  File "c:/Program Files/Python37/Scripts/output.py", line 12, in <module>
    df.str.contains('^f')
  File "C:\Program Files\Python37\lib\site-packages\pandas\core\generic.py", line 5067, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'str'
Siddharth C

2 Answers

0

Right now you are calling df.str. The .str accessor exists on a Series (a single column), not on a DataFrame, so Python cannot find it, hence the error. Also, in your loop i will be the column name, not a cell. From there you could loop through the rows of that column and apply the regex to each one. This is documented throughout Stack Overflow, but it is probably not the approach you want to take.
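
To illustrate the point above, here is a minimal sketch. The column name `text` and the sample rows are assumptions, since the question does not show the real data. It shows that iterating over a DataFrame yields column labels, and that `.str` works on a single column but not on the DataFrame itself.

```python
import pandas as pd

# Hypothetical data; the column name "text" is an assumption.
df = pd.DataFrame({"text": ["call 555-1234 today", "no digits here"]})

# Iterating over a DataFrame yields its column labels, not its cells.
labels = [label for label in df]

# The .str accessor lives on a Series (one column), so this works.
has_digits = df["text"].str.contains(r"[0-9]+")

# On the DataFrame itself the same attribute lookup fails,
# which reproduces the AttributeError from the question.
try:
    df.str
    raised = False
except AttributeError:
    raised = True
```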

Instead, write a function that takes a cell as a string and returns the text extracted by the regex. Then use pandas' apply() to run that function on every cell in the column. If you Google "apply() pandas regex", a bunch of different examples will show you how to do this. One such example is this one.
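
A rough sketch of that structure follows. The column names `message` and `card_number`, the helper `extract_number`, and the simplified pattern (any run of 13 to 16 digits) are all assumptions for illustration, not the regex the question needs.

```python
import re
import pandas as pd

# Simplified, hypothetical pattern: any run of 13 to 16 digits.
CARD_PATTERN = re.compile(r"[0-9]{13,16}")

def extract_number(cell):
    """Return the first matching digit run in a cell, or None if there is none."""
    match = CARD_PATTERN.search(str(cell))
    return match.group(0) if match else None

# Hypothetical data; the column name "message" is an assumption.
df = pd.DataFrame({"message": [
    "my card number is 4111111111111111, thanks",
    "no number in this one",
]})

# apply() calls extract_number once per cell of the column.
df["card_number"] = df["message"].apply(extract_number)
```

Keeping the regex inside one small function makes it easy to test on a single string before running it over the whole column.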

If you provide a bit more detail about what the regex should match, we can help you flesh out the above structure.

noah
0
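  1. First, import the regular expression module
import re
  2. It's better to create a new column. re.search() expects a single string, not a whole column, so call it once per cell
pattern = '4[0-9]{12}([0-9]{3})|[25][1-7][0-9]{14}|6(011|5[0-9][0-9])[0-9]{12}|3[47][0-9]{13}'
def first_match(cell):
    # group(0) is the whole match; cells without a match become None
    m = re.search(pattern, str(cell))
    return m.group(0) if m else None
df['new_1'] = df['<num_col_name>'].apply(first_match)
  3. Now check the new_1 column
df['new_1']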

You haven't posted the name of the column you want to search, so I used a placeholder column name and the regex from your question.
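
As a possible alternative, the same extraction can be sketched with the vectorized `Series.str.extract`, followed by writing the results to a file as the question asks. The column name `message`, the sample rows, and the output path `card_numbers.txt` are assumptions. The regex is the one from the question, wrapped in a single outer capture group with the inner groups made non-capturing, so that `str.extract` returns one column.

```python
import pandas as pd

# Regex from the question; one outer capture group, inner groups non-capturing.
PATTERN = (
    r"(4[0-9]{12}(?:[0-9]{3})|[25][1-7][0-9]{14}"
    r"|6(?:011|5[0-9][0-9])[0-9]{12}|3[47][0-9]{13})"
)

# Hypothetical data; the column name "message" is an assumption.
df = pd.DataFrame({"message": [
    "Hello, my card number is 4111111111111111, please process it",
    "I have not given a number here",
    "Use 378282246310005 instead",
]})

# expand=False returns a Series; rows without a match become NaN.
df["new_1"] = df["message"].str.extract(PATTERN, expand=False)

# Keep only the rows where a number was found and write one per line.
found = df["new_1"].dropna()
out_path = "card_numbers.txt"  # hypothetical output file name
found.to_csv(out_path, index=False, header=False)
```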

Ynjxsjmh