
I have data streaming in the following format:

from io import StringIO
data = """\
ANI/IP
sip:5554447777@10.94.2.15
sip:10.66.7.34@6665554444
sip:3337775555@10.94.2.11
"""
import pandas as pd
df = pd.read_table(StringIO(data), sep=r'\s+', dtype=str)

What I would like to do is replace the column content with just the phone number part of the string above. I tried the suggestions from this thread like so:

df['ANI/IP'] = df['ANI/IP'].str.replace(r'\d{10}', '').astype('str')
print(df)

However, this results in:

.....print(df)
            ANI/IP
0  sip:@10.94.2.15
1  sip:@10.66.7.34
2  sip:@10.94.2.11

I need the phone numbers instead. How do I get the following?

  ANI/IP
0 5554447777
1 6665554444
2 3337775555
Jason

2 Answers


The regex \d{10} matches a substring of exactly 10 consecutive digits, so this line:

df['ANI/IP'] = df['ANI/IP'].str.replace(r'\d{10}', '').astype('str')

replaces each phone number with an empty string, i.e. it removes the numbers!
Note: The astype('str') call is not needed. The column already holds strings, which pandas stores here with the object dtype.
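The difference between removing and extracting can be seen with Python's re module alone, without pandas. This is only a minimal sketch using the first sample row from the question; the variable names are made up for illustration.

```python
import re

sample = "sip:5554447777@10.94.2.15"  # first row of the question's data
pattern = re.compile(r"\d{10}")

# Substituting an empty string deletes the matched digits.
removed = pattern.sub("", sample)

# Searching returns the matched digits themselves.
extracted = pattern.search(sample).group(0)

print(removed)
print(extracted)
```

Series.str.replace and Series.str.extract apply the same two ideas to every row of the column.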

You want to extract these phone numbers:

In [11]: df["ANI/IP"].str.extract(r'(\d{10})')  # before overwriting!
Out[11]:
0    5554447777
1    6665554444
2    3337775555
Name: ANI/IP, dtype: object

Set this as another column and you're away:

In [12]: df["phone_number"] = df["ANI/IP"].str.extract(r'(\d{10})')
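If you really do want to overwrite the original column, as asked in the question, something like the following should work. It is a sketch under a few assumptions: Python 3, a recent pandas release where extract defaults to expand=True (hence the explicit expand=False to get a Series back), and read_csv used in place of read_table since the data has a single column.

```python
from io import StringIO

import pandas as pd

data = """\
ANI/IP
sip:5554447777@10.94.2.15
sip:10.66.7.34@6665554444
sip:3337775555@10.94.2.11
"""
# The rows contain no commas, so read_csv yields one column named "ANI/IP".
df = pd.read_csv(StringIO(data), dtype=str)

# expand=False asks for a Series when the pattern has a single group;
# with expand=True the result is a one-column DataFrame instead.
df["ANI/IP"] = df["ANI/IP"].str.extract(r"(\d{10})", expand=False)
print(df)
```

Rows with no 10-digit run would end up as missing values rather than raising an error, so it may be worth checking for those afterwards.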
Andy Hayden

You could use pandas.core.strings.StringMethods.extract (available as Series.str.extract) to extract the phone numbers:

In [10]: df['ANI/IP'].str.extract(r"(\d{10})")
Out[10]:
0    5554447777
1    6665554444
2    3337775555
Name: ANI/IP, dtype: object
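One caveat worth knowing: \d{10} also matches the first 10 digits of a longer run of digits. If that matters for your data, a stricter pattern with lookarounds is one option. The sketch below uses plain re and a hypothetical 12-digit value that is not in the question's data.

```python
import re

loose = re.compile(r"\d{10}")
# Require that the 10 digits are not preceded or followed by another digit.
strict = re.compile(r"(?<!\d)\d{10}(?!\d)")

too_long = "sip:155544477771@10.94.2.15"  # hypothetical 12-digit value
normal = "sip:5554447777@10.94.2.15"      # row from the question

loose_hit = loose.search(too_long).group(0)   # first 10 of the 12 digits
strict_hit = strict.search(too_long)          # None: no standalone 10-digit run
normal_hit = strict.search(normal).group(0)   # still finds the real number
```

The stricter pattern should be usable with str.extract in the same way, as long as the part you want stays inside a capture group.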
Zero