2

I have data that looks like this:

0            504189219
1            500618053
2            0537533477
3            966581566618
4            00536079946

I want the output to look like this:

504189219
500618053
537533477
581566618
536079946
Corralien
A.H
  • Looks like what you really want is the last 9 characters but I have a question. What do you want to happen if the first character of the last 9 characters is not '5' or, indeed, if there's no '5' anywhere in the string? – DarkKnight Feb 16 '22 at 11:36

5 Answers

3

Use str.extract:

df['Col'] = df['Col'].str.extract(r'(5\d{8})')
print(df)

# Output
         Col
0  504189219
1  500618053
2  537533477
3  581566618
4  536079946

Setup:

import pandas as pd

df = pd.DataFrame({'Col': ['504189219', '500618053', '0537533477',
                           '966581566618', '00536079946']})
print(df)

# Output
            Col
0     504189219
1     500618053
2    0537533477
3  966581566618
4   00536079946
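
A hedged variation on the above, in case some rows contain no match: `str.extract` returns NaN for those rows. The sketch below assumes the wanted number always sits at the end of the string (hence the `$` anchor) and that unmatched values should be kept unchanged; the `Clean` column name and the sample row without a '5' are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Col': ['504189219', '0404189219', '966581566618']})

# expand=False returns a Series instead of a one-column DataFrame.
# The trailing $ anchors the match to the end of the string.
extracted = df['Col'].str.extract(r'(5\d{8})$', expand=False)

# Rows without a match come back as NaN; fall back to the original value.
# (Assumption: keeping the original is the desired behaviour.)
df['Clean'] = extracted.fillna(df['Col'])
print(df)
```

Dropping the `fillna` call leaves NaN in the unmatched rows, which may be preferable if invalid numbers should be flagged rather than passed through.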
Corralien
0

There is a library called phonenumbers that can help with this job; see this post.

KingOtto
0

Using the same setup as Corralien, this method also works:

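import pandas as pd

df = pd.DataFrame({'Col': ['504189219', '500618053', '0537533477',
                           '966581566618', '00536079946']})

def getNumber(n):
    # Take the 9 characters starting at the first '5'.
    # Note: find() returns -1 when there is no '5' in the string.
    start = n.find('5')
    return n[start:start + 9]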

df['Col'] = df['Col'].apply(getNumber)

print(df)

The same result can be achieved with a lambda expression as well.

Other answers originally did not take into account the constraint of the 9 numbers.
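
For illustration, the lambda form mentioned above might look like the following sketch. It performs the same slicing as `getNumber`, so it shares the same behaviour when no '5' is present.

```python
import pandas as pd

df = pd.DataFrame({'Col': ['504189219', '500618053', '0537533477',
                           '966581566618', '00536079946']})

# Same slicing as getNumber, written inline; find() runs twice per value.
df['Col'] = df['Col'].apply(lambda n: n[n.find('5'):n.find('5') + 9])
print(df)
```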

Titouan L
  • The behaviour of this is perhaps not what you'd want if/when the character '5' is absent – DarkKnight Feb 16 '22 at 11:56
  • The author did not clarify the required behaviour, but as this is defined in a function, it's easy to catch cases where `find()` returns `-1` before returning a value. – Titouan L Feb 16 '22 at 11:59
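
A minimal sketch of the guard discussed in these comments. Since the required behaviour was never specified, returning the value unchanged when there is no '5' is an assumption, and `get_number` is a hypothetical name.

```python
def get_number(n: str) -> str:
    """Return the 9 characters starting at the first '5'."""
    start = n.find('5')
    if start == -1:
        # Assumption: leave values without a '5' untouched.
        return n
    return n[start:start + 9]


values = ['0404189219', '0537533477', '966581566618']
cleaned = [get_number(v) for v in values]
print(cleaned)

# Without the guard, the slice becomes n[-1:8], which is empty here.
unguarded = values[0][values[0].find('5'):values[0].find('5') + 9]
print(repr(unguarded))
```

With pandas, the same function can be passed to `df['Col'].apply(get_number)`.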
0

This may be a more robust approach:

import pandas as pd

def fix(col):
    return col[-9:] if len(col) > 8 and col[-9] == '5' else col


df = pd.DataFrame({'Col': ['0404189219', '500618053', '0537533477',
                           '966581566618', '00536079946']})

df['Col'] = df['Col'].apply(fix)
print(df)

Output:

          Col
0  0404189219
1   500618053
2   537533477
3   581566618
4   536079946

Note that when the ninth character from the end is not '5' (as in row 0), the original value is left intact.
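
The same rule can also be sketched without `apply`, using pandas string methods. This is one possible vectorized equivalent of `fix`, not part of the original answer; the intermediate names `last9` and `starts_with_5` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Col': ['0404189219', '500618053', '0537533477',
                           '966581566618', '00536079946']})

s = df['Col']
last9 = s.str[-9:]  # last 9 characters (the whole string if shorter)

# True only where there really are 9 characters and the first one is '5'.
starts_with_5 = last9.str.len().eq(9) & last9.str.startswith('5')

# Replace the value with its last 9 characters where the condition holds.
df['Col'] = s.mask(starts_with_5, last9)
print(df)
```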

DarkKnight
-1

for r in range(len(df.Col)): df.Col[r][df.Col[r].find("5"):]

  • What is this supposed to do? – DarkKnight Feb 16 '22 at 12:00