50

Given the following data frame:

import pandas as pd
import numpy as np
df = pd.DataFrame({'A':['1a',np.nan,'10a','100b','0b'],
                   })
df

    A
0   1a
1   NaN
2   10a
3   100b
4   0b

I'd like to extract the numbers from each cell (where they exist). The desired result is:

    A
0   1
1   NaN
2   10
3   100
4   0

I know it can be done with str.extract, but I'm not sure how.

Jon Clements
  • 138,671
  • 33
  • 247
  • 280
Dance Party
  • 3,459
  • 10
  • 42
  • 67

4 Answers4

98

Give it a regex capture group:

df.A.str.extract('(\d+)')

Gives you:

0      1
1    NaN
2     10
3    100
4      0
Name: A, dtype: object

(\d+) is a regex capturing group, and \d+ specifies a regex pattern that matches only digits. Note that this will only work for whole numbers and not floats.

cs95
  • 379,657
  • 97
  • 704
  • 746
Jon Clements
  • 138,671
  • 33
  • 247
  • 280
8

To answer @Steven G 's question in the comment above, this should work:

df.A.str.extract('(^\d*)')
Taming
  • 117
  • 1
  • 5
8

U can replace your column with your result using "assign" function:

df = df.assign(A = lambda x: x['A'].str.extract('(\d+)'))
Mehdi Golzadeh
  • 2,594
  • 1
  • 16
  • 28
2

If you have cases where you have multiple disjoint sets of digits, as in 1a2b3c, in which you would like to extract 123, you can do it with Series.str.replace:

>>> df
        A
0      1a
1      b2
2    a1b2
3  1a2b3c
>>> df['A'] = df['A'].str.replace('\D+', '')
0      1
1      2
2     12
3    123

You could also work this around with Series.str.extractall and groupby but I think that this one is easier.

Hope this helps!

Rostan
  • 809
  • 9
  • 25