Pandas Extract Number from String

Question

Given the following data frame:

import pandas as pd
import numpy as np
df = pd.DataFrame({'A':['1a',np.nan,'10a','100b','0b'],
                   })
df

    A
0   1a
1   NaN
2   10a
3   100b
4   0b

I'd like to extract the numbers from each cell (where they exist). The desired result is:

I know it can be done with str.extract, but I'm not sure how.

score 98 · Accepted Answer · edited Apr 28 '23 at 04:49


Give it a regex capture group:

df.A.str.extract(r'(\d+)', expand=False)

Gives you:

0      1
1    NaN
2     10
3    100
4      0
Name: A, dtype: object

(\d+) is a regex capturing group, and \d+ specifies a regex pattern that matches only digits. Note that this will only work for whole numbers and not floats.
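For values with a decimal part (such as `0.7 mg`), a minimal sketch is to widen the pattern so it optionally accepts a decimal point followed by more digits, then convert the result with `pd.to_numeric`. The sample Series `s` below is hypothetical, and the pattern is an assumption about the data: it does not cover signs, thousands separators, or scientific notation.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: numbers with optional decimals and trailing units.
s = pd.Series(['0.7 mg', '12 kg', np.nan, '3.25ml', 'n/a'])

# One or more digits, optionally followed by a decimal point and more digits.
# expand=False returns a Series instead of a one-column DataFrame.
extracted = s.str.extract(r'(\d+(?:\.\d+)?)', expand=False)

# The extracted values are still strings; convert them to floats.
# Rows with no match (or missing input) stay NaN.
numbers = pd.to_numeric(extracted)
print(numbers)
```

The non-capturing group `(?:...)` keeps a single capture group, so `extract` still returns one column.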

edited Apr 28 '23 at 04:49

cs95


answered Jun 07 '16 at 15:39

Jon Clements



How could I do it when there is a comma, like `6,000 a`? – Steven G Jul 01 '17 at 15:02

@StevenG strip out commas first? – Jon Clements Jul 01 '17 at 17:13

As of 2020, this code gives a FutureWarning. You can get around it by adding the parameter `expand=False` to the `extract` call. – lebelinoz Apr 21 '20 at 02:44

This doesn't work if there is a number after letters – Upasana Mittal Apr 24 '20 at 15:02
This does not work for my column with number and units: `0.7 mg ` – mLstudent33 Oct 16 '20 at 00:03
Great answer, but would have been even greater if it was explained what `'(\d+)'` does. In the regular expression, `\d` stands for "any digit" and `+` stands for "one or more". So all digits are extracted, whereas with `str.replace('\d+', '')` all digits are removed. – Rivered Dec 06 '22 at 22:48

score 8 · Answer 2 · answered Jul 07 '17 at 00:32


To answer @Steven G 's question in the comment above, this should work:

df.A.str.extract(r'(^\d*)')
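
One caveat: on its own, `(^\d*)` stops at the first non-digit, so `6,000 a` would yield `6`. A minimal sketch of the approach suggested in the comments (strip the commas first, then extract), assuming the commas are thousands separators; the sample Series `s` is hypothetical:

```python
import pandas as pd

# Hypothetical sample data with thousands separators.
s = pd.Series(['6,000 a', '12b', '1,234,567 c'])

# Remove commas as a literal (non-regex) replacement.
cleaned = s.str.replace(',', '', regex=False)

# Extract the leading run of digits; expand=False returns a Series.
extracted = cleaned.str.extract(r'^(\d+)', expand=False)
print(extracted)
```

Using `\d+` rather than `\d*` means rows without leading digits come back as NaN instead of an empty string.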

answered Jul 07 '17 at 00:32

Taming


score 8 · Answer 3 · answered Oct 30 '20 at 00:06


You can replace the column with the result using the `assign` function:

df = df.assign(A=lambda x: x['A'].str.extract(r'(\d+)', expand=False))

answered Oct 30 '20 at 00:06

Mehdi Golzadeh


score 2 · Answer 4 · answered Sep 28 '22 at 08:15

If you have cases where you have multiple disjoint sets of digits, as in 1a2b3c, in which you would like to extract 123, you can do it with Series.str.replace:

>>> df
        A
0      1a
1      b2
2    a1b2
3  1a2b3c
>>> df['A'] = df['A'].str.replace(r'\D+', '', regex=True)
0      1
1      2
2     12
3    123

You could also do this with Series.str.extractall and groupby, but I think the str.replace approach is easier.
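
For comparison, a rough sketch of the `extractall` and `groupby` route, using the same sample frame as above. The final `reindex` is an addition here so that rows with no digits at all would come back as NaN rather than being dropped:

```python
import pandas as pd

df = pd.DataFrame({'A': ['1a', 'b2', 'a1b2', '1a2b3c']})

# extractall returns one row per match, indexed by
# (original row index, match number); the capture group is column 0.
matches = df['A'].str.extractall(r'(\d+)')

# Join all matches that belong to the same original row.
joined = matches[0].groupby(level=0).agg(''.join)

# Restore rows that had no match at all (none in this sample).
joined = joined.reindex(df.index)
print(joined)
```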

Hope this helps!
