Extracting only numbers from series Python

Question

I have a series that looks like:

ID
WTG-1
11
11-1
12B1
13-1
5
6
G7
.
.

I simply want to be able to extract all the numbers from each ID.

When I use my code:

df['ID'] = df['ID'].str.extract('(\d+)', expand=True)

It does extract all from the front of the line but skips a number if there is a string/letter/character breaker - ie for 11-1 it only gathers 11 without the extra 1.

I'd like the output to be:

ID         ID #
WTG-1      1
11         11
11-1       111
12B1       121
13-1       131
5          5
6          6
G7         7
.
.

Is there a way to count around the characters in between?

I will suggest adding the pandas tag – BENY Jan 14 '19 at 16:18 — BENY, Jan 14 '19 at 16:18

score 4 · Accepted Answer · answered Jan 14 '19 at 16:18

4

Using findall

df.ID.str.findall('(\d+)').apply(''.join)
Out[92]: 
0      1
1     11
2    111
3    121
4    131
5      5
6      6
7      7
Name: ID, dtype: object

answered Jan 14 '19 at 16:18

BENY

317,841
20
164
234

I suspect this runs faster than my answer – Charles Landau Jan 14 '19 at 16:19
@CharlesLandau before test it out , it is hard to say :-) – BENY Jan 14 '19 at 16:20
Another question but related - would the way to do this for letters be `.str.findall('([a-zA-Z ]+)').apply(''.join)` ? – HelloToEarth Jan 14 '19 at 16:31
@HelloToEarth yes – BENY Jan 14 '19 at 16:33

score 1 · Answer 2 · answered Jan 14 '19 at 16:18

1

astype and apply can do it.

df["ID"] = df["ID"].apply(lambda x: "".join(c for c in x if c.isdigit()).astype(int)

answered Jan 14 '19 at 16:18

Charles Landau

4,187
1
8
24

score 1 · Answer 3 · answered Jan 14 '19 at 16:25

1

Or replace,

df['New_ID'] = df.ID.str.replace('\D+', '')

    ID      New_ID
0   WTG-1   1
1   11      11
2   11-1    111
3   12B1    121
4   13-1    131
5   5       5
6   6       6
7   G7      7

answered Jan 14 '19 at 16:25

Vaishali

37,545
5
58
86

Extracting only numbers from series Python

3 Answers3