Split a pandas column into text and numbers

Question

I am trying to split a column by content type: I want the numbers shown separately from the text.

I first tried to do this without a loop, but the result had a different shape from the DataFrame, so I resorted to looping over the rows. However, the loop puts only the last number in every row.

Python input:

import re
import pandas as pd

newdf = pd.DataFrame()
newdf['name'] = ('leon','eurika','monica','wian')
newdf['surname'] = ('swart38','39swart','11swart','swart10')
a = newdf.shape[0]

newdf['age'] = ""
for i in range (0,a):
    newdf['age'] =  re.sub(r'\D', "",str(newdf.iloc[i,1]))

print (newdf)

I expect the age column to show 38, 39, 11, 10. Instead, every row shows "10", the number from the last row.

Out:

     name  surname age
0    leon  swart38  10
1  eurika  39swart  10
2  monica  11swart  10
3    wian  swart10  10

your code would work (although it would not be very performant because of the for loop) had you replaced `newdf['age']` with `newdf.loc[i, 'age']` — godfryd, Sep 05 '19 at 08:36

score 1 · Accepted Answer · answered Sep 05 '19 at 08:34

This happens because newdf['age'] = ... assigns a single value to the entire column, so every iteration of the for loop overwrites all rows. The last assignment was 10, which is why every row ends up as 10.
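To make the difference concrete, here is a minimal sketch on a two-row frame (the frame `df` is made up for illustration and is not the asker's data). It contrasts assigning a scalar to a column label with assigning to one row through `.loc`:

```python
import pandas as pd

# Hypothetical two-row frame, used only to illustrate the behaviour.
df = pd.DataFrame({"surname": ["swart38", "39swart"]})

# A scalar assigned to a column label is written to every row.
df["age"] = "38"
df["age"] = "39"
print(df["age"].tolist())  # ['39', '39']

# Assigning through .loc with a row label changes only that row.
df.loc[0, "age"] = "38"
print(df["age"].tolist())  # ['38', '39']
```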

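You can fix it by assigning to a single row on each iteration:

a = newdf.shape[0]
newdf['age'] = ""
for i in range(a):
    # .loc[i, 'age'] writes to row i only; chained indexing such as
    # newdf['age'][i] can raise SettingWithCopyWarning or fail to update newdf
    newdf.loc[i, 'age'] = re.sub(r'\D', "", str(newdf.iloc[i, 1]))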

Or instead, use pandas.Series.str.extract:

newdf['age'] = newdf['surname'].str.extract(r'(\d+)', expand=False)
print(newdf)

Output:

     name  surname age
0    leon  swart38  38
1  eurika  39swart  39
2  monica  11swart  11
3    wian  swart10  10
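Since the question asks for the numbers and the text to be shown separately, the same idea can be extended to produce both parts. This is only a sketch: the column name `surname_text` is an assumption introduced here, and the conversion with `astype(int)` assumes every surname contains at least one digit (rows without digits would yield missing values and make the conversion fail).

```python
import pandas as pd

newdf = pd.DataFrame({
    "name": ["leon", "eurika", "monica", "wian"],
    "surname": ["swart38", "39swart", "11swart", "swart10"],
})

# First run of digits becomes the age; expand=False returns a Series.
newdf["age"] = newdf["surname"].str.extract(r"(\d+)", expand=False).astype(int)

# Removing every digit leaves the text part ("surname_text" is a made-up name).
newdf["surname_text"] = newdf["surname"].str.replace(r"\d+", "", regex=True)

print(newdf)
```

Both calls are vectorized, so no explicit loop over the rows is needed.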

score 0 · Answer 2 · answered Sep 05 '19 at 08:35

Try using Series.str.replace:

# regex=True is needed in pandas 2.0+, where str.replace matches literally by default
newdf['age'] = newdf['surname'].str.replace(r'\D+', '', regex=True)
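One design point worth noting: removing non-digits and extracting the first run of digits give the same result on the question's data, but they differ when a value contains more than one group of digits. The sketch below uses a hypothetical value, "12swart34", that does not appear in the question:

```python
import pandas as pd

# The second value is hypothetical, chosen to have two separate digit runs.
s = pd.Series(["swart38", "12swart34"])

# Deleting all non-digits concatenates every digit in the string.
digits_only = s.str.replace(r"\D+", "", regex=True)

# Extracting keeps only the first run of digits.
first_run = s.str.extract(r"(\d+)", expand=False)

print(digits_only.tolist())  # ['38', '1234']
print(first_run.tolist())    # ['38', '12']
```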

Tim Biegeleisen
