I have the following text in column A:

A   
hellothere_3.43  
hellothere_3.9

I would like to extract only the numbers into a new column B (next to A), e.g.:

B                      
3.43   
3.9

I use str.extract('(\d.\d\d)', expand=True), but this only extracts 3.43, because the pattern requires exactly one digit before the separator and exactly two digits after it. Is there a way to make it more generic, so that it also matches 3.9?

Many thanks!

MGs

2 Answers

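Use a regular expression that accepts a variable number of digits: \d* matches zero or more digits, \.? matches an optional literal decimal point, and \d+ matches one or more digits.

Example: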

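import pandas as pd

df = pd.DataFrame({"A": ["hellothere_3.43", "hellothere_3.9"]})
# Raw string so the backslashes reach the regex engine unchanged.
# The extracted values are strings; add .astype(float) if numbers are needed.
df["B"] = df["A"].str.extract(r"(\d*\.?\d+)", expand=True)
print(df)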

Output:

                 A     B
0  hellothere_3.43  3.43
1   hellothere_3.9   3.9
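
As a variation on the pattern above, here is a minimal sketch that requires at least one digit before the optional fractional part and converts the result to floats. The `pattern` variable name and the choice of `expand=False` are assumptions for illustration, not part of the original answer. Note that `str.extract` returns the first match only, so this assumes the text before the number contains no digits.

```python
import pandas as pd

df = pd.DataFrame({"A": ["hellothere_3.43", "hellothere_3.9"]})

# \d+          one or more digits
# (?:\.\d+)?   optional fractional part (non-capturing group)
pattern = r"(\d+(?:\.\d+)?)"

# expand=False returns a Series for a single capture group;
# astype(float) converts the extracted strings to numbers.
df["B"] = df["A"].str.extract(pattern, expand=False).astype(float)
print(df)
```

With floats in column B, numeric operations such as `df["B"].sum()` work directly, which they would not on the extracted strings.
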
Rakesh

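I think splitting the string on the underscore and applying a lambda is quite clean. It relies on the number always being the part after the first underscore.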

import pandas as pd

df = pd.DataFrame({"A": ["hellothere_3.43", "hellothere_3.9"]})
# Split each value on "_" and convert the second piece to a float.
df["B"] = df["A"].str.split("_").apply(lambda x: float(x[1]))

I haven't done a proper comparison, but it seems faster than the regex solution in small tests.
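
To check the speed claim on your own data, a rough timing sketch along these lines could be used. The helper names `with_regex` and `with_split`, the repeated sample data, and the run count are all assumptions chosen for illustration; results will vary with pandas version and data size, so no timings are shown here.

```python
import timeit

import pandas as pd

# Hypothetical benchmark data: the two sample values repeated many times.
df = pd.DataFrame({"A": ["hellothere_3.43", "hellothere_3.9"] * 5000})


def with_regex(frame):
    # Regex approach, converted to float so both helpers return numbers.
    return frame["A"].str.extract(r"(\d*\.?\d+)", expand=False).astype(float)


def with_split(frame):
    # Split approach: take the piece after the first underscore.
    return frame["A"].str.split("_").apply(lambda parts: float(parts[1]))


for func in (with_regex, with_split):
    seconds = timeit.timeit(lambda: func(df), number=20)
    print(f"{func.__name__}: {seconds:.3f} s for 20 runs")
```
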

RickardSjogren