0

I have the following dataframe:

df = pd.DataFrame({'name': 'A B C D'.split(),
                   'result': ['.50', 'text 10', 'text 0.20', '<0.75']})

print(df)

  name     result
0    A        .50
1    B    text 10
2    C  text 0.20
3    D      <0.75

I need to extract the numeric values (float, integer or any digits). I'm trying the following script but not getting the desired output with too many NaN values:

df['result'] = df['result'].astype(str)
df['result'].str.extract(r'(/\d+\.\d+/)')

Desired output

  name     result
0    A       .50
1    B       10
2    C      0.20
3    D      0.75

What would be the smartest way of doing it? Any suggestions would be appreciated. Thanks!

Roy
  • 924
  • 1
  • 6
  • 17
  • Does this answer your question? [How to extract a floating number from a string](https://stackoverflow.com/questions/4703390/how-to-extract-a-floating-number-from-a-string) – Jacob Botha Jan 10 '22 at 16:21
  • Hi @JacobBotha, sorry it doesn't Thank you! – Roy Jan 10 '22 at 17:14

1 Answers1

1

You can use a different pattern:

df['result2'] = df['result'].str.extract(r'([0-9.+-]+)')
print(df)

# Output
  name     result result2
0    A        .50     .50
1    B    text 10      10
2    C  text 0.20    0.20
3    D      <0.75    0.75
Corralien
  • 109,409
  • 8
  • 28
  • 52
  • Hi @Corralien, it kinda works. Thank you. However, I'm getting about 38 `.` counts, which aren't in the original data. Also, it's not letting me convert to `float` values. ```. 38 .100 2 .110 4 .120 5 .130 3 ``` – Roy Jan 10 '22 at 17:00
  • I removed the conversion. What do you want to do with this kind of values? – Corralien Jan 10 '22 at 17:06
  • I am curious how 38 `.` came. I can remove them using ```['result'] != '.'``` though. – Roy Jan 10 '22 at 17:16
  • Can you update your post with this data. Without formatting, I don't understand the structure. – Corralien Jan 10 '22 at 17:18