I want to extract numbers from strings like. They appear in many columns so what is the most efficient way to remove these strings and get only the numbers? Is there a way other than using regex
Asked
Active
Viewed 324 times
0
-
1Add the sample output. – Amit Vikram Singh Apr 05 '21 at 23:44
-
1Its always important to add 3 simple things in your question. 1st- Samples of input, 2nd- Samples of output and 3rd- your efforts in form of code, kindly do add these in your question to make it clear, thank you. – RavinderSingh13 Apr 05 '21 at 23:47
-
1@RavinderSingh13 Thanks for letting me know. I've added sample input and outputs. – Ilovenoodles Apr 06 '21 at 01:04
-
@AmitVikramSingh Thanks for letting me know. I've added them. – Ilovenoodles Apr 06 '21 at 01:04
2 Answers
4
Assuming you expect only one number per column, you could try using str.extract
here:
df["some_col"] = df["some_col"].str.extract(r'(\d+(?:\.\d+)?)')

Tim Biegeleisen
- 502,043
- 27
- 286
- 360
-
Hi Tim, What if I have 32 columns that need to be modified like this? Is there a more efficient way to do this? – Ilovenoodles Apr 06 '21 at 01:54
-
@Ilovenoodles Reference [this accepted answer](https://stackoverflow.com/questions/46623596/pandas-dataframe-replace-string-in-multiple-columns-by-finding-substring). You may use `str.extract` on multiple columns, by passing a list of columns. – Tim Biegeleisen Apr 06 '21 at 02:00
0
I would use a function with regex that matches the pattern of what you are seeing. Since you tagged pandas and dataframe I am assuming you are working with a dataframe but a sample output would certainly help. Here is how I would tackle it:
import pandas as pd
import numpy as np
import re
def extract_numbers (column1: str):
result = np.nan
for x in column1.split():
if re.search(r'\d+\.?\d+', x)
result = float(re.search(r'\d+\.?\d+', x).group())
if pd.notnunll(result):
return result
df['Numbers'] = df['YourColumn'].apply(extract_numbers)
The result of this function would be a new column called "Numbers" that contains the extracted number from each string. It will return NaN when a number is not found (or matched to). Once you have a column with the number value from each string you can interact with it however you please.

git_rekt
- 54
- 3