0

I want to extract numbers from strings like. They appear in many columns so what is the most efficient way to remove these strings and get only the numbers? Is there a way other than using regex

Ilovenoodles
  • 83
  • 1
  • 7

2 Answers2

4

Assuming you expect only one number per column, you could try using str.extract here:

df["some_col"] = df["some_col"].str.extract(r'(\d+(?:\.\d+)?)')
Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • Hi Tim, What if I have 32 columns that need to be modified like this? Is there a more efficient way to do this? – Ilovenoodles Apr 06 '21 at 01:54
  • @Ilovenoodles Reference [this accepted answer](https://stackoverflow.com/questions/46623596/pandas-dataframe-replace-string-in-multiple-columns-by-finding-substring). You may use `str.extract` on multiple columns, by passing a list of columns. – Tim Biegeleisen Apr 06 '21 at 02:00
0

I would use a function with regex that matches the pattern of what you are seeing. Since you tagged pandas and dataframe I am assuming you are working with a dataframe but a sample output would certainly help. Here is how I would tackle it:

import pandas as pd
import numpy as np
import re

def extract_numbers (column1: str):
  result = np.nan
  for x in column1.split():
    if re.search(r'\d+\.?\d+', x)
      result = float(re.search(r'\d+\.?\d+', x).group())

    if pd.notnunll(result):
      return result

df['Numbers'] = df['YourColumn'].apply(extract_numbers)

The result of this function would be a new column called "Numbers" that contains the extracted number from each string. It will return NaN when a number is not found (or matched to). Once you have a column with the number value from each string you can interact with it however you please.

git_rekt
  • 54
  • 3