1

I have a pandas data frame that looks like

  Run      Time  ...           K Recovery          Ca Recovery
0  14  05:02:54  ...  61,498.671 (492.0%)  62,095.863 (496.8%)
1  19  08:14:59  ...  63,576.997 (508.6%)  63,986.691 (511.9%)
2  35  10:30:42  ...  63,609.755 (508.9%)  64,400.180 (515.2%)

I want it to only keep the percentages and remove everything that isn't numeric so that it looks like this:

  Run      Time  ...  K Recovery   Ca Recovery
0  14  05:02:54  ...  492.0        496.8
1  19  08:14:59  ...  508.6        511.9
2  35  10:30:42  ...  508.9        515.2

I was able to isolate the percentages by adding this re.findall(r'\(.*?\)', CaRecovery) function to each individual string when I was creating the lists that make up my pandas data base, however this gave me some odd formatting problems:

  Run      Time Be Recovery  ... Al Recovery  K Recovery Ca Recovery
0  14  05:02:54   [(98.2%)]  ...  [(487.1%)]  [(492.0%)]  [(496.8%)]
1  19  08:14:59  [(101.6%)]  ...  [(499.8%)]  [(508.6%)]  [(511.9%)]
2  35  10:30:42  [(101.5%)]  ...  [(502.9%)]  [(508.9%)]  [(515.2%)]

It added the square brackets around the parentheses, and now for some reason

df = df.replace(r'[%]', '', regex=True)

has no effect on the database.

I need it just to be the numbers so I can convert the columns into floats.

Julian Avila
  • 105
  • 10
  • 1
    You should add grouping parentheses in your regex expression to eliminate the parenthesis characters. Also you must be processing the result incorrectly if you are getting brackets around the extracted result. – RufusVS Apr 23 '21 at 19:19
  • 1
    The square brackets probably are lists of strings since `re.findall()` returns a list. You probably need to further subscript it to get the strings e.g. `result = re.findall(...` then you use `result[0]` to get the strings. Also, you can't get the df.replace works since the strings are within lists. – SeaBean Apr 23 '21 at 19:24
  • 1
    A [complete and reproducible dataframe with an expected output](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) would have been nice – anky Apr 23 '21 at 19:30
  • 1
    @anky yes sorry, that is a good idea, I will start including that in my future pandas questions – Julian Avila Apr 23 '21 at 19:39
  • @JulianAvila happy that you would consider this. Cheers..!! – anky Apr 23 '21 at 19:41
  • @SeanBean yes you were right the square brackets were from the 're.findall ' returning a list, and now it makes sense as to why I couldn't get 'df.replace' to work. – Julian Avila Apr 23 '21 at 19:42

2 Answers2

4

Try:

df.update(
    df.filter(regex=r"Recovery$").apply(
        lambda x: x.str.extract(r"\(([\d.-]+)%\)", expand=False)
    )
)
print(df)

Prints:

   Run      Time K Recovery Ca Recovery
0   14  05:02:54      492.0       496.8
1   19  08:14:59      508.6       511.9
2   35  10:30:42      508.9       515.2
Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91
1

Using groupings and pulling response out of the returned list:

import re

test_string = r"62,095.863 (496.8%)"

pattern = r"\(([^%]*)%?\)"  # without %
print(re.findall(pattern, test_string)[0])

pattern2 = r"\((.*)\)"  # with %
print(re.findall(pattern2, test_string)[0])
RufusVS
  • 4,008
  • 3
  • 29
  • 40