dictionary keys to replace strings in pandas dataframe column with dictionary values and perform evaluate

Question

I have a pandas dataframe:

df = pd.DataFrame({'col1': ['3 a, 3 ab, 1 b',
                            '4 a, 4 ab, 1 b, 1 d',
                            np.nan] })

and a dictionary

di = {'a': 10.0,
 'ab': 2.0,
    'b': 1.5,
    'd': 1.0,
    np.nan: 0.0}

Using values from the dictionary, I want to evaluate the dataframe rows like this:

3*10.0 + 3*2.0 + 1*1.5 giving me a final output that looks like this:

pd.DataFrame({'col1': ['3 a, 3 ab, 1 b',
                            '4 a, 4 ab, 1 b, 1 d',
                            'np.nan'], 'result': [37.5,
                            50.5,
                            0]  })

So, far I could only replace ',' by '+'

df['col1'].str.replace(',',' +').str.split(' ')

score 2 · Accepted Answer · answered Jul 18 '19 at 16:08

2

Here is on way seem over kill

df['col1'].str.split(', ',expand=True).replace({' ':'*','np.nan':'0'},regex=True).\
     stack().apply(lambda x : eval(x,di)).sum(level=0)
Out[884]: 
0    37.5
1    50.5
2     0.0
dtype: float64

answered Jul 18 '19 at 16:08

BENY

317,841
20
164
234

score 2 · Answer 2 · answered Jul 18 '19 at 16:29

comprehension

from functools import reduce
from operator import mul

def m(x): return di.get(x, x)

df.assign(result=[
    sum(
        reduce(mul, map(float, map(m, s.split())))
        for s in row.split(', ')
    ) for row in df.col1
])

                  col1  result
0       3 a, 3 ab, 1 b    37.5
1  4 a, 4 ab, 1 b, 1 d    50.5
2               np.nan     0.0

score 1 · Answer 3 · answered Jul 18 '19 at 16:07

We first explode your string to rows seperated by a comma, using this function.
Then we split the values by a whitespace (' ') to seperate columns.
Finally we map your dictionary to the letters and do a groupby.sum:

new  = explode_str(df.dropna(), 'col1', ',')['col1'].str.strip().str.split(' ', expand=True).append(df[df['col1'].isna()])

s = new[1].map(di) * pd.to_numeric(new[0])

df['result'] = s.groupby(s.index).sum()

Output

                  col1  result
0       3 a, 3 ab, 1 b    37.5
1  4 a, 4 ab, 1 b, 1 d    50.5
2                  NaN     0.0

Function used from linked answer:

def explode_str(df, col, sep):
    s = df[col]
    i = np.arange(len(s)).repeat(s.str.count(sep) + 1)
    return df.iloc[i].assign(**{col: sep.join(s).split(sep)})

score 0 · Answer 4 · answered Jul 18 '19 at 16:04

First, you can insert the keys from your dictionary:

for key in di.keys():
    df['col1'] = df['col1'].str.replace(key, '*' + str(di[key]))

Then you can insert the operators:

df['col1'] = df['col1'].str.replace(',', '+')

And then you evaluate:

df['result'] = df['col1'].apply(eval)

Please note that this might do something funny with the np.nan because there is an a inside that is picked up by the dictionary. There you might want to create some other rules beforehand like:

df['col1'] = df['col1'].str.replace('np.nan', '0')

this assumes no overlap of the string characters of dictionary keys but two of my dictionary keys are 'a' and 'ab'. — user0000, Jul 18 '19 at 16:40

score 0 · Answer 5 · answered Jul 18 '19 at 17:12

Start from defining a function replacing a source string with a corresponding value from di (will be applied soon):

def repl(src):
    return di[src] if src in di else 0.0

Then create an auxiliary DataFrame:

Extract all pairs of regex groups:
- n1 - a sequence of digits,
- n2 - a sequence of letters or dots (to match also np.nan).
Change n1 column to numeric.
Apply repl function to n2, to replace e.g. 'a' (a string) with 10.0 (a number).

The code is:

df2 = df.col1.str.extractall(r'(?P<n1>\d+)? ?(?P<n2>[a-z.]+)').fillna('0')
df2.n1 = pd.to_numeric(df2.n1)
df2.n2 = df2.n2.apply(repl)

and it gives the following result:

         n1    n2
  match          
0 0       3  10.0
  1       3   2.0
  2       1   1.5
1 0       4  10.0
  1       4   2.0
  2       1   1.5
  3       1   1.0
2 0       0   0.0

And to get the sum of products, for each group, run:

df2.groupby(level=0).apply(lambda gr: gr.product(axis=1).sum())

The result is:

0    37.5
1    50.5
2     0.0

dictionary keys to replace strings in pandas dataframe column with dictionary values and perform evaluate

5 Answers5

comprehension