6

I have a pandas dataframe:

df = pd.DataFrame({'col1': ['3 a, 3 ab, 1 b',
                            '4 a, 4 ab, 1 b, 1 d',
                            np.nan] })

and a dictionary

di = {'a': 10.0,
 'ab': 2.0,
    'b': 1.5,
    'd': 1.0,
    np.nan: 0.0}

Using values from the dictionary, I want to evaluate the dataframe rows like this:

3*10.0 + 3*2.0 + 1*1.5 giving me a final output that looks like this:

pd.DataFrame({'col1': ['3 a, 3 ab, 1 b',
                            '4 a, 4 ab, 1 b, 1 d',
                            'np.nan'], 'result': [37.5,
                            50.5,
                            0]  })

So, far I could only replace ',' by '+'

df['col1'].str.replace(',',' +').str.split(' ')
user0000
  • 77
  • 6

5 Answers5

2

Here is on way seem over kill

df['col1'].str.split(', ',expand=True).replace({' ':'*','np.nan':'0'},regex=True).\
     stack().apply(lambda x : eval(x,di)).sum(level=0)
Out[884]: 
0    37.5
1    50.5
2     0.0
dtype: float64
BENY
  • 317,841
  • 20
  • 164
  • 234
2

comprehension

from functools import reduce
from operator import mul

def m(x): return di.get(x, x)

df.assign(result=[
    sum(
        reduce(mul, map(float, map(m, s.split())))
        for s in row.split(', ')
    ) for row in df.col1
])

                  col1  result
0       3 a, 3 ab, 1 b    37.5
1  4 a, 4 ab, 1 b, 1 d    50.5
2               np.nan     0.0
piRSquared
  • 285,575
  • 57
  • 475
  • 624
1
  1. We first explode your string to rows seperated by a comma, using this function.

  2. Then we split the values by a whitespace (' ') to seperate columns.

  3. Finally we map your dictionary to the letters and do a groupby.sum:

new  = explode_str(df.dropna(), 'col1', ',')['col1'].str.strip().str.split(' ', expand=True).append(df[df['col1'].isna()])

s = new[1].map(di) * pd.to_numeric(new[0])

df['result'] = s.groupby(s.index).sum()

Output

                  col1  result
0       3 a, 3 ab, 1 b    37.5
1  4 a, 4 ab, 1 b, 1 d    50.5
2                  NaN     0.0

Function used from linked answer:

def explode_str(df, col, sep):
    s = df[col]
    i = np.arange(len(s)).repeat(s.str.count(sep) + 1)
    return df.iloc[i].assign(**{col: sep.join(s).split(sep)})
Erfan
  • 40,971
  • 8
  • 66
  • 78
0

First, you can insert the keys from your dictionary:

for key in di.keys():
    df['col1'] = df['col1'].str.replace(key, '*' + str(di[key]))

Then you can insert the operators:

df['col1'] = df['col1'].str.replace(',', '+')

And then you evaluate:

df['result'] = df['col1'].apply(eval)

Please note that this might do something funny with the np.nan because there is an a inside that is picked up by the dictionary. There you might want to create some other rules beforehand like:

df['col1'] = df['col1'].str.replace('np.nan', '0')
Carsten
  • 2,765
  • 1
  • 13
  • 28
  • this assumes no overlap of the string characters of dictionary keys but two of my dictionary keys are 'a' and 'ab'. – user0000 Jul 18 '19 at 16:40
0

Start from defining a function replacing a source string with a corresponding value from di (will be applied soon):

def repl(src):
    return di[src] if src in di else 0.0

Then create an auxiliary DataFrame:

  • Extract all pairs of regex groups:
    • n1 - a sequence of digits,
    • n2 - a sequence of letters or dots (to match also np.nan).
  • Change n1 column to numeric.
  • Apply repl function to n2, to replace e.g. 'a' (a string) with 10.0 (a number).

The code is:

df2 = df.col1.str.extractall(r'(?P<n1>\d+)? ?(?P<n2>[a-z.]+)').fillna('0')
df2.n1 = pd.to_numeric(df2.n1)
df2.n2 = df2.n2.apply(repl)

and it gives the following result:

         n1    n2
  match          
0 0       3  10.0
  1       3   2.0
  2       1   1.5
1 0       4  10.0
  1       4   2.0
  2       1   1.5
  3       1   1.0
2 0       0   0.0

And to get the sum of products, for each group, run:

df2.groupby(level=0).apply(lambda gr: gr.product(axis=1).sum())

The result is:

0    37.5
1    50.5
2     0.0
Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41