I have a dataframe containing some input values, and I am trying to evaluate a parameter conditioned on the values in one of its columns (example below).

What I would like to obtain is shown in the figure.

How can I solve the issue below?

import numpy as np
import pandas as pd
 
df = pd.DataFrame.from_dict({
         'x': [0,1,2,3,4], 
         'y': [100,100,100,100,100],
         'z': [100,100,100,100,100],
         })
def evaluate(input):
    if input <=2:
        a=4
        b=6
    else:
        a=7
        b=8
    return df['x']*a+b*(df['y']+df['z'])

df['calc'] = evaluate(df['x'])
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
> ~\AppData\Local\Temp/ipykernel_38760/3329748611.py in <module>
> 15         b=8
> 16     return df['x']*a+b*(df['y']+df['z'])
> 17 df['calc'] = evaluate(df['x'])
> 
> ~\AppData\Local\Temp/ipykernel_38760/3329748611.py in evaluate(input)
> 8     })
> 9 def evaluate(input):
> 10     if input <=2:
> 11         a=4
> 12         b=6
> 
> ~\anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
> 1535     @final
> 1536     def __nonzero__(self):
> 1537         raise ValueError(
> 1538             f"The truth value of a {type(self).__name__} is ambiguous. "
> 1539             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
> 
> ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
wwii
mamici24

2 Answers

The problem is that your function is not vectorized over the column df['x']. The comparison input <= 2 is applied to every element and returns a boolean Series, but an if statement needs a single True or False, which is why pandas raises "The truth value of a Series is ambiguous". One way around this is to evaluate the condition row by row, with a loop or with df.apply(). Since df.apply() is a little trickier when several columns are involved, the easiest solution is probably to write a function with a simple for loop over df.itertuples().

def evaluate(df):
    calc_column = []
    for row in df.itertuples():
        if row.x <= 2:
            a = 4
            b = 6
        else:
            a = 7
            b = 8
        calc_column.append(row.x*a + b*(row.y + row.z))
    return calc_column
df['calc'] = evaluate(df)
df

    x   y   z   calc
0   0   100 100 1200
1   1   100 100 1204
2   2   100 100 1208
3   3   100 100 1621
4   4   100 100 1628

Here df.itertuples() yields each row as a named tuple, so the values you need are available as attributes (row.x, row.y, row.z).
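For comparison, the df.apply() route could look roughly like the sketch below. The helper name evaluate_row is made up for illustration; passing axis=1 makes pandas hand the function one row at a time, so row['x'] is a single value and the if statement works.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [0, 1, 2, 3, 4],
    'y': [100, 100, 100, 100, 100],
    'z': [100, 100, 100, 100, 100],
})

def evaluate_row(row):
    # evaluate_row is a hypothetical helper name.
    # With axis=1, row is a Series holding one row, so row['x'] is a scalar.
    if row['x'] <= 2:
        a, b = 4, 6
    else:
        a, b = 7, 8
    return row['x'] * a + b * (row['y'] + row['z'])

# Call the function once per row and collect the results into a new column.
df['calc'] = df.apply(evaluate_row, axis=1)
```

Like the itertuples() loop, this still runs Python code once per row, so it is convenient rather than fast on large frames.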

idins23

Use your conditions to create a mask. Use the mask to filter the rows you want to operate on. Return a Series.

def evaluate(df):
    S = pd.Series(0,index=df.index)
    mask = df['x'] <= 2
    a,b = 4,6
    S[mask] = df.loc[mask,'x']*a + b*(df.loc[mask,['y','z']].sum(1))
    mask = df['x'] > 2
    a,b = 7,8
    S[mask] = df.loc[mask,'x']*a + b*(df.loc[mask,['y','z']].sum(1))
    return S
    
df['calc'] = evaluate(df)

Or, since the second mask is just the inverse of the first, reuse it with ~mask:

def evaluate(df):
    S = pd.Series(0,index=df.index)
    mask = df['x'] <= 2
    a,b = 4,6
    S[mask] = df.loc[mask,'x']*a + b*(df.loc[mask,['y','z']].sum(1))
    # mask = df['x'] > 2
    a,b = 7,8
    S[~mask] = df.loc[~mask,'x']*a + b*(df.loc[~mask,['y','z']].sum(1))
    return S
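If the only thing that changes between the two cases is the pair of coefficients, the same mask idea can probably be written more compactly with np.where, which picks a and b element by element. This is only a sketch under that assumption, using the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'x': [0, 1, 2, 3, 4],
    'y': [100, 100, 100, 100, 100],
    'z': [100, 100, 100, 100, 100],
})

# Boolean mask: True where the first set of coefficients applies.
mask = df['x'] <= 2

# np.where(condition, value_if_true, value_if_false) returns one value per row.
a = np.where(mask, 4, 7)
b = np.where(mask, 6, 8)

# The arithmetic is now fully element-wise, so no if statement is needed.
df['calc'] = df['x'] * a + (df['y'] + df['z']) * b
```

This avoids both the explicit loop and the two masked assignments, at the cost of computing a and b as full-length arrays.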
wwii