I am trying to perform conditional logic with this dataframe.

In[5]: df = pd.DataFrame({'WINNER': [2, 2, 0], 'PREDICTED': [2, 1, 0], 'HOME': [5.25, 2.2, 1.25], 'DRAW': [4.5, 3.2, 5], 'AWAY': [1.53, 3.4, 8]})
In[6]: df
Out[6]: 
   AWAY  DRAW  HOME  PREDICTED  WINNER
0  1.53   4.5  5.25          2       2
1  3.40   3.2  2.20          1       2
2  8.00   5.0  1.25          0       0

Using the following rules, I want to calculate a new profit column.

In[14]: df.loc[(df["WINNER"] == df["PREDICTED"]) & (df["PREDICTED"] == 0), "PROFIT"] = df['HOME'] * 10
In[16]: df.loc[(df["WINNER"] == df["PREDICTED"]) & (df["PREDICTED"] == 1), "PROFIT"] = df['DRAW'] * 10
In[17]: df.loc[(df["WINNER"] == df["PREDICTED"]) & (df["PREDICTED"] == 2), "PROFIT"] = df['AWAY'] * 10

I almost got the correct result:

   AWAY  DRAW  HOME  PREDICTED  WINNER  PROFIT
0  1.53   4.5  5.25          2       2    15.3
1  3.40   3.2  2.20          1       2     NaN
2  8.00   5.0  1.25          0       0    12.5

Is it possible to simplify the code in lines #14-17?

How can I get the table below, with NaN replaced by -10?

   AWAY  DRAW  HOME  PREDICTED  WINNER  PROFIT
0  1.53   4.5  5.25          2       2    15.3
1  3.40   3.2  2.20          1       2     -10
2  8.00   5.0  1.25          0       0    12.5

EDIT: I like the solution proposed by imp9, with small modifications:

categories = ['HOME', 'DRAW', 'AWAY']
df['PROFIT'] = -10
for count, col in enumerate(categories):
   df.loc[df.query('WINNER == PREDICTED == @count').index, "PROFIT"] += df[col] * 10
SpanishBoy
  • Please check your modification, you are now setting everything to -10, and then adding your values. Your output dataframe is different. – ilyas patanam Jan 05 '16 at 06:52
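
The difference this comment points at can be checked directly. Below is a minimal sketch, assuming the question's data and using plain boolean masks rather than df.query; the names `hit_mask`, `added`, and `assigned` are illustrative only and do not appear in the thread.

```python
import pandas as pd

df = pd.DataFrame({'WINNER': [2, 2, 0], 'PREDICTED': [2, 1, 0],
                   'HOME': [5.25, 2.2, 1.25], 'DRAW': [4.5, 3.2, 5],
                   'AWAY': [1.53, 3.4, 8]})
categories = ['HOME', 'DRAW', 'AWAY']


def hit_mask(frame, count):
    # Hypothetical helper: True where the prediction was right and equals `count`.
    return (frame['WINNER'] == frame['PREDICTED']) & (frame['PREDICTED'] == count)


# Variant from the question's EDIT: start every row at -10, then ADD the payout.
added = df.copy()
added['PROFIT'] = -10.0  # float, so the later additions keep a float column
for count, col in enumerate(categories):
    added.loc[hit_mask(added, count), 'PROFIT'] += added[col] * 10

# Variant that reproduces the requested table: ASSIGN the payout, then fill misses.
assigned = df.copy()
for count, col in enumerate(categories):
    assigned.loc[hit_mask(assigned, count), 'PROFIT'] = assigned[col] * 10
assigned['PROFIT'] = assigned['PROFIT'].fillna(-10)

print(added['PROFIT'].tolist())
print(assigned['PROFIT'].tolist())
```

With this data the first variant yields roughly 5.3, -10 and 2.5, because -10 is added to every correct row, while the second yields 15.3, -10 and 12.5 as in the requested table.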

2 Answers

2
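categories = ['HOME', 'DRAW', 'AWAY']  # list position matches the codes 0, 1, 2
for count, col in enumerate(categories):
    # rows where the prediction was right and equals this category's code
    df.loc[df.query('WINNER == PREDICTED == @count').index, "PROFIT"] = df[col] * 10
# rows still NaN in PROFIT had a wrong prediction; set them to -10
df.fillna({'PROFIT': -10}, inplace=True)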
  • Use a for loop with enumerate to incrementally fill in the profit column.
  • Use df.query to further simplify writing the conditional logic. The query must be passed as a string and variables should be preceded with @.
  • df.fillna() lets you replace all the NaN values in the dataframe.

EDIT: used a dictionary so that only NaN values in the PROFIT column are filled.
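
To illustrate why the dictionary form matters, here is a small sketch on a made-up frame. The `OTHER` column and the names `everything` and `only_profit` are assumptions for the example and are not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: NaN appears in PROFIT and in an unrelated column.
df = pd.DataFrame({'PROFIT': [15.3, np.nan, 12.5],
                   'OTHER': [np.nan, 1.0, 2.0]})

# A scalar fills NaN in every column.
everything = df.fillna(-10)

# A dict fills NaN only in the columns it names.
only_profit = df.fillna({'PROFIT': -10})

print(everything)
print(only_profit)
```

In `only_profit` the NaN in `OTHER` is left alone, which is the behaviour the edit was after.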

ilyas patanam
  • categories = ['HOME', 'DRAW', 'AWAY'] df['PROFIT'] = -10 for count, col in enumerate(categories): df.loc[df.query('WINNER == PREDICTED == @count').index, "PROFIT"] += df[col] * 10 – SpanishBoy Jan 05 '16 at 06:42
  • I updated the description because I couldn't format the code in comments. Thanks! – SpanishBoy Jan 05 '16 at 06:46
  • You're welcome. I have made an edit so that not every NaN is filled. If this answer or any other one solved your issue, please mark it as accepted. There is no obligation. – ilyas patanam Jan 05 '16 at 06:57
  • `df.loc[df.query('WINNER == PREDICTED == @count').index, "PROFIT"]` - won't this raise an error because the `PROFIT` column doesn't exist yet? – SpanishBoy Jan 05 '16 at 07:03
  • Pandas will create a column for `PROFIT`. When you are trying to assign a value to a column that doesn't exist, pandas will create that column. – ilyas patanam Jan 05 '16 at 08:15
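
A quick way to see the column creation described in the last comment is the sketch below, which assumes a cut-down copy of the question's data and a boolean mask in place of df.query; the name `mask` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'WINNER': [2, 2, 0], 'PREDICTED': [2, 1, 0],
                   'HOME': [5.25, 2.2, 1.25]})
print('PROFIT' in df.columns)  # the column does not exist yet

# Assign to the missing column for matching rows only.
mask = (df['WINNER'] == df['PREDICTED']) & (df['PREDICTED'] == 0)
df.loc[mask, 'PROFIT'] = df['HOME'] * 10

# PROFIT now exists; rows outside the mask hold NaN.
print(df)
```

Rows not selected by the mask come out as NaN, which is why the answer finishes with fillna.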
1

Consider numpy's np.where() to reflect a nested if/then/else. The false argument would be -10:

import numpy as np

# True where the prediction matched the actual result
correct = df["WINNER"] == df["PREDICTED"]

# Nested np.where reads as if / elif / elif / else; the final -10 is the fallback
df["PROFIT"] = np.where(correct & (df["PREDICTED"] == 0), df["HOME"] * 10,
               np.where(correct & (df["PREDICTED"] == 1), df["DRAW"] * 10,
               np.where(correct & (df["PREDICTED"] == 2), df["AWAY"] * 10, -10)))
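
If the nesting gets hard to read, numpy's np.select may be a flatter way to express the same rules. This is a swapped-in alternative, not part of the answer above; the sketch assumes the question's data, and `correct`, `conditions`, and `choices` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'WINNER': [2, 2, 0], 'PREDICTED': [2, 1, 0],
                   'HOME': [5.25, 2.2, 1.25], 'DRAW': [4.5, 3.2, 5],
                   'AWAY': [1.53, 3.4, 8]})

# True where the prediction matched the actual result
correct = df['WINNER'] == df['PREDICTED']

# One condition per outcome code, paired by position with the value to use
conditions = [correct & (df['PREDICTED'] == 0),
              correct & (df['PREDICTED'] == 1),
              correct & (df['PREDICTED'] == 2)]
choices = [df['HOME'] * 10, df['DRAW'] * 10, df['AWAY'] * 10]

# Rows matching no condition receive the default
df['PROFIT'] = np.select(conditions, choices, default=-10.0)
print(df)
```

The first matching condition wins for each row, so the order of the lists plays the role of the if/elif chain.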
Parfait