1

I need to perform conditional calculations on two columns, and the rules are the same for both. I have been using two functions and applying one to each column, as shown below.

import pandas as pd

df = pd.DataFrame({'Min': [50, 50],
                   'Max': [150, 150],
                   'Rule': ['A', 'B']})

def adjust_min(row):
    if row['Rule'] == 'A':
        return row['Min'] * 5
    elif row['Rule'] == 'B':
        return row['Min'] * 10
    else:
        return row['Min']

def adjust_max(row):
    if row['Rule'] == 'A':
        return row['Max'] * 5
    elif row['Rule'] == 'B':
        return row['Max'] * 10
    else:
        return row['Max']

df['Min'] = df.apply(adjust_min, axis=1)
df['Max'] = df.apply(adjust_max, axis=1)

Ideally, I would want a function that applies to both columns, perhaps:

if row['Rule'] == 'A':
    return row * 5

Is there a more efficient way to do this? Thank you!

jpp
SAKURA
  • I noticed you are using function definitions to return a "column" of a dataframe, but you do not pass in a dataframe. Are you applying this function to the dataframe? Is there more to this code that you are not displaying? Also, could you please include a sample dataframe as text (not picture) as part of a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve)? – BenG Jul 06 '18 at 14:28

3 Answers

1

Applying Bill's approach to your problem:

import pandas as pd


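def multi_func(f_dict):
    # Build a function that picks the callable matching each row's index label
    def f(row):
        return f_dict[row.name](row)
    return f

df = pd.DataFrame({'Min': [50, 50],
                   'Max': [150, 150],
                   'Rule': ['A', 'B']})
# Use Rule as the index so that row.name is the rule label
df = df.set_index('Rule')


result = df.apply(multi_func({'A': lambda x: x * 5, 'B': lambda x: x * 10}), axis=1)
print(result)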

results in:

       Max  Min
Rule           
A      750  250
B     1500  500
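
Note that the question's functions leave rows with any other rule unchanged, whereas a plain dictionary lookup raises a KeyError for a rule that has no entry. Here is a minimal sketch of one way to add that fallback. The helper name multi_func_with_default and the extra rule 'C' are made up for illustration and are not part of the original answer.

```python
import pandas as pd


def multi_func_with_default(f_dict, default=lambda row: row):
    # Hypothetical variant of multi_func: rules missing from f_dict
    # fall back to `default`, which returns the row unchanged.
    def f(row):
        return f_dict.get(row.name, default)(row)
    return f


df = pd.DataFrame({'Min': [50, 50, 50],
                   'Max': [150, 150, 150],
                   'Rule': ['A', 'B', 'C']})  # 'C' is an illustrative extra rule
df = df.set_index('Rule')

result = df.apply(
    multi_func_with_default({'A': lambda x: x * 5, 'B': lambda x: x * 10}),
    axis=1)
print(result)
```

Rows A and B are scaled as before, and row C keeps its original values of 50 and 150.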
chuni0r
1

For a vectorised solution, you can use pd.DataFrame.multiply together with a dictionary mapping. This will be more efficient because it works on the contiguous memory blocks of the NumPy arrays behind a pandas DataFrame. Row-wise pd.DataFrame.apply is just a thinly veiled loop, which is better suited to a list than to a DataFrame.

import pandas as pd

df = pd.DataFrame([[50, 150, 'A'],
                   [50, 150, 'B']],
                  columns=['Min', 'Max', 'Rule'])

# define dictionary mapping rule to factor
factors_map = {'A': 5, 'B': 10}

# create series of factors mapped from Rule
factors = df['Rule'].map(factors_map).fillna(1)

# multiply selected columns by factors
cols = ['Min', 'Max']
df[cols] = df[cols].multiply(factors, axis=0)

print(df)

   Min   Max Rule
0  250   750    A
1  500  1500    B
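
The .fillna(1) step is what reproduces the else branch of the question's functions: a rule with no entry in factors_map maps to NaN, and replacing that NaN with 1 leaves the row unchanged. A small sketch of that behaviour follows. The rule 'C' and the variable name scaled are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([[50, 150, 'A'],
                   [50, 150, 'B'],
                   [50, 150, 'C']],  # 'C' is illustrative, absent from the mapping
                  columns=['Min', 'Max', 'Rule'])

factors_map = {'A': 5, 'B': 10}

# map() yields NaN for 'C'; fillna(1) turns that into a neutral factor
factors = df['Rule'].map(factors_map).fillna(1)

cols = ['Min', 'Max']
scaled = df[cols].multiply(factors, axis=0)
print(scaled)
```

Because the intermediate NaN forces the factor series to float, the scaled values come back as floats here. Cast back with .astype(int) if integer columns matter.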
jpp
0

Try the following inside a function that returns row and is passed to df.apply with axis=1:

if row['Rule'] == 'A':
    row[["Min", "Max"]] *= 5
elif row['Rule'] == 'B':
    row[["Min", "Max"]] *= 10
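
On its own the fragment has no row variable and returns nothing, which is a likely source of errors. A sketch of one way to complete it is below. The function name adjust is an assumption, as is the choice to work on a copy of the row so that the object pandas passes in is not mutated.

```python
import pandas as pd

df = pd.DataFrame({'Min': [50, 50],
                   'Max': [150, 150],
                   'Rule': ['A', 'B']})


def adjust(row):
    # Hypothetical wrapper name; work on a copy rather than
    # mutating the row object that apply() hands in.
    row = row.copy()
    if row['Rule'] == 'A':
        row[['Min', 'Max']] *= 5
    elif row['Rule'] == 'B':
        row[['Min', 'Max']] *= 10
    return row  # rows with any other rule come back unchanged


df = df.apply(adjust, axis=1)
print(df)
```

This handles both columns in one function, but it is still a Python-level loop over rows, so it is mainly suited to small frames.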

cheers

  • Can you demonstrate in full how your code would work? I keep running into errors. I appreciate your help! – SAKURA Jul 06 '18 at 15:05