
I want to create a new column in a pandas DataFrame by applying a function I wrote to two existing columns.

Packages used:

import math 
import scipy.stats as st
import pandas as pd

The following function calculates the lower bound of the Wilson score confidence interval:

def ci_lower_bound(wins, losses, a = 0.05):
    n = wins + losses
    if n == 0:
        return 0
    # a is the two-sided significance level, so a = 0.05 gives z ~= 1.96
    z = st.norm.ppf(1 - a / 2)
    phat = 1.0 * wins / n
    # Wilson lower bound: (phat + z^2/(2n) - z*sqrt((phat*(1 - phat) + z^2/(4n))/n)) / (1 + z^2/n)
    lower = (phat + z * z / (2 * n) - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)
    return lower
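A quick way to sanity-check a Wilson lower bound is that it must never exceed the observed win rate, and for a simple case it can be compared with a known value: 8 wins and 2 losses at the 95% level give an interval of roughly (0.490, 0.943). The sketch below computes both bounds, assuming `a` is the two-sided significance level so that z = norm.ppf(1 - a/2) ≈ 1.96; the name `wilson_interval` is hypothetical and not part of the question.

```python
import math

import scipy.stats as st


def wilson_interval(wins, losses, a=0.05):
    """Return (lower, upper) of the Wilson score interval for wins / (wins + losses).

    Hypothetical helper: `a` is treated as the two-sided significance level,
    so a=0.05 gives a 95% interval with z ~= 1.96.
    """
    n = wins + losses
    if n == 0:
        return 0.0, 0.0
    z = st.norm.ppf(1 - a / 2)
    phat = wins / n
    centre = phat + z * z / (2 * n)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    denom = 1 + z * z / n
    return (centre - margin) / denom, (centre + margin) / denom


lower, upper = wilson_interval(8, 2)
print(round(lower, 3), round(upper, 3))  # prints: 0.49 0.943
```

If the lower bound ever comes out above `wins / n`, the signs or the z value in the formula are wrong.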

I have a boxing dataset of person A versus person B, containing the wins and losses of both A and B. The arguments I want to pass to the function are:

data['won_A'] #wins
data['lost_A'] #losses

I want to create a new column, data['lower_bound_a'], by applying the function above with the following line:

data['lower_bound_a'] =data.apply(ci_lower_bound, wins = 'won_A', losses = 'lost_A')

However, when I run this code, I get the following error:

TypeError: ("ci_lower_bound() got multiple values for argument 'wins'", 'occurred at index age_A')
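The error comes from how `DataFrame.apply` forwards arguments. With the default `axis=0`, it calls the function once per column and passes that column, as a Series, in the first positional slot, which here is `wins`; the extra keyword arguments are forwarded unchanged, so `wins='won_A'` arrives a second time. The keyword values are also just strings, not column lookups. The sketch below reproduces this on a small made-up frame (the numbers are illustrative only, and the function body is a stand-in with the same signature) and shows that `axis=1` with a lambda avoids both problems.

```python
import pandas as pd


def ci_lower_bound(wins, losses, a=0.05):
    # Stand-in with the same signature as the question's function;
    # only the signature matters for reproducing the error.
    return wins / (wins + losses)


# Illustrative toy data, not the real boxing dataset.
data = pd.DataFrame({"won_A": [8, 3], "lost_A": [2, 7]})

try:
    # axis=0 (default): each column Series is passed positionally as `wins`,
    # and wins="won_A" is passed again as a keyword -> TypeError.
    data.apply(ci_lower_bound, wins="won_A", losses="lost_A")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# axis=1: the lambda receives one row at a time, so the two columns
# can be looked up by name and passed positionally.
row_wise = data.apply(lambda row: ci_lower_bound(row["won_A"], row["lost_A"]), axis=1)
print(row_wise.tolist())
```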
John

1 Answer


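Maybe this: pass axis=1 so that apply calls the function once per row (instead of once per column), and pick out the two columns inside a lambda: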

data['lower_bound_a'] = data.apply(lambda x: ci_lower_bound(x['won_A'], x['lost_A']), axis=1)
print(data)
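Row-wise `apply` calls the Python function once per row, which is fine for small frames but can be slow on large ones. Because the Wilson formula is plain arithmetic, it can also be written with NumPy operations on whole columns. The sketch below is an alternative to the answer, not part of it: `wilson_lower_bound_vec` is a hypothetical name, the toy data is made up, and it assumes `a` is the two-sided significance level (z = norm.ppf(1 - a/2)). It checks that the vectorized column matches the row-wise `apply` result.

```python
import math

import numpy as np
import pandas as pd
import scipy.stats as st


def ci_lower_bound(wins, losses, a=0.05):
    """Scalar Wilson lower bound, with z = norm.ppf(1 - a/2)."""
    n = wins + losses
    if n == 0:
        return 0
    z = st.norm.ppf(1 - a / 2)
    phat = wins / n
    return (phat + z * z / (2 * n)
            - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)


def wilson_lower_bound_vec(wins, losses, a=0.05):
    """Hypothetical vectorized version operating on whole columns at once."""
    wins = np.asarray(wins, dtype=float)
    n = wins + np.asarray(losses, dtype=float)
    z = st.norm.ppf(1 - a / 2)
    # Avoid dividing by zero; rows with n == 0 are set to 0 at the end.
    safe_n = np.where(n == 0, 1.0, n)
    phat = wins / safe_n
    lower = (phat + z * z / (2 * safe_n)
             - z * np.sqrt((phat * (1 - phat) + z * z / (4 * safe_n)) / safe_n)) / (1 + z * z / safe_n)
    return np.where(n == 0, 0.0, lower)


# Illustrative toy data, including a fighter with no recorded bouts.
data = pd.DataFrame({"won_A": [8, 3, 0], "lost_A": [2, 7, 0]})

data["lower_bound_a"] = data.apply(lambda x: ci_lower_bound(x["won_A"], x["lost_A"]), axis=1)
data["lower_bound_a_vec"] = wilson_lower_bound_vec(data["won_A"], data["lost_A"])
print(data)
```

The `np.where` guard mirrors the `if n == 0: return 0` branch of the scalar function, so both columns agree on fighters with no bouts.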
U13-Forward