
I am trying to create a function that adds new variables (columns) to a DataFrame based on a multi-level grouping. The code below produces the desired result, but its execution time is too high for a run-time environment.

Can anyone suggest a faster alternative?

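import numpy as np
import pandas as pd

def prob_calc(test1):
    # exp(utility) is needed twice, so compute it once per group
    exp_util = np.exp(test1.utility)
    test1["log_sum"] = np.log(exp_util.sum())                    # log of the summed exp(utility)
    test1["prob_logit_within_nest"] = exp_util / exp_util.sum()  # logit probability within the group
    test1["Freq_of_nest"] = test1.flag.sum()                     # sum of 'flag' within the group
    return test1

test1 = test1.groupby(['quest_number', 'task', 'nest']).apply(prob_calc)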
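A minimal sketch of a possibly faster alternative (not the asker's code): `groupby(...).apply` calls a Python function once per group, whereas `groupby(...).transform("sum")` computes all group sums in one vectorized pass and broadcasts each sum back to the rows of its group. The function name `prob_calc_vectorized`, the `KEYS` constant and the toy data are hypothetical, and `utility` and `flag` are assumed to be numeric columns.

```python
import numpy as np
import pandas as pd

# Grouping columns taken from the question; KEYS itself is a hypothetical name
KEYS = ["quest_number", "task", "nest"]


def prob_calc_vectorized(df: pd.DataFrame) -> pd.DataFrame:
    """Add log_sum, prob_logit_within_nest and Freq_of_nest without groupby.apply."""
    out = df.copy()  # leave the caller's frame unmodified
    exp_util = np.exp(out["utility"])
    # transform("sum") returns each group's sum aligned to every row of that group,
    # so no Python-level function is called per group
    group_exp_sum = exp_util.groupby([out[k] for k in KEYS]).transform("sum")
    out["log_sum"] = np.log(group_exp_sum)
    out["prob_logit_within_nest"] = exp_util / group_exp_sum
    out["Freq_of_nest"] = out.groupby(KEYS)["flag"].transform("sum")
    return out


# Hypothetical toy input with the columns described in the comments
toy = pd.DataFrame({
    "quest_number": [1, 1, 1, 2],
    "task": [1, 1, 1, 1],
    "nest": ["a", "a", "b", "a"],
    "utility": [0.0, 0.0, 1.0, 2.0],
    "flag": [1, 1, 1, 0],
})
result = prob_calc_vectorized(toy)
print(result)
```

Because `log_sum` and `prob_logit_within_nest` both depend on the same per-group sum of `exp(utility)`, that sum is computed once and reused. If `utility` can take large values, `np.exp` may overflow; subtracting the per-group maximum before exponentiating is the usual remedy.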

Thanks, Sombit

  • Can you provide sample desired inputs and outputs? – languitar Mar 02 '17 at 13:56
  • The input for the function is a DataFrame (test1) with columns such as 'utility', 'flag', 'quest_number', 'task' and 'nest'. The desired output is the same DataFrame with the additional columns 'log_sum', 'prob_logit_within_nest' and 'Freq_of_nest', computed per group of 'quest_number', 'task' and 'nest'. – Sombit Sarkar Mar 02 '17 at 15:01
  • Please provide this so we can easily copy and paste it for experimenting. Cf. https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – languitar Mar 02 '17 at 15:26
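A minimal sketch of the kind of copy-pasteable input requested above, assuming only the column names given in the comments. The `make_sample` helper, the row count, the value ranges and the nest labels are all hypothetical, chosen so that different approaches can be timed on a frame with many small groups.

```python
import numpy as np
import pandas as pd


def make_sample(n_rows: int = 100_000, seed: int = 0) -> pd.DataFrame:
    """Build a hypothetical test1-like DataFrame with reproducible random values."""
    rng = np.random.default_rng(seed)  # fixed seed so everyone gets the same data
    return pd.DataFrame({
        "quest_number": rng.integers(0, 1_000, n_rows),  # many distinct questions
        "task": rng.integers(0, 10, n_rows),
        "nest": rng.choice(["nest_a", "nest_b", "nest_c"], n_rows),
        "utility": rng.normal(0.0, 1.0, n_rows),
        "flag": rng.integers(0, 2, n_rows),  # 0/1 indicator
    })


test1 = make_sample()
print(test1.head())
```

Passing the same `seed` always regenerates the same frame, so timings of different approaches can be compared on identical data.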
