I'm trying to build an entropy function from scratch, as asked by my leader. I have a dataset Ttrain with many variables, sex being one of them. I need to extract the categories (male and female) and then calculate the probabilities and the entropy for each one in a loop, using the following code:

def entropy3(c):
    import math
    u=c.unique()
    a=[]
    b=[]
    z=[]
    for i in range(len(u)):
        a=Ttrain[(c==u[i]) & (Ttrain.survived==1)].survived.count()
        b=Ttrain[(c==u[i]) & (Ttrain.survived==0)].survived.count()
        p=a/(a+b)
        q=b/(a+b)
        z=-(p)*math.log(p,2)-(q)*math.log(q,2)
        return z

Now, when I run print(entropy3(Ttrain.sex)), I get 0.85, which is the entropy for the category female. This means the loop does not iterate to the other category. I would be grateful if somebody could point out where I am going wrong. I'm very new to programming, so please excuse any conceptual errors.

IndigoChild
  • Please make sure the indentation of your code is correct. An unconditional `return` in a loop is certainly not correct. I suppose it should be *after* the loop, not *in*. Secondly, `z` is initialised as a list while in the loop it gets numerical values. Does not look right. As `z` is never used in any expression, its assignment should also be moved out of the loop. The same goes for `p` and `q`, ... so there must be a lot wrong there... – trincot Jan 19 '18 at 18:12
  • You return at the end of the first iteration. Pull the return outside the loop. Return a list of values, rather than only the one. – Prune Jan 19 '18 at 18:13
  • Hi, if I keep the return outside the loop, where should I store the output of the entropy formula? (I get an error after doing so.) – IndigoChild Jan 19 '18 at 18:27
  • Hi @trincot, I'm keeping z inside the loop because I need to calculate the entropy (as per the formula) for each of the categories. I don't know how else to do it. Kindly suggest an alternative if you could. – IndigoChild Jan 19 '18 at 18:40

1 Answer

A return statement ends the function as soon as it executes. So once the function returns the value for the female category, control leaves the function. Since your return statement is inside the for loop, the next category never gets processed. You can move the return outside the for loop and use a list to store each value you want to return.
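
As a minimal sketch of that idea (not necessarily the exact code the asker ended up with), the version below collects one entropy value per category and returns only after the loop has finished. The names `binary_entropy` and `entropy_per_category` are made up for illustration, the toy data is invented rather than taken from Ttrain, and the function takes two plain sequences so it runs without pandas. It stores the results in a dict keyed by category instead of a list, so each value stays labelled.

```python
import math
from collections import Counter

def binary_entropy(p):
    """Entropy in bits of a two-outcome split where one outcome has probability p."""
    if p in (0.0, 1.0):
        # log(0) is undefined; a group with a single outcome has zero entropy
        return 0.0
    q = 1.0 - p
    return -p * math.log2(p) - q * math.log2(q)

def entropy_per_category(categories, survived):
    """Return a dict mapping each category to its entropy (hypothetical helper)."""
    totals = Counter()      # number of rows per category
    positives = Counter()   # number of rows per category with survived == 1
    for cat, s in zip(categories, survived):
        totals[cat] += 1
        if s == 1:
            positives[cat] += 1

    result = {}
    for cat, n in totals.items():
        # store the value for this category and keep looping
        result[cat] = binary_entropy(positives[cat] / n)
    return result  # placed after the loop, so every category is processed

# Toy data, invented for illustration only
sex = ["female", "female", "female", "female", "male", "male", "male", "male"]
survived = [1, 1, 1, 0, 1, 0, 0, 0]

entropies = entropy_per_category(sex, survived)
print(entropies)
```

With the asker's data, a call along the lines of `entropy_per_category(Ttrain.sex, Ttrain.survived)` should work the same way, assuming both columns have the same length and `survived` holds 0/1 values.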

Ram
  • Hi, after moving the return outside the loop, I now get the output for the second category only instead of both. I'm not exactly sure how to store each value in a list. Could you kindly show me using the code above? – IndigoChild Jan 19 '18 at 18:31
  • Solved it, Thanks. – IndigoChild Jan 19 '18 at 19:24