
I'm having trouble shortening my code, possibly with a lambda. `bp` is the name of my DataFrame.

My data looks like this:

user  Label
1     b
2     b
3     s

I expect to get:

user  Label  Y
1     b      1
2     b      1
3     s      0

Here is my code:

counts = bp['Label'].value_counts()
def score_to_numeric(x):
    if counts['b'] > counts['s']: 
        if x == 'b':
            return 1
        else: 
            return 0
    else:
        if x =='b':
            return 0
        else:
            return 1
bp['Y'] = bp['Label'].apply(score_to_numeric) # apply above function to convert data 

The function converts the categorical values 'b' and 's' in the column 'Label' into the numeric values 0 and 1. The line counts = bp['Label'].value_counts() counts how many times 'b' and 's' occur in 'Label'. Applying score_to_numeric to each value then fills a new column 'Y': if 'b' occurs more often than 's', 'b' gets 1 and 's' gets 0; otherwise 'b' gets 0 and 's' gets 1.

I would like to shorten my code to 3-4 lines at most. I think a lambda expression might do this, but I'm not familiar enough with lambdas.
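For reference, here is a minimal sketch of the same rule in three lines without `apply` or a lambda. It finds the winning label once and then compares the whole column against it. It assumes the labels are only 'b' and 's'. The helper name `add_y` is hypothetical. Ties go to 's', matching `score_to_numeric`.

```python
import pandas as pd


def add_y(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: set Y to 1 where Label is the majority label, else 0."""
    counts = df['Label'].value_counts()  # computed once, not once per row
    # Same rule as score_to_numeric: 'b' wins only if it is strictly more frequent
    winner = 'b' if counts.get('b', 0) > counts.get('s', 0) else 's'
    df['Y'] = (df['Label'] == winner).astype(int)  # Boolean column -> 0/1
    return df
```

Comparing against a precomputed label keeps the work vectorized, so nothing is evaluated per row in Python.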

Prune
    Forget shortening your code, for now. You're recomputing `bp['Label'].value_counts()` on *every call* to `score_to_numeric`. – user2357112 Oct 24 '17 at 23:18
    I think showing us your data and expected output for about 5-10 rows would help more than showing a function and asking someone to optimise it. – cs95 Oct 24 '17 at 23:21

3 Answers


Since True and False evaluate to 1 and 0, respectively, you can simply return the Boolean expression, converted to an integer.

def score_to_numeric(x):
    # counts is the value_counts() Series computed once in the question
    return int((counts['b'] > counts['s']) == (x == 'b'))

It returns 1 if and only if both expressions have the same Boolean value.

Prune
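The same Boolean trick also fits in a lambda. Here is a sketch on hypothetical sample data. The comparison of the counts is hoisted out of the lambda, so it is evaluated once instead of on every call.

```python
import pandas as pd

# Hypothetical sample data matching the question's layout
bp = pd.DataFrame({'user': [1, 2, 3], 'Label': ['b', 'b', 's']})

counts = bp['Label'].value_counts()
b_wins = bool(counts['b'] > counts['s'])  # evaluated once, outside the lambda
# int(A == B) is 1 exactly when both Booleans agree
bp['Y'] = bp['Label'].apply(lambda x: int(b_wins == (x == 'b')))
```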

I don't think you need to use the apply method. Something simple like this should work:

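value_counts = bp.Label.value_counts()
b_wins = value_counts['b'] > value_counts['s']
# .loc avoids chained assignment and writes to a new column Y instead of overwriting Label
bp.loc[bp.Label == 'b', 'Y'] = 1 if b_wins else 0
bp.loc[bp.Label == 's', 'Y'] = 0 if b_wins else 1
bp['Y'] = bp['Y'].astype(int)  # the partial .loc assignments create Y as float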
Ankur Ankan
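As a variant, `Series.map` can replace the masked assignment. This is a different technique: each label is looked up in a dict, so the result goes to a new column and `Label` stays intact. The sketch uses hypothetical sample data and assumes only 'b' and 's' occur. Any other label would map to NaN.

```python
import pandas as pd

# Hypothetical sample data matching the question's layout
bp = pd.DataFrame({'user': [1, 2, 3], 'Label': ['b', 'b', 's']})

value_counts = bp['Label'].value_counts()
t = int(value_counts['b'] > value_counts['s'])  # 1 if 'b' is the majority, else 0
# map() looks up each label in the dict and returns a new Series
bp['Y'] = bp['Label'].map({'b': t, 's': 1 - t})
```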

You could do the following:

counts = bp['Label'].value_counts()
t = 1 if counts['b'] > counts['s'] else 0
bp['Y'] = bp['Label'].apply(lambda x: t if x == 'b' else 1 - t)
Jon Deaton
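The lambda can also be replaced by a vectorized `numpy.where`, which picks `t` where the label is 'b' and `1 - t` elsewhere. Here is a sketch on hypothetical sample data, assuming only 'b' and 's' occur.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data matching the question's layout
bp = pd.DataFrame({'user': [1, 2, 3], 'Label': ['b', 'b', 's']})

counts = bp['Label'].value_counts()
t = 1 if counts['b'] > counts['s'] else 0
# Vectorized equivalent of apply(lambda x: t if x == 'b' else 1 - t)
bp['Y'] = np.where(bp['Label'] == 'b', t, 1 - t)
```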