
I have the following function as part of a script inspired by this:

def view(a='', b='', c=''):
    if a=='All' and b=='All' and c=='All': return df 
    if a=='All' and c=='All' and b!='All': return df[(df['b']==b)]
    if a!='All' and c=='All' and b=='All': return df[(df['a']==a)]
    if a=='All' and c!='All' and b=='All': return df[(df['c']==c)]
    if a=='All' and c!='All' and b!='All': return df[(df['c']==c) & (df['b']==b)]
    if a!='All' and c=='All' and b!='All': return df[(df['a']==a) & (df['b']==b)]
    if a!='All' and c!='All' and b=='All': return df[(df['a']==a) & (df['c']==c)]
    return df[(df['a']==a) & (df['b']==b) & (df['c']==c)]

Is there a nice, Pythonic way to write all those chained if statements? Bonus points for an answer that generalizes to n variables.

Note: This may be related to this question, but I still can't figure it out.

hernanavella

2 Answers


Your function is basically doing this:

if all parameters are 'All':
    return df
else:
    Take all the non-'All' parameters
    Test if each one is equal to df['name_of_parameter']
    Bitwise-AND them together
    Return df[result of previous line]
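As a rough sketch of that outline in plain Python, assuming `df` is replaced by a hypothetical list of row dicts (so each `==` yields a real `bool` per row, and the "AND" step is just `all()` per row):

```python
# Hypothetical stand-in for df: a list of rows, each a dict keyed by column name.
ROWS = [
    {'a': 'x', 'b': 'p', 'c': 'm'},
    {'a': 'x', 'b': 'q', 'c': 'm'},
    {'a': 'y', 'b': 'p', 'c': 'n'},
]

def view(a='', b='', c=''):
    params = {'a': a, 'b': b, 'c': c}
    # Keep only the parameters that are not 'All'
    active = {name: value for name, value in params.items() if value != 'All'}
    if not active:
        # Every parameter is 'All': no filtering at all
        return ROWS
    # A row survives only if it matches every active parameter (the "AND" step)
    return [row for row in ROWS
            if all(row[name] == value for name, value in active.items())]
```

For example, `view('x', 'All', 'All')` keeps the first two rows, and `view('x', 'p', 'm')` keeps only the first.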

Let's start our rewrite by first taking a list of all non-'All' parameters:

notall = [x for x in [a,b,c] if x != 'All']
if not notall:
    return df
else:
    ???

Roadblock #1: We've now lost track of which value goes with which parameter. Why do we need to know that? So that we can compare the parameters against the correct elements of df. We can fix this by storing not just the parameters' values but also their names in notall:

notall = [(x, name) for (x, name) in [(a, 'a'), (b, 'b'), (c, 'c')] if x != 'All']
if not notall:
    return df
else:
    ???

Writing out the name of each parameter twice is ugly, but it's either this or resorting to naughtiness with locals and/or **kwargs.
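To see what `notall` actually holds, here is the same comprehension wrapped in a small helper; the name `active_params` is hypothetical and not part of the original code:

```python
def active_params(a='', b='', c=''):
    # Pair each value with its parameter name so the name survives the filtering
    return [(x, name) for (x, name) in [(a, 'a'), (b, 'b'), (c, 'c')] if x != 'All']
```

So `active_params('All', 'p', 'All')` gives `[('p', 'b')]`, and an all-'All' call gives an empty list, which is what the `if not notall` check detects.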

With that taken care of, the comparisons against the elements of df are easy:

compared = [df[name] == x for (x, name) in notall]

Now, how do we AND them all together? We could use functools.reduce() and operator.and_, but (unless you've overloaded == to return a non-boolean, which I hope you didn't do), the elements of compared are all booleans, which means that combining them with bitwise AND is the same as combining them with logical AND, and Python already has a function for that: all().

return df[all(compared)]
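For plain Python `bool`s the two spellings do agree, with one difference worth knowing: `all([])` is `True`, while `functools.reduce` with no initializer raises `TypeError` on an empty list. Both cases are sidestepped by the `if not notall` guard. A quick check (the helper `and_reduce` is hypothetical):

```python
import functools
import itertools
import operator

def and_reduce(bools):
    # Bitwise-AND a non-empty sequence of bools together
    return functools.reduce(operator.and_, bools)

# For every combination of three bools, bitwise AND matches all()
for combo in itertools.product([True, False], repeat=3):
    assert and_reduce(combo) == all(combo)

# The empty case differs: all() is vacuously True, reduce() has nothing to fold
try:
    and_reduce([])
except TypeError:
    empty_raises = True
else:
    empty_raises = False
```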

Putting it all together:

def view(a='', b='', c=''):
    notall = [(x, name) for (x, name) in [(a, 'a'), (b, 'b'), (c, 'c')] if x != 'All']
    if not notall:
        return df
    else:
        compared = [df[name] == x for (x, name) in notall]
        return df[all(compared)]

or, even more compact:

def view(a='', b='', c=''):
    notall = [(x, name) for (x, name) in [(a, 'a'), (b, 'b'), (c, 'c')] if x != 'All']
    if not notall:
        return df
    else:
        return df[all(df[name] == x for (x, name) in notall)]

Now, about that naughtiness mentioned earlier: If all of the parameters are in a dict, then notall can just contain the keys, which will allow us to look up both the parameter values and df values without repeating ourselves (too much). How do we get all the parameters in a dict? With **kwargs:

def view(**kwargs):
    notall = [name for name in NAMES if kwargs.get(name, '') != 'All']

(Note the use of get to give the parameters their default values.) But what should NAMES be? It can't be kwargs.keys(), as that will only contain the parameters that the user passed in, which may not be all of them (and may even include keys we weren't expecting!). Option 1 is to write out a list of the parameter names somewhere and use that:

NAMES = ['a', 'b', 'c']
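A small illustration of how `kwargs.get` fills in the default for a parameter the caller omitted, while unexpected keywords are simply ignored because only `NAMES` is consulted (the helper name `resolved` is hypothetical):

```python
NAMES = ['a', 'b', 'c']

def resolved(**kwargs):
    # Look up each expected name, defaulting to '' when it was not passed;
    # any extra keyword arguments are never read
    return {name: kwargs.get(name, '') for name in NAMES}
```

For instance, `resolved(a='x', z=1)` gives `{'a': 'x', 'b': '', 'c': ''}`.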

Alternatively, if the keys of df happen to be the same as the desired names of the function's parameters, we can just use df.keys():

    notall = [name for name in df.keys() if kwargs.get(name, '') != 'All']

or, slightly shorter:

    notall = [name for name in df if kwargs.get(name, '') != 'All']
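Iterating over a mapping yields its keys, which is why `for name in df` works as a shorthand (iterating over a pandas DataFrame likewise yields its column labels). A quick check with a hypothetical plain dict standing in for `df`:

```python
# Hypothetical column-oriented stand-in for df: column name -> list of values
columns = {'a': ['x', 'y'], 'b': ['p', 'q'], 'c': ['m', 'n']}
kwargs = {'a': 'x', 'b': 'All'}

via_keys = [name for name in columns.keys() if kwargs.get(name, '') != 'All']
via_iter = [name for name in columns if kwargs.get(name, '') != 'All']
```

Both give `['a', 'c']`: `'c'` was omitted, so it defaults to `''`, which is not `'All'`, so it still counts as a filter.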

After this, we just need to update how the elements of notall are used, changing this:

return df[all(df[name] == x for (x, name) in notall)]

to this:

return df[all(df[name] == kwargs.get(name, '') for name in notall)]

(Note that we still need to keep using get to set the default values.)

Putting it all back together again:

NAMES = ['a', 'b', 'c']
def view(**kwargs):
    notall = [name for name in NAMES if kwargs.get(name, '') != 'All']
    if not notall:
        return df
    else:
        return df[all(df[name] == kwargs.get(name, '') for name in notall)]

or, if the parameter names are the same as the keys of df:

def view(**kwargs):
    notall = [name for name in df if kwargs.get(name, '') != 'All']
    if not notall:
        return df
    else:
        return df[all(df[name] == kwargs.get(name, '') for name in notall)]
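Here is the `**kwargs` version run end to end against a hypothetical list-of-dicts stand-in for `df`, with `NAMES` derived from the rows themselves so it covers any number of columns:

```python
ROWS = [
    {'a': 'x', 'b': 'p', 'c': 'm', 'd': 1},
    {'a': 'x', 'b': 'q', 'c': 'm', 'd': 2},
    {'a': 'y', 'b': 'p', 'c': 'n', 'd': 1},
]
NAMES = list(ROWS[0])  # every column name; assumes all rows share the same keys

def view(**kwargs):
    # Names whose requested value is not 'All' (omitted names default to '')
    notall = [name for name in NAMES if kwargs.get(name, '') != 'All']
    if not notall:
        return ROWS
    # With plain per-row values, == gives a real bool, so all() is fine here
    return [row for row in ROWS
            if all(row[name] == kwargs.get(name, '') for name in notall)]
```

As with the original signature, an omitted keyword defaults to `''`, so `view(a='x')` matches nothing unless some column actually contains `''`.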

EDIT: Based on the comments below, df is apparently a pandas DataFrame, whose columns override == so that it returns a boolean Series rather than a single bool. Fortunately, as alluded to above, this just requires changing this:

return df[all(df[name] == kwargs.get(name, '') for name in notall)]

to this:

import functools
import operator

return df[functools.reduce(operator.and_, [df[name] == kwargs.get(name, '') for name in notall])]
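What the `reduce(operator.and_, ...)` line relies on is that `&` between two boolean Series is elementwise, producing one combined mask that is then used to index `df`. The sketch below mimics that with plain lists (a hypothetical stand-in, not pandas) to show the shape of that mask:

```python
import functools

# Hypothetical column-oriented stand-in for df
columns = {'a': ['x', 'x', 'y'], 'b': ['p', 'q', 'p'], 'c': ['m', 'm', 'n']}

def eq_mask(column, value):
    # Like df[name] == value: one bool per row
    return [v == value for v in column]

def and_masks(left, right):
    # Like Series & Series: elementwise AND of two equal-length masks
    return [l and r for l, r in zip(left, right)]

wanted = {'a': 'x', 'b': 'p'}
mask = functools.reduce(and_masks, [eq_mask(columns[n], v) for n, v in wanted.items()])
selected_rows = [i for i, keep in enumerate(mask) if keep]
```

Here `mask` is `[True, False, False]`, so only row 0 is selected, which is what `df[mask]` would do with a real boolean Series.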
jwodder
  • `df[all(compared)]` and its variants will just be `df[False]` or `df[True]`. That doesn't seem equivalent to the OP's `df[(df['a']==a) & (df['b']==b) & (df['c']==c)]`. – DSM Apr 20 '17 at 17:05
  • `df["a"] == a` will be a bool Series, and bitwise anding a few of them together will give a Series which is then used as a mask. – DSM Apr 20 '17 at 17:07
  • @DSM: Where does the OP say that `df["a"] == a` is not a `bool`? – jwodder Apr 20 '17 at 17:14
  • I'm confused about how the comparison returns a Series. In my experience, it tends to be either True (they are equal) or False (they are not equal). Similarly taking bitwise and of any python boolean values seems to result in a python boolean value. – Kenny Ostrom Apr 20 '17 at 17:15
  • @jwodder: unfortunately you need to either know the OP's questions :-) or recognize df as the standard name for an arbitrary dataframe. The OP confirms that it's a dataframe when he writes "[...] a simple df with generic index and columns 'a', 'b' 'c'. I use this function as part of a script to filter the dataframe using a widget. 'a', 'b', 'c' are categorical variables." in the comments. – DSM Apr 20 '17 at 17:16
  • @KennyOstrom: comparisons between scalars and numpy arrays or pandas Series return vector results. – DSM Apr 20 '17 at 17:17
  • So you should probably just delete this answer, because of stuff the question didn't mention originally. – Kenny Ostrom Apr 20 '17 at 17:19
  • @KennyOstrom: Harsh. It's almost a one-line change. – jwodder Apr 20 '17 at 17:21
  • Ah I see where you're going. Since it overrides the operators, we just call them and let it do so. – Kenny Ostrom Apr 20 '17 at 17:28

This should do the trick:

import itertools, functools
from operator import eq, ne, and_

def view(*args):
    # Eq(x) is True when x == 'All'; Ne(x) is True when x != 'All'
    Eq, Ne = functools.partial(eq, 'All'), functools.partial(ne, 'All')

    if all(Eq(var) for var in args):
        return df

    # Try every combination of Eq/Ne tests, one per argument
    for cond in itertools.product((Eq, Ne), repeat=len(args)):
        if all(fun(var) for var, fun in zip(args, cond)):
            # AND together the masks of the non-'All' arguments
            index = functools.reduce(and_, (df[var] == var for var, fun in zip(args, cond) if fun is Ne))
            return df[index]

The only problem is that there's no simple way I'm aware of to know the name of the variable you're currently using. This is why I used df[var] == var.

This is relatively easy to fix, for example by making each variable carry its name with it: each argument would then be a tuple of the form (value, "name"), e.g. (a, "a").
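Following that suggestion, here is a sketch where each argument is a hypothetical `(value, name)` pair. It also drops the `itertools.product` search, since only the non-'All' pairs matter, and uses a list of row dicts as a stand-in for `df`:

```python
ROWS = [
    {'a': 'x', 'b': 'p', 'c': 'm'},
    {'a': 'x', 'b': 'q', 'c': 'm'},
    {'a': 'y', 'b': 'p', 'c': 'n'},
]

def view(*pairs):
    # Each argument is (value, column_name); keep only the ones that filter
    active = [(value, name) for value, name in pairs if value != 'All']
    if not active:
        return ROWS
    return [row for row in ROWS
            if all(row[name] == value for value, name in active)]
```

For example, `view(('x', 'a'), ('All', 'b'), ('m', 'c'))` keeps the first two rows, and it works the same way for any number of pairs.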

ForceBru