
I want to get a subset of a DataFrame based on multiple conditions, where the number of conditions I pass varies from call to call.

I have seen similar answers with multiple conditions (Select rows from a DataFrame based on values in a column in pandas), but none that allow passing fewer conditions than the function defines.

I have tried using c=None, c=True, and c=all as the default, but the comparison on 'c' always evaluates to False:

def Subset (df, a=None, b=None, c=True): 
    temp=df.loc[(df['a'] == a) & (df['b'] == b) & (df['c'] == c)]

    return (temp)

If I evaluate:

Subset=Subset(df=Table, a=350, b=300)

I get an empty DataFrame.

Whereas if I use this function, which has no condition on 'c':

def Try(df, a=None, b=None): 
    temp=df.loc[(df['a'] == a) & (df['b'] == b)]

    return (temp)

I get a dataframe with 10 rows.

To answer Yaakov Bressler's comment, here is more information. My DataFrame looks like this:

files,Names,Curve Type,Thickness,Temperature,Number,Iteration,leak,start,stop,Vth,F_E_M,on/off
Output [(1) _250-300-G21_]0.csv,250-300-G21,Output,250,300,G21,0,True,,,,,
Output [(1) _250-300-G22_]0.csv,250-300-G22,Output,250,300,G22,0,False,,,,,
Transfer lin [(1) _250-300-G21_;]0.csv,250-300-G21,Transfer lin,250,300,G21,0,True,,,,,

The first column holds the filenames. The other columns hold data about the transistor that each file represents.

I want to create a subset of this table representing a single transistor, defined by (Curve Type, Thickness, Temperature, Number), or a single chip, defined by (Curve Type, Thickness, Temperature).

This is so that I can import them and do plots/analysis.

Leo
  • what is your `df['c']`? Is it actually `True/False`? – Quang Hoang May 07 '19 at 18:23
  • Chances are you just don't have any rows in which column 'a' = 350, 'b' = 300, and 'c'=True at the same time – Turtalicious May 07 '19 at 18:26
  • Maybe refactor to pass a dictionary of `column: criteria` pairs and build a mask in a loop over the dictionary items. – wwii May 07 '19 at 18:35
  • df['c'][0]='Saturation', so I see why df['c'][0]==True evaluates to False. However, I want to be able to sometimes not pass a c value and have df['c']==c always evaluate to True in that case – Leo May 07 '19 at 18:38
  • Something like .. https://stackoverflow.com/questions/34157811/filter-a-pandas-dataframe-using-values-from-a-dict – wwii May 07 '19 at 18:41
  • @wwii that could work, so I would create a filter_v in a for loop passing it only the values != None – Leo May 07 '19 at 19:06

2 Answers

0

It seems your DataFrame's 'c' column is not boolean. Try print(df['c'].dtype == 'bool') to check.
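A minimal sketch of that check, assuming a small made-up frame in which column 'c' holds strings (the asker's comment mentions a value of 'Saturation'). The data below is hypothetical and only illustrates why the combined mask selects nothing:

```python
import pandas as pd

# Hypothetical data: column 'c' holds strings, not booleans.
df = pd.DataFrame({
    "a": [350, 350, 250],
    "b": [300, 300, 300],
    "c": pd.Series(["Saturation", "Output", "Output"], dtype="object"),
})

is_bool = df["c"].dtype == "bool"   # the dtype check suggested above
c_mask = df["c"] == True            # elementwise comparison of each string with True
matches = int(c_mask.sum())         # how many rows pass the 'c' condition
print(is_bool, matches)
```

Because no string equals True, the 'c' mask is False in every row, and AND-ing it with the other conditions leaves an empty result.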

Also, sharing your original df and the goal would help elucidate your problem.

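Finally, I would not name an object after a function, as in Subset=Subset(df=Table, a=350, b=300). After that assignment the name Subset refers to the returned DataFrame, so the function can no longer be called by that name.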
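A small sketch of what goes wrong with that naming, using a hypothetical list-based stand-in named subset rather than the asker's pandas function:

```python
def subset(values, minimum):
    """Hypothetical stand-in for the asker's Subset function."""
    return [v for v in values if v >= minimum]

# Rebinding the name: 'subset' now refers to the returned list, not the function.
subset = subset([1, 5, 9], minimum=5)

try:
    subset([2, 7], minimum=5)   # second call fails: a list is not callable
    error = None
except TypeError as exc:
    error = exc
```

Assigning the result to a different name, such as result = subset(...), keeps the function available for later calls.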

Yaakov Bressler
0

In the end I did it this way:

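    def Subset(df, *arg, **kwargs):
        '''Build a query string from the keyword arguments and return the matching rows.
        NOTE: converts the whole DataFrame to str, so the returned subset holds strings.'''

        # One "column == 'value'" clause per keyword argument, joined with '&'.
        qry = ' & '.join(["{} == '{}'".format(key, value) for key, value in kwargs.items()])
        # Cast every column to str so the quoted values in the query compare equal.
        df = df.astype(str)

        subset = df.query(qry)
        return subset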
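For comparison, here is a sketch of the mask-building approach suggested in the comments. It is not the code from this answer: subset_by is a hypothetical helper, and the sample rows are taken from the table in the question. It skips any criterion whose value is None and leaves the column dtypes untouched.

```python
import pandas as pd

def subset_by(df, **criteria):
    """Hypothetical helper: keep rows matching every criterion that is not None."""
    mask = pd.Series(True, index=df.index)   # start by keeping every row
    for column, value in criteria.items():
        if value is not None:                # None means "no condition on this column"
            mask &= df[column] == value
    return df[mask]

# Sample rows based on the table shown in the question.
table = pd.DataFrame({
    "Curve Type": ["Output", "Output", "Transfer lin"],
    "Thickness": [250, 250, 250],
    "Temperature": [300, 300, 300],
    "Number": ["G21", "G22", "G21"],
})

# A whole chip: no condition on Number.
chip = subset_by(table, Thickness=250, Temperature=300)

# A single transistor; a column name with a space is passed via dict unpacking.
transistor = subset_by(table, **{"Curve Type": "Output"}, Number="G21")
```

Since the columns are indexed directly rather than parsed from a query string, names such as "Curve Type" need no special quoting, and numeric columns stay numeric in the result.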

Leo