2

I have a dataframe that looks like:

import pandas as pd
df1 = pd.DataFrame({'Counterparty':['Bank','Client','Bank','Bank'],
             'Amount':[100, 100, 100, 100]
            })

I want to have a flexible selection that uses a string to filter out certain records. Something like:

condition ="'Bank'"
result = "df1['Counterparty'].isin(["+condition+"])"
print(eval(result))

I am struggling how to select all records that are NOT with Counterparty =Bank . ie

condition ="Not("Bank"))"

Any suggestions how I can achieve this ?

jpp
  • 159,742
  • 34
  • 281
  • 339
Number Logic
  • 852
  • 1
  • 9
  • 19

4 Answers4

2

You can use the unary operation Not (~) as follows

condition ="'Bank'"
result = "df1[~df1['Counterparty'].isin(["+condition+"])]"
print(eval(result))

the output will be:

   Amount Counterparty
1     100       Client

## EDIT

condition ="NOT(Bank)"
def func(c):
     if c.startswith("NOT("):
         return df1[~df1['Counterparty'].isin([c[4:-1]])]
     else:    
         return df1[df1['Counterparty'].isin([c])]
result  = "func(condition)"
print(eval(result))
condition = "Bank"
print(eval(result))

   Amount Counterparty
1     100       Client

   Amount Counterparty
0     100         Bank
2     100         Bank
3     100         Bank
sgDysregulation
  • 4,309
  • 2
  • 23
  • 31
  • I want the result to be flexible. ie. the result should depend only on the condition. If I pass a Not condition , the result should reflect this. Similarly, If I just pass the condition Bank, it should only show me records with a counterparty of Bank – Number Logic Feb 18 '18 at 09:59
  • There is no direct way to do so, however. I've edited my post with a suggestion that's hack-y but would work – sgDysregulation Feb 19 '18 at 01:14
  • 1
    the if condition seems to be the only way to get around the problem ! thanks – Number Logic Feb 20 '18 at 09:05
1

First using eval is bad practice.

What about use query?

condition ="Bank"
result = df1.query('Counterparty not in @condition')
print (result)
   Amount Counterparty
1     100       Client
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • I want the "not" to be part of "condition". In this way I can flexibly change the filter criteria without having to change code – Number Logic Feb 18 '18 at 09:57
  • I think it is problem. Because `condition` variable is not condition, but filtered string. So `condition ="Not("Bank"))"` cannot be used. – jezrael Feb 18 '18 at 10:02
  • because condition is all string `"df1['Counterparty'].isin(["+condition+"])"` – jezrael Feb 18 '18 at 10:03
  • df.query uses df.eval under the hood, see the [source](https://github.com/pandas-dev/pandas/blob/6b27de318619ab7524e24976c31adf20e59c25f5/pandas/core/frame.py#L4479-L4498) – Aaron Mar 16 '23 at 15:11
1

Pandas has string operations that can be accessed by using "str".

import pandas as pd
df1 = pd.DataFrame({'Counterparty':['Bank','Client','Bank','Bank'],
             'Amount':[100, 100, 100, 100]
            })

to find bank you can do this.

df1.Counterparty.str.match('Bank')

if you want to find anything but, add a tilde:

~df1.Counterparty.str.match('Bank')

Finally, if you want the data then just put this boolean query in

df1[~df1.Counterparty.str.match('Bank')]

A few other notes. Don't try to build expressions that need to be evaluated (just forget eval is there for a while)

Back2Basics
  • 7,406
  • 2
  • 32
  • 45
0

Do not use eval in this way, it poses a security risk.

There are at least 2 viable options:

  1. Use df.query, as explained by @jezrael.
  2. Use a 3rd party library like pandasql, which will let you apply SQL-like queries, e.g. pysqldf("SELECT * FROM df WHERE Counterparty <> 'bank';")
jpp
  • 159,742
  • 34
  • 281
  • 339