1

i would like to use strings of "commands" to automate some pandas df analysis.

something like:

str_a = 'df.col_1.isna()'   
str_b = ' & '    
str_c = 'df.col_2.isna()'    
str_tot = str_a + str_b + str_c

then use the string to sort out NaN rows in df:

df.loc[str_tot, :]

this last should be equal to:

df.loc[df.col_1.isna() & df.col_2.isna(), :]

but the python compiler reads the str_tot as a [list] and not as a string, returning error.

is there a way to circumvent this?


thx a lot

I am used to this in VBA a lot to build SQL strings... is it a wrong idea into python?

DYZ
  • 55,249
  • 10
  • 64
  • 93
CRAZYDATA
  • 135
  • 7
  • May I ask what is the reason for such an unorthodox design? Where are the strings coming from? Are they typed by the user, read from a file, or something else? – DYZ May 14 '19 at 06:24
  • You could take a look at [eval()](https://docs.python.org/3/library/functions.html#eval), but it could be the wrong tool for the job – razdi May 14 '19 at 06:27
  • Hi DYZ, it' is just to build a bit of automation into my data analysis...I build the string on one side and then i feed-up a small function... why you say it is unorthodox? what would be the "orthodox"? thx a lot – CRAZYDATA May 14 '19 at 11:34

2 Answers2

3

Close, what you need is DataFrame.query, but if need working with Series.isna function add parameter engine='python':

df = pd.DataFrame({
        'A':list('abcdef'),
         'col_1':[np.nan,5,4,5,5,4],
         'C':[7,8,9,4,2,3],
         'col_2':[np.nan,3,5,7,1,np.nan],
         'E':[5,3,6,9,2,4],
         'F':list('aaabbb')
})

print (df)
   A  col_1  C  col_2  E  F
0  a    NaN  7    NaN  5  a
1  b    5.0  8    3.0  3  a
2  c    4.0  9    5.0  6  a
3  d    5.0  4    7.0  9  b
4  e    5.0  2    1.0  2  b
5  f    4.0  3    NaN  4  b

str_a = 'col_1.isna()'

str_b = ' & '

str_c = 'col_2.isna()'

str_tot = str_a + str_b + str_c
print (str_tot)
col_1.isna() & col_2.isna()

print (df.query(str_tot, engine='python'))
   A  col_1  C  col_2  E  F
0  a    NaN  7    NaN  5  a

Another idea is use fact np.nan == np.nan is False, docs:

str_a = '(col_1 != col_1)'

str_b = ' & '

str_c = '(col_2 != col_2)'

str_tot = str_a + str_b + str_c
print (str_tot)
(col_1 != col_1) & (col_2 != col_2)

print (df.query(str_tot))
   A  col_1  C  col_2  E  F
0  a    NaN  7    NaN  5  a
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • hi thx a lot. this is close to what i was looking for. wonder if there is a more general function. Mine was just an example: I was looking to build a much more complex string. Also: this works for querying the rows, what about the columns? i.e assume I want to loc a df on certain rows (now I know how to do it, thx ) to change the value into the cells of a defined column. is there a way to "query" selecting the columns ( ? df[col_x].query(string)= '123' ? ) THX a lot - – CRAZYDATA May 14 '19 at 11:30
  • 1
    @BusinessExploration - Not easy, what you need. Maybe you can check [this](https://stackoverflow.com/questions/53779986/dynamic-expression-evaluation-in-pandas-using-pd-eval), but generally some univeersal solution maybe is possible, but really complicated. – jezrael May 14 '19 at 11:53
  • jezrael... thx a lot, I think you pointed into the right direction. I give a small view of the link you provided and the concept is there... even if I am quite surprised to see such a mole of info. using strings to build queries it's normal practice in VBA ( [DoCmd.RunSQL](https://learn.microsoft.com/en-us/office/vba/api/access.docmd.runsql) ) ... I need to study... – CRAZYDATA May 14 '19 at 12:14
  • @BusinessExploration - ya, each language is a bit different. :) – jezrael May 14 '19 at 12:16
0

Alternative way you may use python eval.

str_a = 'df.col_1.isna()'
e = eval(str_a)
print(e)

For the upper example:

df = pd.DataFrame({
        'A':list('abcdef'),
         'col_1':[np.nan,5,4,5,5,4],
         'C':[7,8,9,4,2,3],
         'col_2':[np.nan,3,5,7,1,np.nan],
         'E':[5,3,6,9,2,4],
         'F':list('aaabbb')
})

print (df)
   A  col_1  C  col_2  E  F
0  a    NaN  7    NaN  5  a
1  b    5.0  8    3.0  3  a
2  c    4.0  9    5.0  6  a
3  d    5.0  4    7.0  9  b
4  e    5.0  2    1.0  2  b
5  f    4.0  3    NaN  4  b

Will return

0     True
1    False
2    False
3    False
4    False
5    False
prosti
  • 42,291
  • 14
  • 186
  • 151