
I have a df:

df = pd.DataFrame([[1,1],[3,4],[3,4]], columns=["a", 'b'])
    a   b
0   1   1
1   3   4
2   3   4

I have to filter this df based on a query. The query can be complex, but here I'm using a simple one:

items = [3,4]
df.query("a in @items and b == 4")
    a   b
1   3   4
2   3   4

I would like to add values in some new columns, but only to these rows:

configuration = {'c': 'action', "d": "non-action"}
for k, v in configuration.items():
    df[k] = v

The rest of the rows should be empty or np.nan. So my final df should look like:

    a   b   c       d
0   1   1   NaN     NaN
1   3   4   action  non-action
2   3   4   action  non-action

The issue is that the query gives me a copy of the DataFrame. I then have to somehow merge it back into the original, replacing the modified rows by index. How can I do this without manually replacing rows in the original df by index with the queried ones?
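A minimal sketch of the issue described above: running the loop on the query result writes to a copy, so the original df never gains the new columns.

```python
import pandas as pd

df = pd.DataFrame([[1, 1], [3, 4], [3, 4]], columns=["a", "b"])
items = [3, 4]
configuration = {'c': 'action', "d": "non-action"}

filtered = df.query("a in @items and b == 4")  # a copy, not a view

for k, v in configuration.items():
    filtered[k] = v  # may emit SettingWithCopyWarning on older pandas

print('c' in df.columns)  # False: the original frame is unchanged
```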

Claudiu Creanga
  • Why don't you import numpy as np and then use `configuration = {'c': [np.nan, 'action', 'action'], "d": [np.nan, "non-action", "non-action"]}`? Sorry if I'm not understanding correctly. – Karn Kumar Nov 20 '18 at 17:02
  • @pygo sorry, maybe I wasn't explicit enough. I may have thousands of rows, and in configuration just 2 columns with their values: configuration = {'c': 'action', "d": "non-action"}. So I have to add those values to a filtered df of thousands of rows and leave the others blank or np.nan. – Claudiu Creanga Nov 20 '18 at 17:06

1 Answer


Use `combine_first` with `assign`:

df.query("a in @items and b == 4").assign(**configuration).combine_first(df)
Out[138]: 
     a    b       c           d
0  1.0  1.0     NaN         NaN
1  3.0  4.0  action  non-action
2  3.0  4.0  action  non-action
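For reference, a self-contained version of the above. Note that `combine_first` upcasts a and b to float (as in the output shown), because aligning the two frames introduces NaN. The `.loc`-based variant at the end is an addition, not part of the original answer; it keeps the original integer dtypes by assigning through the query's index.

```python
import pandas as pd

df = pd.DataFrame([[1, 1], [3, 4], [3, 4]], columns=["a", "b"])
items = [3, 4]
configuration = {'c': 'action', "d": "non-action"}

# Query, attach the new columns to the filtered copy, then fill the
# remaining rows (and missing columns) back in from the original frame.
out = (df.query("a in @items and b == 4")
         .assign(**configuration)
         .combine_first(df))

# Alternative that preserves the int dtype of a and b: assign to the
# original frame via the index of the query result.
idx = df.query("a in @items and b == 4").index
for k, v in configuration.items():
    df.loc[idx, k] = v
```

Both produce NaN in c and d for the non-matching row 0.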
BENY