Indexing on multiple arguments that depend on each other in pandas dataframe

Question

Say I have a pandas dataframe and a dictionary as defined below:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {"c1": np.array(['a','a','b','b','a']),
     "c2": np.array([1,2,2,2,2])})

to_keep = {'a':[1],'b':[2,3]}

I want to keep only the rows whose c1 value is a key of to_keep and whose c2 value is in the list stored under that key. In other words, for the data above I want to keep the ('a', 1) row and the two ('b', 2) rows.

I have tried many things, like df[(df["c1"] in to_keep.keys) and df["c2"] in to_keep["c1"]], but the problem is that I cannot pass the right key to the to_keep dict to look up the matching values for each row. I have thought of making a list of all possible combinations of c1 and c2, but that may be inefficient given the size of my dataset.

Any suggestions?

score 2 · Answer 1 · edited May 23 '17 at 11:53

Try converting to_keep into a DataFrame and then merging it with the original, as in "Compare Python Pandas DataFrames for matching rows".

answered Dec 17 '15 at 15:35

lib

score 1 · Accepted Answer · edited May 23 '17 at 12:31

Fleshing out lib's suggestion:

import pandas as pd
import numpy as np
df = pd.DataFrame( 
    { "c1": np.array(['a','a','b','b','a']) , 
      "c2": np.array([1,2,2,2,2])} )
to_keep = {'a':[1],'b':[2,3]}
to_keep = pd.DataFrame([(key, item) for key, val in to_keep.items() for item in val], 
                       columns=['c1', 'c2'])
#   c1  c2
# 0  a   1
# 1  b   2
# 2  b   3

print(pd.merge(df, to_keep, how='inner'))

yields

  c1  c2
0  a   1
1  b   2
2  b   2
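One thing to be aware of: the merge returns a freshly numbered index, so the original row labels are lost. If those labels matter, a boolean mask may be a better fit. The following is only a sketch, not part of the answer above. It assumes the same df and to_keep as in the question, and the names pairs, mask and filtered are made up for illustration. It swaps the merge for MultiIndex.isin, which tests each row's (c1, c2) tuple against the list of allowed pairs.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "c1": np.array(["a", "a", "b", "b", "a"]),
    "c2": np.array([1, 2, 2, 2, 2]),
})
to_keep = {"a": [1], "b": [2, 3]}

# Expand the dict into explicit (c1, c2) pairs (hypothetical helper name).
pairs = [(key, item) for key, values in to_keep.items() for item in values]

# Treat each row's (c1, c2) as a single tuple and test membership in `pairs`.
mask = pd.MultiIndex.from_frame(df[["c1", "c2"]]).isin(pairs)

# Boolean indexing keeps the original row labels instead of renumbering them.
filtered = df[mask]
print(filtered)
```

Here filtered holds the same three rows as the merge result, but under their original labels 0, 2 and 3.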

answered Dec 17 '15 at 15:46

unutbu

Looks good. But is it (most) efficient? What if the number of combinations of `a` and `b` is very large? – Guido Dec 18 '15 at 08:02
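On the efficiency question, one possible alternative avoids writing out every (key, value) pair as its own row: build the mask one key at a time and let Series.isin handle that key's list of values. This is a sketch under the same df and to_keep as in the question, with mask and filtered as illustrative names. It loops over the dict keys in Python, so whether it actually beats the merge depends on how many keys and values there are and would need timing on the real data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "c1": np.array(["a", "a", "b", "b", "a"]),
    "c2": np.array([1, 2, 2, 2, 2]),
})
to_keep = {"a": [1], "b": [2, 3]}

# Start with nothing selected, aligned to df's index.
mask = pd.Series(False, index=df.index)

# For each key, select rows where c1 equals the key and c2 is in that key's list.
for key, allowed in to_keep.items():
    mask |= (df["c1"] == key) & df["c2"].isin(allowed)

filtered = df[mask]
print(filtered)
```

This also answers the original difficulty of passing the right key to to_keep: the loop supplies the key, so each row is only compared against the values that belong to its own c1.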