
Say I have a pandas dataframe and a dictionary as defined below:

import pandas as pd
import numpy as np
df = pd.DataFrame({"c1": np.array(['a','a','b','b','a']), "c2": np.array([1,2,2,2,2])})

  c1  c2
0  a   1
1  a   2
2  b   2
3  b   2
4  a   2

to_keep = {'a':[1],'b':[2,3]}

{'a': [1], 'b': [2, 3]}

I want to keep the rows where c1 is a key of to_keep and c2 is one of the values listed under that key. In other words, I want to get the following dataframe:

  c1  c2
0  a   1
2  b   2
3  b   2

I have tried many things, like df[(df["c1"] in to_keep.keys) and df["c2"] in to_keep["c1"]], but the problem is that I cannot pass the right key to the to_keep dict to get the matching values for each row. I have thought of making a list of all possible combinations of c1 and c2, but that may be inefficient given the size of my dataset.
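For reference, the attempted expression cannot work because Python's `in` and `and` do not operate elementwise on a Series. A minimal sketch of the same idea using pandas' elementwise operators (`==`, `&`, `|`) and `Series.isin` builds one boolean mask by looping over the dict's keys; the names `mask`, `allowed` and `result` are illustrative only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"c1": np.array(['a', 'a', 'b', 'b', 'a']),
                   "c2": np.array([1, 2, 2, 2, 2])})
to_keep = {'a': [1], 'b': [2, 3]}

# Start with an all-False mask aligned to df's index
mask = pd.Series(False, index=df.index)
for key, allowed in to_keep.items():
    # Row matches if c1 equals this key AND c2 is among this key's values
    mask = mask | ((df["c1"] == key) & df["c2"].isin(allowed))

# Boolean indexing keeps the original index labels (0, 2, 3)
result = df[mask]
print(result)
```

The loop runs once per key, not once per row, so each step is a vectorized comparison over the whole frame.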

Any suggestions?

Guido

2 Answers


Try to transform to_keep into a DataFrame and then merge it with the original, as in Compare Python Pandas DataFrames for matching rows

lib

Fleshing out lib's suggestion:

import pandas as pd
import numpy as np
df = pd.DataFrame( 
    { "c1": np.array(['a','a','b','b','a']) , 
      "c2": np.array([1,2,2,2,2])} )
to_keep = {'a':[1],'b':[2,3]}
# Expand the dict into one (c1, c2) row per allowed pair
to_keep = pd.DataFrame([(key, item) for key, val in to_keep.items() for item in val], 
                       columns=['c1', 'c2'])
#   c1  c2
# 0  a   1
# 1  b   2
# 2  b   3

# An inner merge on the shared columns c1 and c2 keeps only the allowed pairs
print(pd.merge(df, to_keep, how='inner'))

yields

  c1  c2
0  a   1
1  b   2
2  b   2
unutbu
Looks good. But is it (most) efficient? What if the number of combinations of `a` and `b` is very large? – Guido Dec 18 '15 at 08:02
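On the efficiency question, one alternative, not benchmarked here, is to test each row's (c1, c2) pair for membership in the allowed pairs with `pandas.MultiIndex.isin`, which avoids building a merged frame and keeps the original index. Whether it beats the merge depends on the data and the pandas version, so timing both on the real dataset is the safe way to decide. A minimal sketch (`allowed_pairs` and `row_pairs` are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"c1": np.array(['a', 'a', 'b', 'b', 'a']),
                   "c2": np.array([1, 2, 2, 2, 2])})
to_keep = {'a': [1], 'b': [2, 3]}

# Flatten the dict into a list of allowed (c1, c2) tuples
allowed_pairs = [(key, item) for key, val in to_keep.items() for item in val]

# One (c1, c2) tuple per row, then a vectorized membership test against the allowed pairs
row_pairs = pd.MultiIndex.from_frame(df[['c1', 'c2']])
result = df[row_pairs.isin(allowed_pairs)]
print(result)
```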