Pandas: In query(), how to lookup column value as a dict key?

Question

In my Pandas query(), I want to filter based on whether one field is in a set found in a dict lookup on another field. Here is some stripped down code illustrating my problem:

import pandas

cars = {
    'Japan': {'Honda', 'Nissan', 'Toyota'},
    'USA': {'Ford', 'Chevrolet', 'Tesla'},
}

data = pandas.DataFrame({
    'country': ['Japan', 'Japan', 'USA'],
    'make': ['Honda', 'Ford', 'Ford'],
    'date': ['2018-10-04', '2018-10-05', '2018-10-06'],
}).astype(dtype={'country': 'category', 'make': 'category', 'date': 'category'})

print data.query('make in @cars[country]')

I want to only include rows for which one field (make) is present in the set found in a dict lookup based on another field (@cars[country]).

Expected

  country        date   make
0   Japan  2018-10-04  Honda
1     USA  2018-10-06   Ford

Actual

TypeError: 'Series' objects are mutable, thus they cannot be hashed

Anyone have any idea how to make this work? Or do I need to resort to using apply for this (which I understand is slower)?

If this represents your actual problem and those sets are disjoint like in the example (Honda can only appear in one country), I'd say put all of them into a single set and use `data[data['make'].isin(single_set)]`. — ayhan, Oct 08 '18 at 20:10
@ayhan Do you mean like `data[data['make'].isin({'Honda', 'Nissan', 'Toyota', 'Ford', 'Chevrolet', 'Tesla'})]`? That prints out Japan/Honda, Japan/Ford, USA/Ford. I just want it to print out Japan/Honda, USA/Ford. — Jonathan Aquino, Oct 08 '18 at 21:31

score 1 · Accepted Answer · answered Oct 08 '18 at 19:57

First, I'm not sure how to do dictionary lookup without using apply (which is potentially slow since it does not take advantage of vectorization).

However, regarding your question

do I need to resort to using apply for this

There are definitely better options than apply. For example, This post suggested using pandas.merge which is faster than apply.

In [29]: carsdf = pd.DataFrame({'country':["Japan"]*3+["USA"]*3, 'make':['Nissan','Honda','Toyota','Tesla','Chevolet','Ford']})

In [30]: pd.merge(data, carsdf, how='inner')
Out[28]: Out[30]: 
  country        date   make
0   Japan  2018-10-04  Honda
1     USA  2018-10-06   Ford

Instead of creating the cars `DataFrame` manually, you could do `pd.DataFrame.from_dict(cars, orient='index').stack().reset_index(level=1, drop=True)` to get all of the combinations. Just a bit more renaming to do after that. — ALollz, Oct 08 '18 at 20:20

Pandas: In query(), how to lookup column value as a dict key?

Expected

Actual

1 Answers1