0

In my Pandas query(), I want to filter based on whether one field is in a set found in a dict lookup on another field. Here is some stripped down code illustrating my problem:

import pandas

cars = {
    'Japan': {'Honda', 'Nissan', 'Toyota'},
    'USA': {'Ford', 'Chevrolet', 'Tesla'},
}

data = pandas.DataFrame({
    'country': ['Japan', 'Japan', 'USA'],
    'make': ['Honda', 'Ford', 'Ford'],
    'date': ['2018-10-04', '2018-10-05', '2018-10-06'],
}).astype(dtype={'country': 'category', 'make': 'category', 'date': 'category'})

print data.query('make in @cars[country]')

I want to only include rows for which one field (make) is present in the set found in a dict lookup based on another field (@cars[country]).

Expected

  country        date   make
0   Japan  2018-10-04  Honda
1     USA  2018-10-06   Ford

Actual

TypeError: 'Series' objects are mutable, thus they cannot be hashed

Anyone have any idea how to make this work? Or do I need to resort to using apply for this (which I understand is slower)?

Jonathan Aquino
  • 343
  • 2
  • 8
  • If this represents your actual problem and those sets are disjoint like in the example (Honda can only appear in one country), I'd say put all of them into a single set and use `data[data['make'].isin(single_set)]`. – ayhan Oct 08 '18 at 20:10
  • @ayhan Do you mean like `data[data['make'].isin({'Honda', 'Nissan', 'Toyota', 'Ford', 'Chevrolet', 'Tesla'})]`? That prints out Japan/Honda, Japan/Ford, USA/Ford. I just want it to print out Japan/Honda, USA/Ford. – Jonathan Aquino Oct 08 '18 at 21:31
  • Sorry I somehow managed to miss that. – ayhan Oct 09 '18 at 03:03

1 Answers1

1

First, I'm not sure how to do dictionary lookup without using apply (which is potentially slow since it does not take advantage of vectorization).

However, regarding your question

do I need to resort to using apply for this

There are definitely better options than apply. For example, This post suggested using pandas.merge which is faster than apply.

In [29]: carsdf = pd.DataFrame({'country':["Japan"]*3+["USA"]*3, 'make':['Nissan','Honda','Toyota','Tesla','Chevolet','Ford']})

In [30]: pd.merge(data, carsdf, how='inner')
Out[28]: Out[30]: 
  country        date   make
0   Japan  2018-10-04  Honda
1     USA  2018-10-06   Ford
clony
  • 136
  • 2
  • Instead of creating the cars `DataFrame` manually, you could do `pd.DataFrame.from_dict(cars, orient='index').stack().reset_index(level=1, drop=True)` to get all of the combinations. Just a bit more renaming to do after that. – ALollz Oct 08 '18 at 20:20