
I have a DataFrame with two columns, `store_id` and `product_store_id`, whose values I need to check against a list of tuples:

```python
products_list = [('ebay','123'),('amazon','789'),..]
```

I need to do this efficiently and keep only the rows whose products are described in that list.

I've tried `products.loc[products[['store_id','product_store_id']].isin(products_list)]`, but pandas rejects it with `ValueError: Cannot index with multidimensional key`.
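A minimal reproduction, using hypothetical sample data (the ids below are made up), shows what goes wrong: `DataFrame.isin` checks every cell on its own against the list items, so it returns a two-dimensional boolean frame instead of one flag per row, and `.loc` cannot select rows with a 2-D mask.

```python
import pandas as pd

# Hypothetical sample data; ids are strings, matching the tuples
products = pd.DataFrame({
    'store_id': ['ebay', 'amazon', 'ebay', 'walmart'],
    'product_store_id': ['123', '789', '789', '123'],
})
products_list = [('ebay', '123'), ('amazon', '789')]

# DataFrame.isin tests each cell separately against the list items.
# A single string never equals a tuple, so every cell comes back False,
# and the result is a 2-D frame rather than one flag per row.
cell_mask = products[['store_id', 'product_store_id']].isin(products_list)
print(cell_mask.shape)

# .loc cannot select rows with a 2-D boolean frame
try:
    products.loc[cell_mask]
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```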

How do I efficiently select all of the rows whose (`store_id`, `product_store_id`) pair is in the list?

cs95
yoni keren
  • Seems like a prime candidate for a [`MultiIndex`](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html#advanced-indexing-with-hierarchical-index). Can you provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example)? Though my guess is all of your answers can be found in https://stackoverflow.com/questions/53927460/select-rows-in-pandas-multiindex-dataframe – ALollz Jun 24 '19 at 16:22
  • Welcome to StackOverflow. Please read and follow the posting guidelines in the help documentation, as suggested when you created this account. [Minimal, complete, verifiable example](https://stackoverflow.com/help/minimal-reproducible-example) applies here. We cannot effectively help you until you post your MCVE code and accurately specify the problem. We should be able to paste your posted code into a text file and reproduce the problem you specified. StackOverflow is not a design, coding, research, or tutorial resource. – Prune Jun 24 '19 at 16:25
  • I am not sure about efficiency, but is concatenating the `store_id` and `product_store_id` an option? Then you can easily use `.isin(products_list)` – KenHBS Jun 24 '19 at 16:26
  • @Prune I've provided a minimal example. Is anything not crystal clear now? – yoni keren Jun 24 '19 at 16:36
  • @KenHBS by concatenating, do you mean creating a new column with values like 'store@product_store_id'? – yoni keren Jun 24 '19 at 16:38

1 Answer


There are several ways to do this, some more hacky than others. My recommendation is to build a MultiIndex from the two columns: `MultiIndex.isin` accepts a list of tuples and matches each row's pair as a whole.

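```python
import pandas as pd

# Build one (store_id, product_store_id) tuple per row.
# from_arrays works on any version, and is the only option on <=0.23
idx = pd.MultiIndex.from_arrays([
    products['store_id'], products['product_store_id']])
# from_frame is available from 0.24 onward and gives the same index
idx = pd.MultiIndex.from_frame(products[['store_id', 'product_store_id']])

# isin returns one boolean per row, which .loc accepts as a mask
products.loc[idx.isin(products_list)]
```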
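A self-contained run of the MultiIndex approach on a small hypothetical DataFrame (the ids and the extra `price` column are made up for illustration) shows that only rows whose full pair appears in the list survive; `('ebay', '789')` is dropped even though each value appears somewhere in the list.

```python
import pandas as pd

# Hypothetical sample data; ids are strings to match the tuples
products = pd.DataFrame({
    'store_id': ['ebay', 'amazon', 'ebay', 'walmart'],
    'product_store_id': ['123', '789', '789', '123'],
    'price': [10.0, 20.0, 30.0, 40.0],
})
products_list = [('ebay', '123'), ('amazon', '789')]

# One (store_id, product_store_id) tuple per row
idx = pd.MultiIndex.from_frame(products[['store_id', 'product_store_id']])

# MultiIndex.isin compares whole tuples, giving one boolean per row
mask = idx.isin(products_list)
selected = products.loc[mask]
print(selected)
```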

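Another option is to concatenate the two columns and filter on the result (both columns must hold strings). Join with a separator that cannot occur in either id; otherwise pairs such as ('ab', 'c') and ('a', 'bc') both become 'abc' and match each other:

```python
sep = '|'  # assumption: '|' never appears in store_id or product_store_id
products_list_concat = [sep.join(pair) for pair in products_list]
mask = (products['store_id'] + sep + products['product_store_id']).isin(
    products_list_concat)

products.loc[mask]
```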
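A quick check with hypothetical ids shows why a bare `''.join` can over-match when concatenating: two different pairs collapse to the same string. The `'|'` separator is an assumption and only works if it never occurs inside an id.

```python
import pandas as pd

# Hypothetical ids chosen to trigger a collision
products = pd.DataFrame({
    'store_id': ['ab', 'a'],
    'product_store_id': ['c', 'bc'],
})
products_list = [('ab', 'c')]

# Plain concatenation: both rows become 'abc', so both match
plain = (products['store_id'] + products['product_store_id']).isin(
    [''.join(pair) for pair in products_list])

# With a separator the keys become 'ab|c' and 'a|bc', so only row 0 matches
sep = '|'
keyed = (products['store_id'] + sep + products['product_store_id']).isin(
    [sep.join(pair) for pair in products_list])

print(plain.tolist())  # [True, True]
print(keyed.tolist())  # [True, False]
```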
cs95