I need to compare each value of a list to each value of a df column, and if there is a match take the value of another column.

I have a couple of loops working with iterrows, but the code takes a long time to run. Is there a more efficient way to do this? It seems .loc might be a good answer, but the docs aren't super clear on how to make it work for this use case.

My code so far is

listy = []
for view in joined_views:
    for row in df.iterrows():
        if view == row[1]['other_view']:
            listy.append(row[1]['other_column'])
hselbie
  • [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – It_is_Chris Jul 07 '21 at 20:26
  • Make your list a `pd.Series`. Merge it on your column while having the other column attached. – ifly6 Jul 07 '21 at 20:28
  • Notice from my answer that reproducible pandas examples don't need to be difficult. – tdelaney Jul 07 '21 at 20:53
  • It will be great if you can have a look at [how-to-ask](/help/how-to-ask) and then try to produce a [mcve](/help/mcve). – rpanai Jul 07 '21 at 21:00
  • The best way is to avoid loops. Eventually you can use `map` or `apply`. – rpanai Jul 07 '21 at 21:01

1 Answer

Pandas is built to apply operations across whole columns of data at once. `iterrows` is relatively slow and is best kept as a fallback for when no such vectorized operation is available. In your case, `isin` will select the rows you want, and then you can grab the other column.

This can be written as

import pandas as pd
df = pd.DataFrame({"other_view":[1,2,3,4,5], 
    "other_column":["a", "b", "c", "d", "e"]})
joined_views = [1, 4, 100, 900, 1000]
listy = df[df.other_view.isin(joined_views)].other_column
print(listy)

or, if you prefer to name the columns as strings

df[df["other_view"].isin(joined_views)]["other_column"]

In words, select df rows where other_view is in joined_views, then take the other_column values.
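
If you want a plain Python list like the one your loop builds, the same selection can be sketched with a single `.loc` call, which passes the row mask and the column label together instead of chaining two lookups. This is only a sketch on toy data; the names `matches`, `views`, and `listy_in_list_order` are made up for illustration. Note that `isin` returns values in DataFrame row order, whereas the nested loop returns them in the order of `joined_views`. If that order matters, an inner merge with the list on the left is one way to keep it.

```python
import pandas as pd

# Toy data, assumed for illustration only.
df = pd.DataFrame({
    "other_view": [1, 2, 3, 4, 5],
    "other_column": ["a", "b", "c", "d", "e"],
})
joined_views = [4, 1, 100]

# Row mask and column label in one .loc call (no chained indexing).
matches = df.loc[df["other_view"].isin(joined_views), "other_column"]

# Plain list, in DataFrame row order.
listy = matches.tolist()

# Hypothetical alternative: put the list in a one-column frame and
# inner-merge, which keeps the order of the left (list) keys.
views = pd.DataFrame({"other_view": joined_views})
listy_in_list_order = (
    views.merge(df, on="other_view", how="inner")["other_column"].tolist()
)

print(listy)
print(listy_in_list_order)
```

Values in `joined_views` with no match in `df` (here `100`) are simply dropped by both approaches.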

tdelaney
  • it perhaps depends on what they'll do with `listy` afterwards but using `loc` as `df.loc[...isin(...), "other_column"]` [might be better](https://pandas.pydata.org/docs/user_guide/indexing.html#why-does-assignment-fail-when-using-chained-indexing) i guess. @hselbie – Mustafa Aydın Jul 09 '21 at 06:00