2

I have a pandas DataFrame df with a list of unique ids id, and a DataFrame with master list of all known ids master_df.id. I'm trying to figure out the best way to preform an isin that also returns to me the index where the value is located. So if my DataFrame was

master_df was

index id
1     1
2     2
3     3

and df was

index id
1     3
2     4
3     1

I want something like (3, False, 1). I'm currently doing an is in and then looking then brute forcing the lookup with a loop, but I'm sure there is a much better way to do it.

TristanMatthews
  • 2,451
  • 4
  • 24
  • 34
  • Is the index of `master_df` meaningful? If you can make `master_df` have the id as its index, then you can just directly look the values up in `master_df`. – BrenBarn Nov 16 '13 at 07:00

1 Answers1

3

One way is to do a merge:

In [11]: df.merge(mdf, on='id', how='left')
Out[11]: 
   index_x  id  index_y
0        1   3        3
1        2   4      NaN
2        3   1        1

and column index_y is the desired result*:

In [12]: df.merge(mdf, on='id', how='left').index_y
Out[12]: 
0     3
1   NaN
2     1
Name: index_y, dtype: float64

* Except for NaN vs. False, but I think NaN is what you really want here. As @DSM points out, in python False == 0 so you may get into trouble with False as the representative for missing vs being found with id 0. (If you still want to do it then replace the NaN with 0 using .fillna(0)).

Note: it's possible it will be more efficient to just take the columns you care about:

df[['id']].merge(mdf[['id', 'index']], on='id', how='left')
Community
  • 1
  • 1
Andy Hayden
  • 359,921
  • 101
  • 625
  • 535
  • 2
    One point you didn't mention is that `False == 0`, which can easily get you into trouble. `NaN` avoids this issue. – DSM Nov 16 '13 at 13:11
  • @DSM Excellent point, I realised this when I tried fillna, not sure why I neglected to mention it! – Andy Hayden Nov 16 '13 at 19:03