Create a boolean mask of the values will index the rows where the 2 df's match, no need to iterate and much faster.
Example:
# define a list of values
a = list('abcdef')
b = range(6)
df = pd.DataFrame({'X':pd.Series(a),'Y': pd.Series(b)})
# c has x values for 'a' and 'd' so these should not match
c = list('xbcxef')
df1 = pd.DataFrame({'X':pd.Series(c),'Y': pd.Series(b)})
print(df)
print(df1)
X Y
0 a 0
1 b 1
2 c 2
3 d 3
4 e 4
5 f 5
[6 rows x 2 columns]
X Y
0 x 0
1 b 1
2 c 2
3 x 3
4 e 4
5 f 5
[6 rows x 2 columns]
In [4]:
# now index your df using boolean condition on the values
df[df.X == df1.X]
Out[4]:
X Y
1 b 1
2 c 2
4 e 4
5 f 5
[4 rows x 2 columns]
EDIT:
So if you have different length series then that won't work, in which case you can use isin
:
So create 2 dataframes of different lengths:
a = list('abcdef')
b = range(6)
d = range(10)
df = pd.DataFrame({'X':pd.Series(a),'Y': pd.Series(b)})
c = list('xbcxefxghi')
df1 = pd.DataFrame({'X':pd.Series(c),'Y': pd.Series(d)})
print(df)
print(df1)
X Y
0 a 0
1 b 1
2 c 2
3 d 3
4 e 4
5 f 5
[6 rows x 2 columns]
X Y
0 x 0
1 b 1
2 c 2
3 x 3
4 e 4
5 f 5
6 x 6
7 g 7
8 h 8
9 i 9
[10 rows x 2 columns]
Now use isin
to select rows from df1 where the id's exist in df:
In [7]:
df1[df1.X.isin(df.X)]
Out[7]:
X Y
1 b 1
2 c 2
4 e 4
5 f 5
[4 rows x 2 columns]