How to get the common index of two pandas dataframes?

Question

I have two pandas DataFrames df1 and df2 and I want to transform them in order that they keep values only for the index that are common to the 2 dataframes.

df1

                      values 1
0                            
28/11/2000          -0.055276
29/11/2000           0.027427
30/11/2000           0.066009
01/12/2000           0.012749
04/12/2000           0.113892

df2

                       values 2

24/11/2000            -0.004808
27/11/2000            -0.001812
28/11/2000            -0.026316
29/11/2000             0.015222
30/11/2000            -0.024480

become

df1

                     value 1

28/11/2000          -0.055276
29/11/2000           0.027427
30/11/2000           0.066009

df2

                       value 2

28/11/2000            -0.026316
29/11/2000             0.015222
30/11/2000            -0.024480

jezrael · Answer 1 · 2019-08-29T11:31:44.853

38

You can use Index.intersection + DataFrame.loc:

idx = df1.index.intersection(df2.index)
print (idx)
Index(['28/11/2000', '29/11/2000', '30/11/2000'], dtype='object')

Alternative solution with numpy.intersect1d:

idx = np.intersect1d(df1.index, df2.index)
print (idx)
['28/11/2000' '29/11/2000' '30/11/2000']

df1 = df1.loc[idx]
print (df1)
            values 1
28/11/2000 -0.055276
29/11/2000  0.027427
30/11/2000  0.066009

df2 = df2.loc[idx]

edited Aug 29 '19 at 11:31

answered Jan 09 '18 at 14:57

jezrael

822,522
95
1,334
1,252

I was wondering if you could do this with a single level of a multilevel index? – Roy Han Apr 24 '19 at 21:17
Great answer. Only a small mistake: df2 = df1.loc[idx] should be df2 = df2.loc[idx] – jcdoming Aug 29 '19 at 11:30

score 8 · Answer 2 · answered Jan 09 '18 at 14:58

In [352]: common = df1.index.intersection(df2.index)

In [353]: df1.loc[common]
Out[353]:
             values1
0
28/11/2000 -0.055276
29/11/2000  0.027427
30/11/2000  0.066009

In [354]: df2.loc[common]
Out[354]:
             values2
0
28/11/2000 -0.026316
29/11/2000  0.015222
30/11/2000 -0.024480

score 8 · Answer 3 · answered Jan 09 '18 at 15:07

8

And, using isin. intersection might be faster though.

In [286]: df1.loc[df1.index.isin(df2.index)]
Out[286]:
             values1
0
28/11/2000 -0.055276
29/11/2000  0.027427
30/11/2000  0.066009

In [287]: df2.loc[df2.index.isin(df1.index)]
Out[287]:
             values2
0
28/11/2000 -0.026316
29/11/2000  0.015222
30/11/2000 -0.024480

answered Jan 09 '18 at 15:07

Zero

74,117
18
147
154

Just for everyone's reference, why `isin` is faster is explained in depth here - https://stackoverflow.com/a/50881584/1003798 . Pandas `isin` uses khash from `klib` algorithm, while numpy `in1d` or `intersect1d` make a call to the `unique` function which increases time over a longer range of dataset. https://github.com/attractivechaos/klib – godimedia Oct 22 '22 at 22:47

score 4 · Answer 4 · answered Jan 09 '18 at 15:16

reindex + dropna

df1.reindex(df2.index).dropna()
Out[21]: 
             values1
28/11/2000 -0.055276
29/11/2000  0.027427
30/11/2000  0.066009


df2.reindex(df1.index).dropna()
Out[22]: 
             values2
28/11/2000 -0.026316
29/11/2000  0.015222
30/11/2000 -0.024480

score 2 · Answer 5 · answered Jan 09 '18 at 14:57

2

Have you tried something like

df1 = df1.loc[[x for x in df1.index if x in df2.index]]
df2 = df2.loc[[x for x in df2.index if x in df1.index]]

answered Jan 09 '18 at 14:57

LangeHaare

2,776
2
17
25

score 2 · Answer 6 · answered Dec 08 '18 at 18:51

2

The index object has some set-like properties so you simply can take the intersection as follows:

df1 = df1.reindex[ df1.index & df2.index ]

This retains the order of the first dataframe in the intersection, df.

answered Dec 08 '18 at 18:51

iamjli

411
4
3

tarekcw · Answer 7 · 2019-11-15T04:37:19.470

You can pd.merge them with an intermediary DataFrame created with the indexes of the other DataFrame:

df2_indexes = pd.DataFrame(index=df2.index)
df1 = pd.merge(df1, df2_indexes, left_index=True, right_index=True)
df1_indexes = pd.DataFrame(index=df1.index)
df2 = pd.merge(df2, df1_indexes, left_index=True, right_index=True)

or you can use pd.eval:

df2_indexes =  df2.index.values
df1 = df1[eval("df1.index in df2_indexes"]
df1_indexes = df1.index.values
df2 = df2[eval("df2.index in df1_indexes"]

score 0 · Answer 8 · answered Dec 16 '20 at 14:21

0

I found pd.Index and set combination much faster than numpy.intersect1d as well df1.index.intersection(df2.index). Here is what I used:

df2 = df2.loc[pd.Index(set(df1.index)&set(df2.index))]

answered Dec 16 '20 at 14:21

Mridul Bhardwaj

41
4

score 0 · Answer 9 · answered Nov 27 '22 at 21:03

%timeit df1.index.intersection(df2.index)

66.5 µs ± 2.31 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


from numpy.lib.arraysetops import intersect1d
%timeit np.intersect1d(df1.index,df2.index)
83.1 µs ± 7.94 µs per loop (mean ± std. dev.

of 7 runs, 10000 loops each)

How to get the common index of two pandas dataframes?

9 Answers9

Linked