Finding the intersection between two series in Pandas using index

Question

I have two series of different lengths, and I want to find their intersection based on the index, where the index values are strings. The end result should be a series containing only the elements whose string index labels appear in both series.

Any ideas?

Alex Riley · Accepted Answer · 2018-06-02T09:16:14.510

13

Pandas indexes have an intersection method which you can use. If you have two Series, s1 and s2, then

s1.index.intersection(s2.index)

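or, equivalently in pandas versions before 2.0 (there `&` on an Index was an alias for set intersection; from 2.0 on it is an element-wise logical operation, so prefer the method above):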

s1.index & s2.index

gives you the index values which are in both s1 and s2.

You can then use the resulting Index to select the corresponding elements of either series. For example:

>>> ixs = s1.index.intersection(s2.index)
>>> s1.loc[ixs]
# subset of s1 with only the indexes also found in s2 appears here
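
As a minimal sketch of the steps above with string labels (the series contents are made up for illustration and are not from the question):

```python
import pandas as pd

# Hypothetical example data: two series of different lengths with string labels.
s1 = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
s2 = pd.Series([10, 20, 30], index=["b", "d", "e"])

# Labels present in both indexes ("b" and "d").
common = s1.index.intersection(s2.index)

# Select the matching elements from each series by label.
s1_common = s1.loc[common]
s2_common = s2.loc[common]

# Alternative: a boolean mask, which keeps s1's original row order.
s1_masked = s1[s1.index.isin(s2.index)]
```

Both `s1_common` and `s1_masked` hold the values of `s1` at the shared labels; the mask version never reorders the rows of `s1`.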

edited Jun 02 '18 at 09:16

answered Oct 12 '14 at 15:10

Alex Riley


Nice, but it only gives me the indexes, not a series with the values. – Boss1295 Oct 12 '14 at 15:13
I tried the function, assigning it to a variable, but it gave me nothing. I am sure that there are common elements, but so far no go. – Boss1295 Oct 12 '14 at 15:19
Tried common = s1.index.intersection(s2.index); printing len(common) gives 0. – Boss1295 Oct 12 '14 at 15:21
@Boss1295 Looks like you don't have any common index values in your two Series. If you try with, say, `s1 = pd.Series(range(3))` and `s2 = pd.Series(range(3), index=[5, 0, 2])` you should see the method working as expected. – Alex Riley Oct 12 '14 at 15:24
The indexes are strings; does it still work under those conditions? – Boss1295 Oct 12 '14 at 15:28
ajcr's method should've worked; if it doesn't, then you don't have common values – EdChum Oct 12 '14 at 15:31
Thanks guys, I'm going to look through and make sure that there are no common elements (though there are ~450 elements), but I am guessing that there are no common elements. – Boss1295 Oct 12 '14 at 15:38
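
The thread never says why the intersection came back empty. One possibility, purely an assumption here, is labels that look the same but differ in whitespace or case. A rough sketch of how to check for that, with made-up labels and a hypothetical `normalize` helper:

```python
import pandas as pd

# Hypothetical data: labels that look alike but are not equal as strings.
s1 = pd.Series([1, 2], index=["apple ", "Pear"])
s2 = pd.Series([3, 4], index=["apple", "pear"])

# Exact string comparison finds nothing in common.
raw_common = s1.index.intersection(s2.index)

def normalize(index):
    """Strip surrounding whitespace and lower-case every label."""
    return index.str.strip().str.lower()

# Work on copies so the original series keep their labels.
s1_clean = s1.copy()
s1_clean.index = normalize(s1.index)
s2_clean = s2.copy()
s2_clean.index = normalize(s2.index)

clean_common = s1_clean.index.intersection(s2_clean.index)
```

If `len(raw_common)` is 0 but `len(clean_common)` is not, the labels differ only in formatting; if both are 0, the series really share no labels.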
s1.index & s2.index does a similar job. – seongjoo Sep 17 '15 at 00:16
You might need to use `iloc` (i.e. `s1.iloc[ixs]`) if the index type is an int (for anyone who runs into the same problem I did) – Ryan Biwer May 31 '18 at 23:25
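
To illustrate the integer-index case from the comments, here is a small sketch using the two series Alex Riley suggested above. It only shows how `.loc` and `.iloc` differ on this data; the variable names `by_label` and `by_position` are mine:

```python
import pandas as pd

# The example series from the comment thread.
s1 = pd.Series(range(3))                   # labels 0, 1, 2
s2 = pd.Series(range(3), index=[5, 0, 2])  # labels 5, 0, 2

# Labels found in both indexes (0 and 2).
ixs = s1.index.intersection(s2.index)

# .loc treats the values in ixs as labels.
by_label = s2.loc[ixs]

# .iloc treats the same values as positions, which here picks
# the rows labelled 5 and 2 instead.
by_position = s2.iloc[list(ixs)]
```

On this data `.loc` returns the elements whose labels are shared, while `.iloc` returns whatever sits at those positions, so the two are only interchangeable when labels and positions coincide.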

score 0 · Answer 2 · answered Jan 16 '18 at 11:37

Both of my datasets are increasing, so I wrote a function that returns a boolean mask for each one marking the overlapping elements, then filtered the data with those masks.

import numpy as np

def overlap(data1, data2):
    '''Return boolean masks marking the values common to both inputs.

    Both inputs are assumed to be increasing and to yield one
    comparable value per position.
    '''
    mask1 = np.zeros(len(data1), dtype=bool)
    mask2 = np.zeros(len(data2), dtype=bool)
    idx_1 = 0
    idx_2 = 0
    while idx_1 < len(data1) and idx_2 < len(data2):
        if data1[idx_1] < data2[idx_2]:
            idx_1 += 1  # value only in data1: skip it
        elif data1[idx_1] > data2[idx_2]:
            idx_2 += 1  # value only in data2: skip it
        else:
            # same value in both: mark it and advance both cursors
            mask1[idx_1] = True
            mask2[idx_2] = True
            idx_1 += 1
            idx_2 += 1
    return mask1, mask2

np.shape(data1)  # (1330, 8)
np.shape(data2)  # (2490, 9)
index_1, index_2 = overlap(data1, data2)
data1 = data1[index_1]
data2 = data2[index_2]
np.shape(data1)  # (540, 8)
np.shape(data2)  # (540, 9)
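
For comparison, NumPy's `np.intersect1d` with `return_indices=True` can produce the positions of the common values without a hand-written loop. This is a sketch on made-up 1-D key arrays (`keys1` and `keys2` are hypothetical and not the answer's data), assuming the keys contain no duplicates:

```python
import numpy as np

# Hypothetical sorted key arrays with no repeated values.
keys1 = np.array([1, 3, 5, 7, 9])
keys2 = np.array([3, 4, 5, 9, 10, 11])

# common: the shared values; pos1/pos2: where they sit in each input.
common, pos1, pos2 = np.intersect1d(
    keys1, keys2, assume_unique=True, return_indices=True
)

# Use the positions to pull the matching entries from each array.
matched1 = keys1[pos1]
matched2 = keys2[pos2]
```

Unlike the loop above, this returns integer positions rather than boolean masks, and it does not require the inputs to be sorted.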
