Finding intersection of values in a column associated with unique values in another column Pandas

Question

If I have a DataFrame like this (very minimal example)

  col1  col2
0    a     1
1    a     2
2    b     1
3    b     2
4    b     4
5    c     1
6    c     2
7    c     3

and I want the intersection of all col2 values when they are related to their unique col1 values (so in this case, the intersection would be [1,2]), how can I do so with Pandas? Another way to word this would be the values in col2 that exist for every unique value in col1.

My (bad) solution was to get the unique col1 elements with unique, and then build dictionaries from each unique element in col1 and then take the set intersection of those dictionary values. I feel like there is a mechanism I should be using to relate the columns together however that could make this much easier.

score 2 · Accepted Answer · answered May 13 '16 at 07:04

2

One way is to use pivot_table:

In [11]: cross = df.pivot_table(index="col1", columns="col2", aggfunc='size') == 1

In [12]: cross
Out[12]:
col2     1     2      3      4
col1
a     True  True  False  False
b     True  True  False   True
c     True  True   True  False

In [13]: cross.all()
Out[13]:
col2
1     True
2     True
3    False
4    False
dtype: bool

In [14]: cross.columns[cross.all()]
Out[14]: Int64Index([1, 2], dtype='int64', name='col2')

answered May 13 '16 at 07:04

Andy Hayden

359,921
101
625
535

1

And now I'll end up spending some time being fascinated `pivot_table`. Many thanks. – Eric Hansen May 13 '16 at 07:11

jezrael · Answer 2 · 2016-05-13T07:28:08.733

1

Another solution:

print df.pivot_table(index="col1", columns="col2", aggfunc=len)
col2    1    2    3    4
col1                    
a     1.0  1.0  NaN  NaN
b     1.0  1.0  NaN  1.0
c     1.0  1.0  1.0  NaN

ser = (df.pivot_table(index="col1", columns="col2", aggfunc=len) == 1).all()
print  ser.index[ser]
Int64Index([1, 2], dtype='int64', name=u'col2')

edited May 13 '16 at 07:28

answered May 13 '16 at 07:08

jezrael

822,522
95
1,334
1,252

Finding intersection of values in a column associated with unique values in another column Pandas

2 Answers2

Linked