You could use pandas apply
to obtain the intersection between the reference set and the set on each row of the dataframe. Then, apply the len
function over the newly created column (Intersect
) to quantify each intersection. This will enable you to filter the dataframe and only show rows where the set in sets
intersects with another set (df['Len'] > 0
).
df used as input
id_one id_two sets
0 7575 527 {1, 4, 6, 7, 8, 13, 16, 20, 24, 31, 40, 47, 50, 52, 57, 61, 64, 69, 80, 88, 91, 96}
1 574 1555 {7, 18, 19, 22, 23, 24, 30, 39, 43, 47, 50, 58, 62, 64, 72, 76, 77, 83, 84, 86, 87, 96}
2 7831 8823 {5, 14, 15, 20, 23, 28, 30, 32, 35, 36, 40, 41, 44, 52, 54, 59, 60, 62, 63, 84, 87, 90, 96}
3 688 6860 {2, 9, 20, 24, 27, 28, 30, 38, 46, 57, 59, 60, 64, 65, 69, 71, 80, 84, 88, 91, 95}
4 8843 596 {6, 7, 8, 24, 25, 27, 30, 33, 47, 50, 54, 56, 57, 61, 64, 66, 69, 74, 78, 81, 85, 88, 99}
5 1269 7546 {11, 22, 24, 25, 33, 35, 45, 48, 49, 54, 57, 59, 61, 68, 70, 75, 86, 87, 94, 95}
6 1362 4860 {2, 5, 14, 19, 23, 32, 37, 38, 47, 48, 58, 62, 65, 68, 70, 72, 73, 77, 82, 88, 91}
7 7994 7192 {2, 3, 4, 7, 9, 11, 12, 13, 15, 17, 20, 24, 25, 29, 40, 50, 57, 64, 71, 78, 89, 95, 99}
8 748 6271 {1, 12, 19, 26, 30, 34, 45, 48, 52, 60, 67, 72, 73, 74, 76, 80, 82, 84, 89, 94, 96, 98, 99}
9 553 4068 {9, 12, 15, 20, 35, 39, 40, 41, 44, 45, 50, 57, 65, 67, 68, 69, 72, 73, 79, 87, 88, 97, 98}
REFERENCE_SET = {10, 18, 23, 55, 90, 92}
df['Intersect'] = df['sets'].apply(lambda row: REFERENCE_SET.intersection(row))
df['Len'] = df['Intersect'].apply(len)
df_filtered = df[df['Len'] > 0]
Output from df_filtered
id_one id_two sets Intersect Len
1 574 1555 {7, 18, 19, 22, 23, 24, 30, 39, 43, 47, 50, 58, 62, 64, 72, 76, 77, 83, 84, 86, 87, 96} {18, 23} 2
2 7831 8823 {5, 14, 15, 20, 23, 28, 30, 32, 35, 36, 40, 41, 44, 52, 54, 59, 60, 62, 63, 84, 87, 90, 96} {90, 23} 2
6 1362 4860 {2, 5, 14, 19, 23, 32, 37, 38, 47, 48, 58, 62, 65, 68, 70, 72, 73, 77, 82, 88, 91} {23} 1