How can I get the intersection of two pandas Series text columns?

Question

I have two pandas Series whose elements are sets of words. How can I get their row-by-row intersection?

print(df)

0  {this, is, good}
1  {this, is, not, good}

print(df1)

0  {this, is}
1  {good, bad}

I'm looking for output like the following:

print(df2)

0  {this, is}
1  {good}

I've tried the following, but it raises an error:

df.apply(lambda x: x.intersection(df1))
TypeError: unhashable type: 'set'
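The error comes from `set.intersection` itself. `Series.apply` passes each set `x` to the lambda, and `x.intersection(df1)` then iterates over `df1`, whose elements are themselves sets. Python has to hash each of those elements to test membership, and sets are unhashable. A minimal sketch reproducing this outside `apply` (the names `s`, `other`, `error_message` and `pair_result` are illustrative only):

```python
import pandas as pd

s = {"this", "is", "good"}
other = pd.Series([{"this", "is"}, {"good", "bad"}])

# Iterating a Series yields its values (here: sets), and set.intersection
# must hash every value it receives, which fails for sets
try:
    s.intersection(other)
except TypeError as exc:
    error_message = str(exc)

# Intersecting two individual sets works; the rows just need to be paired up
pair_result = s.intersection(other[0])
```

So the fix is to pair the sets row by row rather than pass the whole Series to `intersection`.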

I have tried to answer this in my own way, keeping the points you mentioned in the question. Judging by the names, I guess `df` & `df2` are DataFrames, but the answers work with Series, so I answered using DataFrames and `intersection()` too. — hygull, Mar 26 '19 at 18:19

score 1 · Answer 1 · answered Mar 26 '19 at 17:39


This approach works for me:

import pandas as pd
import numpy as np

data = np.array([{'this', 'is', 'good'},{'this', 'is', 'not', 'good'}])
data1 = np.array([{'this', 'is'},{'good', 'bad'}])
df = pd.Series(data)
df1 = pd.Series(data1)

# Intersect the sets row by row (range replaces Python 2's xrange)
df2 = pd.Series([df[i] & df1[i] for i in range(df.size)])
print(df2)


VietHTran


It works just fine, but it's slow on my huge dataset. Thank you. – Jeeth Mar 26 '19 at 17:46
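On the speed concern: `df[i]` and `df1[i]` inside the loop go through a pandas label lookup once per element. A sketch that pairs the raw values with `zip` avoids that per-element lookup. It assumes both Series have the same length and row order (it matches rows by position, not by index label) and reuses `df.index` for the result:

```python
import pandas as pd

df = pd.Series([{"this", "is", "good"}, {"this", "is", "not", "good"}])
df1 = pd.Series([{"this", "is"}, {"good", "bad"}])

# Iterating a Series yields its values, so zip pairs the sets by position;
# the set & operator then intersects each pair
df2 = pd.Series([a & b for a, b in zip(df, df1)], index=df.index)
print(df2)
```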

score 1 · Accepted Answer · answered Mar 26 '19 at 17:45


It comes down to simple set logic:

import pandas as pd

s1 = pd.Series([{'this', 'is', 'good'}, {'this', 'is', 'not', 'good'}])
s2 = pd.Series([{'this', 'is'}, {'good', 'bad'}])
s1 - (s1 - s2)  
#Out[122]: 
#0    {this, is}
#1        {good}
#dtype: object
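Why the subtraction works: pandas applies `-` element-wise to these object-dtype Series, so each row computes A − (A − B) on Python sets A and B, and that equals A ∩ B. A short derivation for a single element x:

```latex
\begin{aligned}
x \in A \setminus (A \setminus B)
  &\iff x \in A \land \lnot\,(x \in A \land x \notin B) \\
  &\iff x \in A \land (x \notin A \lor x \in B) \\
  &\iff (x \in A \land x \notin A) \lor (x \in A \land x \in B) \\
  &\iff x \in A \cap B
\end{aligned}
```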


jxc


Thank you. How can I also do a union with the above approach? – Jeeth Mar 26 '19 at 17:57
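Set difference alone cannot express a union, because any A − X is a subset of A. A minimal sketch for the union, assuming the two Series line up row by row, pairs the elements and applies Python's `|` set operator (the name `union` is illustrative):

```python
import pandas as pd

s1 = pd.Series([{"this", "is", "good"}, {"this", "is", "not", "good"}])
s2 = pd.Series([{"this", "is"}, {"good", "bad"}])

# Element-wise union: zip matches rows by position, | merges each pair of sets
union = pd.Series([a | b for a, b in zip(s1, s2)], index=s1.index)
print(union)
```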

Answer 3 · answered Mar 26 '19 at 18:23 by hygull

I appreciate the answers above. Here is a simple example that solves the same problem if you have DataFrames (judging by your variable names `df` and `df1`, I guess you asked this about DataFrames).

This will do it: df.apply(lambda row: row["set"].intersection(df1.loc[row.name, "set"]), axis=1). Let's see how I arrived at this solution.

The answer at https://stackoverflow.com/questions/266582... was helpful for me.

>>> import pandas as pd

>>> 
>>> df = pd.DataFrame({
...     "set": [{"this", "is", "good"}, {"this", "is", "not", "good"}]
... })
>>> 
>>> df
                     set
0       {this, is, good}
1  {not, this, is, good}
>>> 
>>> df1 = pd.DataFrame({
...     "set": [{"this", "is"}, {"good", "bad"}]
... })
>>> 
>>> df1
           set
0   {this, is}
1  {bad, good}
>>>
>>> df.apply(lambda row: row["set"].intersection(df1.loc[row.name, "set"]), axis=1)
0    {this, is}
1        {good}
dtype: object
>>>
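The same label-based matching that `df1.loc[row.name]` performs can also be done at the column level, without a row-wise `apply`, using `Series.combine`, which calls the given function once per index label. A sketch, assuming both frames have a `set` column and the same index labels (a label missing from one side would be filled with NaN, and `&` would then fail):

```python
import pandas as pd

df = pd.DataFrame({"set": [{"this", "is", "good"}, {"this", "is", "not", "good"}]})
df1 = pd.DataFrame({"set": [{"this", "is"}, {"good", "bad"}]})

# combine pairs the two columns by index label and intersects each pair
result = df["set"].combine(df1["set"], lambda a, b: a & b)
print(result)

# Because matching is by label, reordering df1's rows gives the same pairs
result_reordered = df["set"].combine(df1.iloc[::-1]["set"], lambda a, b: a & b)
```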

How I arrived at the above solution:

>>> df.apply(lambda x: print(x.name), axis=1)
0
1
0    None
1    None
dtype: object
>>> 
>>> df.loc[0]
set    {this, is, good}
Name: 0, dtype: object
>>> 
>>> df.apply(lambda row: print(row["set"]), axis=1)
{'this', 'is', 'good'}
{'not', 'this', 'is', 'good'}
0    None
1    None
dtype: object
>>> 
>>> df.apply(lambda row: print(type(row["set"])), axis=1)
<class 'set'>
<class 'set'>
0    None
1    None
dtype: object
>>> df.apply(lambda row: print(type(row["set"]), df1.loc[row.name]), axis=1)
<class 'set'> set    {this, is}
Name: 0, dtype: object
<class 'set'> set    {bad, good}
Name: 1, dtype: object
0    None
1    None
dtype: object
>>> df.apply(lambda row: print(type(row["set"]), type(df1.loc[row.name])), axis=1)
<class 'set'> <class 'pandas.core.series.Series'>
<class 'set'> <class 'pandas.core.series.Series'>
0    None
1    None
dtype: object
>>> df.apply(lambda row: print(type(row["set"]), type(df1.loc[row.name]["set"])), axis=1)
<class 'set'> <class 'set'>
<class 'set'> <class 'set'>
0    None
1    None
dtype: object
>>>

score 0 · Answer 4 · answered Mar 26 '19 at 17:53

Similar to the above, but if you want to keep everything in one DataFrame:

Current df:
import numpy as np
import pandas as pd

df = pd.DataFrame({0: np.array([{'this', 'is', 'good'}, {'this', 'is', 'not', 'good'}]), 1: np.array([{'this', 'is'}, {'good', 'bad'}])})

Intersection of columns 0 and 1:
df[2] = df.apply(lambda x: x[0] & x[1], axis=1)
