I have two Spark DataFrames.
First DataFrame:
+---+---+----------------+
|  p|  o| collect_list(s)|
+---+---+----------------+
|  T| V2|        [c1, c5]|
|  T| V1|[c2, c3, c4, c6]|
+---+---+----------------+
Second DataFrame:
+---+---+--------------------+
|  p|  o|     collect_list(s)|
+---+---+--------------------+
|  A| V3|    [c1, c2, c3, c4]|
|  B| V3|[c1, c2, c3, c5, c6]|
+---+---+--------------------+
How can I compute the intersection between the two DataFrames above, based on the collect_list(s) column?
The result should be another DataFrame that pairs each row of the first DataFrame with each row of the second, and keeps a pair only if the intersection of their lists contains at least 2 items (the minimum support), as follows:
+-----------------+--------------+
| 2-Itemset       | TID          |
+-----------------+--------------+
| [(T,V2),(B,V3)] | [c1, c5]     |
| [(T,V1),(A,V3)] | [c2, c3, c4] |
| [(T,V1),(B,V3)] | [c2, c3, c6] |
+-----------------+--------------+
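
To make the expected logic unambiguous, here is a minimal plain-Python sketch of the computation I am after. It is not Spark code: the rows are hard-coded from the tables above, and the names `intersect_itemsets` and `MIN_SUPPORT` are made up for illustration. I assume the Spark equivalent would be a cross join followed by an array intersection and a size filter, but that is exactly the part I am unsure about.

```python
from itertools import product

# Minimum number of shared items a pair must have to be kept (assumed: 2).
MIN_SUPPORT = 2

# Rows copied from the two tables above: (p, o, collect_list(s)).
first = [
    ("T", "V2", ["c1", "c5"]),
    ("T", "V1", ["c2", "c3", "c4", "c6"]),
]
second = [
    ("A", "V3", ["c1", "c2", "c3", "c4"]),
    ("B", "V3", ["c1", "c2", "c3", "c5", "c6"]),
]


def intersect_itemsets(left, right, min_support):
    """Pair every left row with every right row and keep the pairs whose
    lists share at least `min_support` items."""
    result = []
    for (p1, o1, s1), (p2, o2, s2) in product(left, right):
        right_ids = set(s2)
        # Keep the order of the left list while intersecting.
        common = [s for s in s1 if s in right_ids]
        if len(common) >= min_support:
            result.append(([(p1, o1), (p2, o2)], common))
    return result


rows = intersect_itemsets(first, second, MIN_SUPPORT)
for itemset, tids in rows:
    print(itemset, tids)
```

The pair (T,V2) and (A,V3) shares only c1, so it falls below the minimum support and is dropped, which is why the expected result has three rows rather than four.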