Following up on the question from this post, I have two unordered lists and would like to check whether they are equal, taking duplicates into account and ignoring order. If they are not equal, I would also like to find which elements of each list are missing from the other.
Taking the example from the post mentioned above, and assuming the list on the left of the equals sign is L1 and the list on the right is L2:
L1 L2
['one', 'two', 'three'] == ['one', 'two', 'three'] : true
['one', 'two', 'three'] == ['one', 'three', 'two'] : true
['one', 'two', 'three'] == ['one', 'two', 'three', 'three'] : false, L1:'three'
['one', 'two', 'three'] == ['one', 'two', 'three', 'four'] : false, L1:'four'
['one', 'two', 'three'] == ['one', 'two', 'four'] : false, L1:'four', L2:'three'
['one', 'two', 'three'] == ['one'] : false, L2:'two','three'
The output does not have to look exactly like what I have depicted. Basically, I would like to know whether the comparison of the two lists is true or false and, if false, which elements in L2 are not in L1 and which elements in L1 are not in L2. In the examples above, L1:'four' means that 'four' is missing from L1.
The solution provided by @Katriel was to use collections.Counter like so:
import collections
compare = lambda x, y: collections.Counter(x) == collections.Counter(y)
But this doesn't tell me which elements are mismatched. Is there an efficient way to do this in PySpark?
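For reference, here is a minimal plain-Python sketch of the behaviour I am after, using Counter subtraction to keep track of duplicates. The helper name compare_lists and the shape of its return value are made up for illustration, and it does not address the PySpark side of the question.

```python
from collections import Counter


def compare_lists(l1, l2):
    """Compare two lists as multisets (order ignored, duplicates counted).

    Hypothetical helper, for illustration only. Returns a tuple:
    (equal, missing_from_l1, missing_from_l2).
    """
    c1, c2 = Counter(l1), Counter(l2)
    # Counter subtraction keeps only positive counts, so c2 - c1 holds
    # the elements (with multiplicity) that L2 has and L1 lacks.
    missing_from_l1 = list((c2 - c1).elements())
    missing_from_l2 = list((c1 - c2).elements())
    return c1 == c2, missing_from_l1, missing_from_l2


equal, missing_l1, missing_l2 = compare_lists(
    ['one', 'two', 'three'], ['one', 'two', 'four']
)
print(equal, missing_l1, missing_l2)  # False ['four'] ['three']
```

This works on ordinary lists held in memory; I am not sure how best to translate it to columns of a PySpark DataFrame.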