I have an RDD of lists, A, and a plain list, B, to intersect with it. B needs to be intersected with every list in A.
A = [[a,b,c,d],[e,f,g,h]....]
B = [a,b,c,d,e,f,g,h]
I need to intersect these two to get the common letters. I used the following, but it failed with a TypeError:
pwords = A.intersection(B)
I then tried sc.parallelize, based on a few suggestions on Stack Overflow, but got an error:
text_words = sc.parallelize(A)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/spark/python/pyspark/context.py", line 501, in parallelize
    c = list(c) # Make it a list so we can compute its length
TypeError: 'PipelinedRDD' object is not iterable
When I tried to convert A into a list, as in the list(c) line shown in the traceback, I got the same error again:
TypeError: 'PipelinedRDD' object is not iterable
I also tried to follow the question "Find intersection of two nested lists?" and got the same error:
TypeError: 'PipelinedRDD' object is not iterable
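To make the goal concrete, here is a minimal sketch of the result I am after, written in plain Python without Spark. The sample values in A and the helper name common_letters are made up for illustration; in my real code A is an RDD, not a list.

```python
# Hypothetical sample data; the real A is an RDD and has more inner lists.
A = [["a", "b", "x"], ["e", "y", "g", "h"], ["z"]]
B = ["a", "b", "c", "d", "e", "f", "g", "h"]

# Build the lookup set once so each membership check is cheap.
B_set = set(B)

def common_letters(inner):
    """Return the elements of inner that also appear in B, in original order."""
    return [x for x in inner if x in B_set]

# One result list per inner list of A.
expected = [common_letters(inner) for inner in A]
print(expected)  # [['a', 'b'], ['e', 'g', 'h'], []]
```

I assume the same per-list function could be handed to the RDD's map method instead of the list comprehension, but I have not gotten that to work yet.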