0

I am using pyspark to find all possible pairs from an RDD of int array.

Input:

[[0, 1, 2],
[3, 4]]

Output RDD key value pairs of all possible combinations:

(0,1), (0,2), (1,0), ....  (3,4), (4,3)

I want to implement it in python not scala

Hokins
  • 1
  • 2
  • Does this answer your question? [Spark: produce RDD\[(X, X)\] of all possible combinations from RDD\[X\]](https://stackoverflow.com/questions/26557873/spark-produce-rddx-x-of-all-possible-combinations-from-rddx) – Federico Baù Dec 28 '20 at 14:35
  • Could you tell us what have you tried? also here is an answer in python: https://stackoverflow.com/a/43703395/1423901 – Redithion Dec 28 '20 at 14:36
  • Most of the solutions are for scala but i want to implement it in python – Hokins Dec 28 '20 at 15:25
  • This is the solution but it is implemented in scala. https://stackoverflow.com/questions/32837588/spark-computation-for-suggesting-new-friendships – Hokins Dec 28 '20 at 15:29

1 Answers1

0

permutations can be used to used to generate tuples of length 2.

from itertools import permutations

rdd = spark.sparkContext.parallelize([[0, 1, 2],[3, 4]])

rdd.flatMap(lambda x: permutations(x,2)).collect()

Output

[(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1), (3, 4), (4, 3)]
Nithish
  • 3,062
  • 2
  • 8
  • 16