
Is there a way to use flatMap to flatten a list in an RDD like so:

rdd = sc.parallelize([[1,2,3],[6,7,8]])

rdd.flatMap(lambda r: [[r[0],r[1],r[2],[r[2]+1,r[2]+2]]]).collect()

My desired output:

[[1,2,3,4,5],[6,7,8,9,10]]

The actual output:

[[1,2,3,[4,5]], [6,7,8,[9,10]]]

I understand that flatMap flattens the outer list and I am not confused by the actual output above, but I would like to know if there is a way to flatten the inner list as well.

Water Crane

1 Answer


Please modify your code as shown below to get the desired output:

rdd.flatMap(lambda r: [[r[0],r[1],r[2],r[2]+1,r[2]+2]]).collect()
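For reference, a minimal self-contained run of that suggestion (assuming a local SparkContext named sc, as in the question) would be expected to produce the desired output:

rdd = sc.parallelize([[1, 2, 3], [6, 7, 8]])

# The lambda returns a single already-flat list per row, so flatMap only
# removes the outer wrapping list and each row stays intact.
rdd.flatMap(lambda r: [[r[0], r[1], r[2], r[2] + 1, r[2] + 2]]).collect()
# expected: [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]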
Mohan
  • The point is that I can't modify it that way. I have a structure as above, and I'd like to flatten it without calling a list comprehension. – Water Crane Apr 15 '16 at 16:42
  • If the given answer is not what you expected, can you please edit and improve your question? It is quite unclear what you intend to do. – Mohan Apr 17 '16 at 05:15
  • I'd like to flatten the list as it is posed in the question. Given a list that looks like `[1,2,3,[4,5]]`, I'd like to flatten it to `[1,2,3,4,5]` with the tools available in pyspark (see the sketch after these comments). – Water Crane Apr 17 '16 at 06:15
  • @WaterCrane does this answer your question? – eliasah Apr 21 '16 at 05:29
  • @eliasah No. The question is not about reformatting the expression, but rather about creating a solution to flatten non-homogeneous objects. – Water Crane Apr 25 '16 at 02:42
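What the comments describe is flattening rows that are themselves partly nested, e.g. turning `[1,2,3,[4,5]]` into `[1,2,3,4,5]`. A minimal sketch of one way to do that with a plain map and a small helper (the name flatten_row is hypothetical and not from the answer above; it assumes the rows are nested at most one level deep):

from pyspark import SparkContext

sc = SparkContext.getOrCreate()  # assumes a local Spark context is available

def flatten_row(row):
    # Expand any element that is itself a list or tuple; keep scalars as-is.
    flat = []
    for item in row:
        if isinstance(item, (list, tuple)):
            flat.extend(item)
        else:
            flat.append(item)
    return flat

nested = sc.parallelize([[1, 2, 3, [4, 5]], [6, 7, 8, [9, 10]]])
nested.map(flatten_row).collect()  # expected: [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]

map (rather than flatMap) keeps one output row per input row, so only the inner nesting is removed.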