
Is there a way to use flatMap to flatten a list in an RDD like so:

rdd = sc.parallelize([[1,2,3],[6,7,8]])

rdd.flatMap(lambda r: [[r[0],r[1],r[2],[r[2]+1,r[2]+2]]]).collect()

My desired output:

[[1,2,3,4,5],[6,7,8,9,10]]

The actual output:

[[1,2,3,[4,5]], [6,7,8,[9,10]]]

I understand that flatMap flattens the outer list and I am not confused by the actual output above, but I would like to know if there is a way to flatten the inner list as well.

Water Crane

1 Answer


Please modify your code as shown below to get the desired output:

rdd.flatMap(lambda r: [[r[0],r[1],r[2],r[2]+1,r[2]+2]]).collect()
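For reference, a minimal self-contained run of that suggestion (assuming a local SparkContext named sc, as in the question) would be expected to produce the desired output:

rdd = sc.parallelize([[1, 2, 3], [6, 7, 8]])

# The lambda returns a single already-flat list per row, so flatMap only
# removes the outer wrapping list and each row stays intact.
rdd.flatMap(lambda r: [[r[0], r[1], r[2], r[2] + 1, r[2] + 2]]).collect()
# expected: [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]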
Mohan
  • The point is that I can't modify it that way. I have a structure as above, and I'd like to flatten it without calling a list comprehension. – Water Crane Apr 15 '16 at 16:42
  • If the given answer is not what you expected, can you please edit and improve your question? It is quite unclear what you intend to do. – Mohan Apr 17 '16 at 05:15
  • I'd like to flatten the list as it is posed in the question. Given a list that looks like `[1,2,3,[4,5]]`, I'd like to flatten it to `[1,2,3,4,5]` with the tools available in pyspark (see the sketch after these comments). – Water Crane Apr 17 '16 at 06:15
  • @WaterCrane does this answer your question? – eliasah Apr 21 '16 at 05:29
  • @eliasah No. The question is not about reformatting the expression, but rather about creating a solution to flatten non-homogeneous objects. – Water Crane Apr 25 '16 at 02:42
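What the comments describe is flattening rows that are themselves partly nested, e.g. turning `[1,2,3,[4,5]]` into `[1,2,3,4,5]`. A minimal sketch of one way to do that with a plain map and a small helper (the name flatten_row is hypothetical and not from the answer above; it assumes the rows are nested at most one level deep):

from pyspark import SparkContext

sc = SparkContext.getOrCreate()  # assumes a local Spark context is available

def flatten_row(row):
    # Expand any element that is itself a list or tuple; keep scalars as-is.
    flat = []
    for item in row:
        if isinstance(item, (list, tuple)):
            flat.extend(item)
        else:
            flat.append(item)
    return flat

nested = sc.parallelize([[1, 2, 3, [4, 5]], [6, 7, 8, [9, 10]]])
nested.map(flatten_row).collect()  # expected: [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]

map (rather than flatMap) keeps one output row per input row, so only the inner nesting is removed.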