1

In SO 33655920 I come across the below, fine.

rdd = sc.parallelize([1, 2, 3, 4], 2)
def f(iterator): yield sum(iterator)
rdd.mapPartitions(f).collect()

In Scala, I cannot seem to get the the def in the same shorthand way. The equivalent is? I have searched and tried but to no avail.

Thanks in advance.

Seth Tisue
  • 29,985
  • 11
  • 82
  • 149
thebluephantom
  • 16,458
  • 8
  • 40
  • 83

2 Answers2

2

yield sum(iterator) in Python sums the elements of the iterator. The similar way of doing this in Scala would be:

val rdd = sc.parallelize(Array(1, 2, 3, 4), 2)
rdd.mapPartitions(it => Iterator(it.sum)).collect()
Jiri Kremser
  • 12,471
  • 7
  • 45
  • 72
1

If you want to sum values in the partition you can write something like

val rdd = sc.parallelize(1 to 4, 2)
def f(i: Iterator[Int]) = Iterator(i.sum)
rdd.mapPartitions(f).collect()
addmeaning
  • 1,358
  • 1
  • 13
  • 36