I have a Spark job that quits with an "empty collection" error:

java.lang.UnsupportedOperationException: empty collection

I have narrowed it down to the two lines that cause the issue:

sum_attribute1 = inputRDD.map(_.attribute1).reduce(_+_)
sum_attribute2 = inputRDD.map(_.attribute2).reduce(_+_)

Other lines that call .map followed by .distinct.count work fine. I would like to print out inputRDD.map(_.attribute1) and inputRDD.map(_.attribute2) to see what was mapped before the reduce.

I thought I could define something like

sum_attribute1 = inputRDD.map(_.attribute1)

but when I tried to compile the code, it shows errors:

[error]  found   : org.apache.spark.rdd.RDD[Int]
[error]  required: Long
[error] sum_attribute1 = inputRDD.map(_.attribute1)
[error]                              ^

My attribute1 is defined as Int, but when I tried defining it as Long, I got a different error.

Am I going in the right direction? How can I print the data after map and before reduce? What could be causing the empty collection error? What do the underscores in _.attribute1 and reduce(_+_) mean?

user1342124
  • [What are all the uses of the underscore?](https://stackoverflow.com/questions/8000903/what-are-all-the-uses-of-an-underscore-in-scala) Look for "placeholder syntax". – jwvh Apr 11 '18 at 10:21

1 Answer

I don't think you are going in the right direction. I would focus on the points below.

I recommend learning a bit of Scala first. For your question about the underscore, read up on how _ is used as a placeholder for parameters in anonymous functions.
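
As a rough illustration of the placeholder syntax, here is a minimal sketch using a plain Scala List instead of an RDD so it runs without Spark. The `Record` case class and its sample values are hypothetical stand-ins for whatever your RDD actually holds.

```scala
// Hypothetical record type standing in for the elements of inputRDD.
case class Record(attribute1: Int, attribute2: Int)

object PlaceholderDemo {
  // Made-up sample data, only for illustration.
  val records: List[Record] = List(Record(1, 10), Record(2, 20), Record(3, 30))

  // `_.attribute1` is shorthand for `r => r.attribute1`.
  val shortForm: List[Int] = records.map(_.attribute1)
  val longForm: List[Int] = records.map(r => r.attribute1)

  // `_ + _` is shorthand for `(a, b) => a + b`;
  // each underscore stands for a separate parameter, in order.
  val sumShort: Int = shortForm.reduce(_ + _)
  val sumLong: Int = longForm.reduce((a, b) => a + b)
}
```

The short and long forms compile to the same functions, so the choice between them is purely about readability.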

As for your other question, reduce cannot be used on an empty collection. I recommend using fold instead, since it handles empty collections by returning its zero value.
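
The difference can be sketched with plain Scala collections, which behave the same way here and need no Spark setup. On an RDD the analogous calls should be `inputRDD.map(_.attribute1.toLong).fold(0L)(_ + _)` for the sum and `inputRDD.map(_.attribute1).take(10).foreach(println)` to peek at the mapped values on the driver (the 10 is an arbitrary sample size). The compile error in the question suggests `sum_attribute1` is declared as Long, so the un-reduced RDD would need its own variable rather than being assigned to that one.

```scala
import scala.util.Try

object EmptySumDemo {
  val empty: List[Int] = List.empty[Int]

  // reduce has no starting value, so it throws
  // UnsupportedOperationException on an empty collection.
  val reduced: Try[Int] = Try(empty.reduce(_ + _))

  // fold starts from the zero value (0L) and simply returns it
  // when there is nothing to combine. Mapping to Long first
  // makes the result a Long.
  val folded: Long = empty.map(_.toLong).fold(0L)(_ + _)

  // On non-empty input, fold gives the same sum reduce would.
  val nonEmptyFolded: Long = List(1, 2, 3).map(_.toLong).fold(0L)(_ + _)
}
```

If the job works on one platform and fails on the other, an empty result from fold (or `inputRDD.isEmpty()` returning true) would point to the input not being read on the new platform rather than to the reduce itself.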

Frederic A.
  • Hi, actually I am porting a Spark job from Oracle BDA to another platform that uses v3io. On the BDA, the same job uses reduce without hitting the empty collection issue, so I want to compare the result of the map before the reduce on both sides. I need to confirm this before I switch to fold on the new platform. – user1342124 Apr 12 '18 at 02:01