
The following is simple code to get word counts over a window of 30 seconds, sliding every 10 seconds.

import org.apache.spark.SparkConf
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.api.java.function._
import org.apache.spark.streaming.api._
import org.apache.spark.storage.StorageLevel

val ssc = new StreamingContext(sc, Seconds(5))

// read from text file
val lines0 = ssc.textFileStream("test")
val words0 = lines0.flatMap(_.split(" "))

// read from socket
val lines1 = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER)
val words1 = lines1.flatMap(_.split(" "))

val words = words0.union(words1)
val wordCounts = words.map((_, 1)).reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))

wordCounts.print()
ssc.checkpoint(".")
ssc.start()
ssc.awaitTermination()

However, I am getting an error from this line:

val wordCounts = words.map((_, 1)).reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))

Specifically, it comes from _ + _. The error is:

51: error: missing parameter type for expanded function ((x$2, x$3) => x$2.$plus(x$3))

Could anybody tell me what the problem is? Thanks!

user2895478

1 Answer


This is extremely easy to fix: just be explicit about the types.
val wordCounts = words.map((_, 1)).reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))
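A variation on the same fix (not part of the original answer, but it works for the same reason) is to name the reduce function with an explicit type; that also gives the compiler the parameter types it needs:

// Naming the function with an explicit type resolves the inference as well
val sum: (Int, Int) => Int = _ + _
val wordCounts = words.map((_, 1)).reduceByKeyAndWindow(sum, Seconds(30), Seconds(10))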

The reason Scala can't infer the type in this case is explained in this answer.
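Roughly: reduceByKeyAndWindow is overloaded (there is, for example, a variant that also takes an inverse reduce function), and Scala resolves overloads before it infers the parameter types of an anonymous function, so _ + _ has no single expected type. A minimal self-contained sketch of the same failure, using made-up overloads unrelated to Spark:

object InferenceDemo {
  // Two overloads: the compiler has no single expected type for `_ + _`
  def reduce(f: (Int, Int) => Int): Int = f(1, 2)
  def reduce(f: (String, String) => String): String = f("a", "b")

  // reduce(_ + _)                          // error: missing parameter type
  val ok = reduce((a: Int, b: Int) => a + b) // OK: explicit types disambiguate
}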

aaronman
  • Thank you! After the change, the program produced the expected results, but it also raised another error along the way: java.util.NoSuchElementException: key not found: 1406051860000 ms at scala.collection.MapLike$class.default(MapLike.scala:228) at scala.collection.AbstractMap.default(Map.scala:58) at scala.collection.mutable.HashMap.apply(HashMap.scala:64) at org.apache.spark.streaming.dstream.ReceiverInputDStream.getReceivedBlockInfo(ReceiverInputDStream.scala:77) I wonder how that happened? – user2895478 Jul 22 '14 at 18:00
  • @user2895478 I believe that is from this [Jira ticket](https://issues.apache.org/jira/browse/SPARK-2009); the problem is resolved in 1.0.1 and 1.1.0. – aaronman Jul 22 '14 at 18:54