I'm very new to both Scala and Spark, so please forgive me if I'm going about this completely wrong. After reading in a CSV file, filtering, and mapping, I have an RDD of (String, Double) pairs:
(b2aff711,-0.00510)
(ae095138,0.20321)
(etc.)
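For reference, this is roughly how I build that RDD (simplified; the file name and column positions here are stand-ins for my real ones):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

val conf = new SparkConf().setAppName("stats").setMaster("local[*]")
val sc = new SparkContext(conf)

// assumed layout: column 0 is an ID string, column 1 is a numeric value
val rdd1 = sc.textFile("data.csv")
  .map(_.split(","))
  .filter(_.length >= 2)                // drop malformed rows
  .map(fields => (fields(0), fields(1).toDouble))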
When I call .groupByKey() on the RDD,

val grouped = rdd1.groupByKey()

I get an RDD of (String, Iterable[Double]) pairs. (I don't know what CompactBuffer is; could it be causing my issue?)
(32540b03,CompactBuffer(-0.00699, 0.256023))
(a93dec11,CompactBuffer(0.00624))
(32cc6532,CompactBuffer(0.02337, -0.05223, -0.03591))
(etc.)
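To be explicit about the goal: for each key I want the mean and the sample standard deviation of its values. In plain Scala, for a single group (using the values from the 32cc6532 row above) it would be something like:

val values = Seq(0.02337, -0.05223, -0.03591)
val mean = values.sum / values.size
// sample variance divides by n - 1
val variance = values.map(v => math.pow(v - mean, 2)).sum / (values.size - 1)
val sampleStdev = math.sqrt(variance)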
Once they are grouped, I wanted to simply call .mean() and .sampleStdev() to get those statistics. But when I try to create a new RDD of the means,
val mean = grouped.mean()
I get this error:
Error:(51, 22) value mean is not a member of org.apache.spark.rdd.RDD[(String, Iterable[Double])]
val mean = grouped.mean( )
I have imported org.apache.spark.SparkContext._
I also tried .sampleStdev(), .sum(), and .stats(), all with the same kind of error. Whatever the problem is, it seems to affect all of the numeric RDD operations.
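For completeness, these are the other calls I tried on grouped, each rejected the same way ("value ... is not a member of org.apache.spark.rdd.RDD[(String, Iterable[Double])]"):

val stdev = grouped.sampleStdev()
val total = grouped.sum()
val summary = grouped.stats()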