
The example code below is from the book Advanced Analytics with Spark. When I load it into spark-shell (version 1.4.1) it gives the following error, indicating that it can't find StatCounter:

import org.apache.spark.util.StatCounter
<console>:9: error: not found: type StatCounter
        val stats: StatCounter = new StatCounter()
                   ^
<console>:9: error: not found: type StatCounter
        val stats: StatCounter = new StatCounter()
                                     ^
<console>:23: error: not found: type NAStatCounter
        def apply(x: Double) = new NAStatCounter().add(x)

If I just do the following in spark-shell there is no problem:

scala> import org.apache.spark.util.StatCounter
import org.apache.spark.util.StatCounter

scala> val statsCounter: StatCounter = new StatCounter()
statsCounter: org.apache.spark.util.StatCounter = (count: 0, mean: 0.000000, stdev: NaN, max: -Infinity, min: Infinity)

The problem seems to be with the :load command in spark-shell.

Here's the code:

import org.apache.spark.util.StatCounter
class NAStatCounter extends Serializable {
    val stats: StatCounter = new StatCounter()
    var missing: Long = 0

    def add(x: Double): NAStatCounter = {
        if (java.lang.Double.isNaN(x)) {
            missing += 1
        } else {
            stats.merge(x)
        }
        this
    }

    def merge(other: NAStatCounter): NAStatCounter = {
        stats.merge(other.stats)
        missing += other.missing
        this
    }

    override def toString = {
        "stats: " + stats.toString + " NaN: " + missing
    }
}

object NAStatCounter extends Serializable {
    def apply(x: Double) = new NAStatCounter().add(x)
}
sgvd
Dean Schulze
  • Is that library in the class path? Can you tell us the location of that library and print out your lib path? – Dr.Knowitall Jan 24 '16 at 02:57
  • I found that I have to fully qualify StatCounter when declaring it, even though I imported it: `val stats: org.apache.spark.util.StatCounter = new org.apache.spark.util.StatCounter()` – Dean Schulze Jan 24 '16 at 03:01
  • It's in the classpath by default. The two line example from the spark-shell in the middle code block above shows that. When I load a file is when the problem occurs. – Dean Schulze Jan 24 '16 at 03:03
  • Other than having the wrong Scala version, which has caused problems from time to time, I can't say what's causing it – Dr.Knowitall Jan 24 '16 at 08:08
  • When I copy and save that code to a file, I get the following in the Spark 1.4.1 shell: scala> :load /tmp/test.scala Loading /tmp/test.scala... import org.apache.spark.util.StatCounter defined class NAStatCounter defined module NAStatCounter warning: previously defined class NAStatCounter is not a companion to object NAStatCounter. Companions must be defined together; you may wish to use :paste mode for this. – sgvd Jan 24 '16 at 18:08

1 Answer


I had exactly the same problem, and I solved it the way you tried. Change

val stats: StatCounter = new StatCounter()

into

val stats: org.apache.spark.util.StatCounter = new org.apache.spark.util.StatCounter()

The reason is probably that when the file is loaded with :load, the shell doesn't resolve StatCounter through the import, so the fully qualified name is needed.
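For what it's worth, the NaN-tracking logic itself is plain Scala and can be checked outside spark-shell by swapping a minimal local accumulator in for StatCounter (a sketch; MiniStats is a hypothetical stand-in written for this example, not Spark's class):

```scala
// MiniStats is a hypothetical stand-in for org.apache.spark.util.StatCounter,
// only here so the sketch runs without Spark on the classpath.
class MiniStats extends Serializable {
  var count: Long = 0
  var sum: Double = 0.0
  def merge(x: Double): MiniStats = { count += 1; sum += x; this }
  def merge(other: MiniStats): MiniStats = {
    count += other.count; sum += other.sum; this
  }
  override def toString = s"(count: $count, sum: $sum)"
}

// Same shape as the book's NAStatCounter, with the stand-in type swapped in.
class NAStatCounter extends Serializable {
  val stats: MiniStats = new MiniStats()
  var missing: Long = 0  // NaNs seen so far

  def add(x: Double): NAStatCounter = {
    // Count NaNs separately; fold real values into the running stats.
    if (java.lang.Double.isNaN(x)) missing += 1 else stats.merge(x)
    this
  }

  def merge(other: NAStatCounter): NAStatCounter = {
    stats.merge(other.stats)
    missing += other.missing
    this
  }

  override def toString = s"stats: $stats NaN: $missing"
}

object NAStatCounter extends Serializable {
  def apply(x: Double) = new NAStatCounter().add(x)
}
```

For example, NAStatCounter(1.0).add(Double.NaN) folds the real value into stats and counts the NaN in missing, which is the behavior the book's version has on Spark's StatCounter.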

ɐlǝx
Song