Quick abstract:
I am trying to display multiple histograms from Spark DataFrames with Vegas-viz in Scala. I created a trait
to create different types of histograms, and implemented classes extending it. When I create an instance of a child class, I get a NullPointerException,
which makes me think there is a nested DataFrame somewhere.
Is there a workaround? Or did I miss something, and the error comes from something else?
Details:
Here is the trait:
trait Histogram {
  val rawdf: DataFrame
  val sparseDim: Seq[String]
  val name: String
  val xColumn: String
  val yColumn: String
  val group: DataFrame

  val plot: ExtendedUnitSpecBuilder = Vegas(name).
    withDataFrame(group).
    encodeX(
      field = xColumn,
      Quantitative,
      scale = Scale(ScaleType.Log),
      title = sparseDim.reduce((a, b) => a + ", " + b)
    ).
    encodeY(field = yColumn, Quantitative).
    mark(Bar)

  def show(): Unit = plot.show
}
And here is one of the classes extending it:
class HistogramCount(val rawdf: DataFrame,
                     val sparseDim: Seq[String],
                     val name: String = "Histogram Count") extends Histogram {
  val xColumn = "cube"
  val yColumn = "count"

  override val group: DataFrame = rawdf.
    select("VALUE", sparseDim: _*).
    groupBy(sparseDim.head, sparseDim.tail: _*).
    count().
    withColumnRenamed("count", "cube").
    groupBy("cube").
    count()
}
When I create an instance of the child class, the following error occurs:
Exception in thread "main" java.lang.NullPointerException
    at <Pointing to .withDataFrame(group) in the trait>
I guess this is because the evaluation of group is lazy and that it is called in .withDataFrame(group) when plot is created.
I tried to force the evaluation of the group DataFrame before calling plot, with a val evaluate: Long = group.rdd.count(), but it does not solve the issue.
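
For what it's worth, the same pattern can be reproduced without Spark or Vegas. The following is only a minimal sketch with hypothetical names (EagerHistogram, LazyHistogram, EagerCount, LazyCount, summary), where a Seq[Int] stands in for the DataFrame and summary stands in for plot. It suggests the cause may be Scala's initialization order (the trait body runs before the subclass body assigns group) rather than Spark's lazy evaluation, but I have not confirmed that this is what happens in the real code.

```scala
// Hypothetical stand-ins: Seq[Int] plays the role of the DataFrame,
// and `summary` plays the role of `plot`.
trait EagerHistogram {
  val group: Seq[Int]
  // Evaluated while the trait body is initialized, i.e. before the
  // subclass body has assigned `group`, so `group` is still null here.
  val summary: Int = group.sum
}

trait LazyHistogram {
  val group: Seq[Int]
  // Evaluated on first access, after construction has finished.
  lazy val summary: Int = group.sum
}

class EagerCount(raw: Seq[Int]) extends EagerHistogram {
  override val group: Seq[Int] = raw.map(_ * 2)
}

class LazyCount(raw: Seq[Int]) extends LazyHistogram {
  override val group: Seq[Int] = raw.map(_ * 2)
}

object InitOrderDemo {
  def main(args: Array[String]): Unit = {
    // Constructing the eager variant fails with a NullPointerException.
    val eager = scala.util.Try(new EagerCount(Seq(1, 2, 3)))
    println(eager.isFailure)

    // The lazy variant only reads `group` once it has been assigned.
    println(new LazyCount(Seq(1, 2, 3)).summary)
  }
}
```

If this is indeed the cause, declaring plot as a lazy val (or a def) in the trait might be enough, since constructor parameters such as rawdf and sparseDim are already set when the trait body runs, while vals defined in the subclass body (group, xColumn, yColumn) are not.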