
I'm using Jupyter (kernel - Apache Toree) for analytics with Apache Spark/Scala. For visualization, I'm trying to use Vegas (GitHub - https://github.com/vegas-viz/Vegas).

When I use the sample Vegas code without the Vegas Spark extension, it works fine (please see the attached screenshot).

However, with DataFrames it does not seem to be showing the graphs (i.e. the graph is rendered but shows no data).

Here is the code -

%AddDeps org.vegas-viz vegas_2.11 0.3.11 --transitive

%AddDeps org.vegas-viz vegas-spark_2.11 0.3.11

import vegas._
import vegas.render.WindowRenderer._
import vegas.data.External._
import vegas.sparkExt._

val seq = Seq(("a", 16), ("b", 77), ("c", 45), ("d",101),("e", 132),("f", 166),("g", 51))
val df = seq.toDF("id", "value")

df.show()

+---+-----+
| id|value|
+---+-----+
|  a|   16|
|  b|   77|
|  c|   45|
|  d|  101|
|  e|  132|
|  f|  166|
|  g|   51|
+---+-----+

val usingSparkdf = Vegas("UsingSpark")
  .withDataFrame(df1)
  .encodeX("id")
  .encodeY("value")
  .mark(Bar)

usingSparkdf.show

[Screenshot: Vegas-with-DF]

[Screenshot: Vegas-without-DF]

What am I doing wrong here?

Is this the correct way to include the Vegas Spark extension?

 %AddDeps org.vegas-viz vegas-spark_2.11 0.3.11
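For what it's worth, I added the plain Vegas dependency above with --transitive; a variant worth trying (just an assumption on my part, the extra flag may be redundant if everything is already on the classpath) is to pull the Spark extension transitively as well:

%AddDeps org.vegas-viz vegas-spark_2.11 0.3.11 --transitive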
Karan Alang
  • Saw you already found your problem, but on top of that it seems like you're plotting `df1` while, as written in your question, you only defined `df`. – dieHellste Feb 19 '19 at 13:36

2 Answers


I was able to fix this issue: `encodeX` and `encodeY` need the (statistical) data type specified, i.e. `Quant`, `Nom`, or `Ord`, along with the column name.

The code below works fine.

val usingSparkdf = Vegas("UsingSpark")
  .withDataFrame(df1)
  .encodeX("id", Nom)
  .encodeY("value", Quant)
  .mark(Bar)

usingSparkdf.show
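As the comment on the question points out, only `df` is defined in the question while the snippet plots `df1`. For completeness, a minimal sketch of the same fix applied to the question's `df` (assuming the same Toree session and imports; the val name is just for illustration):

val usingSparkdfFixed = Vegas("UsingSpark")  // val name chosen for illustration
  .withDataFrame(df)
  .encodeX("id", Nom)
  .encodeY("value", Quant)
  .mark(Bar)

usingSparkdfFixed.show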
Karan Alang
package al.da.vg

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SparkSession}

import vegas._
import vegas.render.WindowRenderer._
import vegas.sparkExt._

object vegas_spark extends App {

  val conf = new SparkConf().setAppName("Vegas_Spark").setMaster("local[*]")
  val sc = new SparkContext(conf)
  val spark = SparkSession.builder().config(conf).appName("Vegas_Spark").getOrCreate()
  val sqlContext = new SQLContext(sc)
  import sqlContext.implicits._

  spark.sparkContext.setLogLevel("WARN")

  // Use a Seq of tuples (not Maps) so toDF can find an implicit Encoder
  val seq1 = Seq(
    ("A", 28), ("B", 55), ("C", 43),
    ("D", 91), ("E", 81), ("F", 53),
    ("G", 19), ("H", 87), ("I", 52))

  val df1 = seq1.toDF("a", "b")

  df1.show()

  // Plot the DataFrame: column "a" as the ordinal x-axis, column "b" as the quantitative y-axis
  val usingSparkdf1 = Vegas("Vegas_Spark")
    .withDataFrame(df1)
    .encodeX("a", Ordinal)
    .encodeY("b", Quantitative)
    .mark(Bar)
    .show

}
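In case someone wants to run this as a standalone sbt project rather than in Toree, here is a rough sketch of the build.sbt dependencies; the Spark version below is a placeholder, and only the Vegas coordinates (0.3.11, Scala 2.11) are taken from the question:

scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.2.0",  // placeholder version; use whatever Spark you run
  "org.apache.spark" %% "spark-sql"   % "2.2.0",  // placeholder version
  "org.vegas-viz"    %% "vegas"       % "0.3.11",
  "org.vegas-viz"    %% "vegas-spark" % "0.3.11"
)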
avariant