How to use the "display" function in a Scala 2.11 with Spark 2.0 notebook in DSX

Question

In DSX, is there a way to use "display" in a Scala 2.11 with Spark 2.0 notebook? I know it can be done in a Python notebook with PixieDust, e.g.:

display(spark.sql("""SELECT COUNT(zip), SUM(pop), city FROM hive_zips_table
                     WHERE state = 'CA' GROUP BY city ORDER BY SUM(pop) DESC"""))

But I want to do the same in a Scala notebook. Currently I am just calling show(), as below, which only prints the data in tabular form with no graphics.

spark.sql("""SELECT COUNT(zip), SUM(pop), city FROM hive_zips_table
             WHERE state = 'CA' GROUP BY city ORDER BY SUM(pop) DESC""").show()

@close_voters - I have updated the question to help it satisfy the stackoverflow requirements. — Chris Snow, Jan 27 '17 at 18:33

score 3 · Answer 1 · edited Jan 31 '17 at 15:04

Note:

PixieDust currently works with Spark 1.6 and Python 2.7.
PixieDust currently supports Spark DataFrames, Spark GraphFrames and Pandas DataFrames.

Reference:- https://github.com/ibm-cds-labs/pixiedust/wiki
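
Before relying on the workaround, it may help to check the notebook against the two version constraints quoted in the note. The following is only a minimal sketch: the helper names (major_minor, meets_stated_requirements) are made up for illustration and are not part of PixieDust, and the version numbers are simply the ones stated in this answer, which may have changed since.

```python
import sys

# Constraints quoted from the note in this answer; treat them as a snapshot, not current fact.
SUPPORTED_SPARK = (1, 6)
SUPPORTED_PYTHON = (2, 7)


def major_minor(version):
    """Turn '1.6.0' or a tuple such as (2, 7, 13) into a (major, minor) tuple."""
    if isinstance(version, str):
        parts = version.split(".")
    else:
        parts = list(version)
    return (int(parts[0]), int(parts[1]))


def meets_stated_requirements(spark_version, python_version=None):
    """Hypothetical helper: True only for Spark 1.6 with Python 2.7."""
    if python_version is None:
        # Default to the interpreter running this cell.
        python_version = sys.version_info
    return (major_minor(spark_version) == SUPPORTED_SPARK
            and major_minor(python_version) == SUPPORTED_PYTHON)
```

In a Python notebook cell you could call meets_stated_requirements(sc.version), where sc is the SparkContext the notebook provides.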

If you can use Spark 1.6, though, here is a quick workaround that gives you that fancy display function.

You can go the other way around, since PixieDust lets you use Scala and Python in one Python notebook with the %%scala cell magic.

https://github.com/ibm-cds-labs/pixiedust/wiki/Using-Scala-language-within-a-Python-Notebook

Step 1: Create a notebook with Python 2 and Spark 1.6, then install PixieDust and import it:

!pip install --user --no-deps --upgrade pixiedust
import pixiedust

Define your variables or your DataFrame in Scala under the %%scala cell magic:

%%scala
import org.apache.spark.sql._

print(sc.version)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val __df = sqlContext.read.json("people.json")

__df.show()

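or create your DataFrame any other way, for example:

// Do not append .show() here: show() returns Unit, so __df would no longer be a DataFrame.
val __df = sqlContext.sql("""SELECT COUNT(zip), SUM(pop), city FROM hive_zips_table
      WHERE state = 'CA' GROUP BY city ORDER BY SUM(pop) DESC""")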

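Step 2: In a separate Python cell, run the following to access the __df variable (as I understand the PixieDust wiki, Scala variables whose names start with __ are made available in the Python shell):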

display(__df)
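
For reference, the example query returns one row per city with its ZIP count and total population, and that is the table display(__df) would receive. Here is a minimal sketch that reproduces that shape without a Spark cluster, using Python's built-in sqlite3 as a stand-in. The column types and the sample rows are invented for illustration; only the table and column names come from the query in the question.

```python
import sqlite3

# Same query as in the question, run against SQLite instead of Spark SQL.
QUERY = """
SELECT COUNT(zip), SUM(pop), city
FROM hive_zips_table
WHERE state = 'CA'
GROUP BY city
ORDER BY SUM(pop) DESC
"""


def run_sample_query():
    """Build a tiny in-memory table with made-up rows and run the query."""
    conn = sqlite3.connect(":memory:")
    # Column types are an assumption; the original schema is not shown.
    conn.execute(
        "CREATE TABLE hive_zips_table (zip TEXT, pop INTEGER, city TEXT, state TEXT)"
    )
    rows = [
        ("00001", 100, "ALPHA", "CA"),
        ("00002", 250, "ALPHA", "CA"),
        ("00003", 500, "BETA", "CA"),
        ("00004", 900, "GAMMA", "NY"),  # filtered out by state = 'CA'
    ]
    conn.executemany("INSERT INTO hive_zips_table VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for zip_count, total_pop, city in run_sample_query():
        print(city, zip_count, total_pop)
```

Each result tuple is (COUNT(zip), SUM(pop), city), ordered by total population, which is the kind of grouped data that charts well.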

Reference to my sample Notebook:-

Thanks, Charles.

Should @Vik M raise a uservoice ticket or a github issue to request support for spark 2.0 and scala 2.11? — Chris Snow, Feb 04 '17 at 11:04
Raise github issue since it will get direct attention of pixiedust devs. I am sure spark 2 implementation should be on its way. — charles gomes, Feb 04 '17 at 20:59
@Vik M I think you can accept Charles' answer and raise a new issue on github? — Chris Snow, Feb 04 '17 at 21:11
Could you please include the necessary import(s)? import org.apache.spark.sql._ does not seem to suffice. — Christian Neverdal, Nov 17 '17 at 17:12
@ChristianNeverdal are you running into error? Which other import you are referring to? — charles gomes, Nov 18 '17 at 18:02

score 1 · Answer 2 · answered Mar 28 '18 at 13:05

You can get a similar result in Zeppelin:

z.show(dataframe)

Erkan Şirin
