20

Given the following snippet:

val data = sc.parallelize(0 until 10000)
val local = data.collect 
println(s"local.size")

Zeppelin prints out the entire value of local to the notebook screen. How may that behavior be changed?
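
A side note on the snippet itself, separate from the echo question: in Scala, `s"local.size"` contains no `$`, so nothing is interpolated and the literal text is printed. The sketch below contrasts the two forms. It uses a plain array as a stand-in for the collected RDD (an assumption made so that it runs without Spark), and the object name `InterpolationDemo` is hypothetical.

```scala
object InterpolationDemo {
  // Stand-in for the collected array; no Spark needed for this illustration.
  val local: Array[Int] = (0 until 10000).toArray

  // No `$` in the string, so nothing is interpolated.
  val literal: String = s"local.size"

  // `${...}` evaluates the expression and inserts its result.
  val interpolated: String = s"${local.size}"

  def main(args: Array[String]): Unit = {
    println(literal)      // prints: local.size
    println(interpolated) // prints: 10000
  }
}
```

If the goal is to print the element count, `println(local.size)` works as well.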

WestCoastProjects
6 Answers

29

You can also try adding curly brackets around your code.

{
  val data = sc.parallelize(0 until 10000)
  val local = data.collect
  println(s"local.size")
}
Caner
  • Ok that's a nice solution – WestCoastProjects Jan 07 '16 at 03:40
  • 6
    Note that this solution also changes the scope of your val data and local. – Paul-Armand Verhaegen Aug 11 '16 at 14:53
  • 1
    @Paul-ArmandVerhaegen thanks for pointing that out. I think the same goes for the solution that was accepted by the OP. – Caner Aug 11 '16 at 17:04
  • 1
    @Caner Indeed, the accepted answer also changes the scope of the vars, but I commented on your solution since the scope change is more obvious in the function approach, and your answer has more votes (including mine). The reason I posted this comment, is that after implementing the proposed solutions, the new local vars shadow the global ones and that can be tricky when the spark interpreter is not restarted to clear the global vars (in Zeppelin). – Paul-Armand Verhaegen Aug 12 '16 at 08:46
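
The scoping effect discussed in these comments can be sketched in plain Scala, without Spark or Zeppelin. This is only an illustration under assumptions: the outer `local` stands in for a value left over from an earlier notebook paragraph, and the names `ScopeDemo` and `sizes` are hypothetical.

```scala
object ScopeDemo {
  // Stand-in for a value defined earlier, outside the block.
  val local: Array[Int] = Array(1, 2, 3)

  def sizes(): (Int, Int) = {
    val inner = {
      // This `local` shadows the outer one and disappears when the block ends.
      val local = (0 until 10000).toArray
      local.length
    }
    // Out here, `local` refers to the outer value again.
    (inner, local.length)
  }

  def main(args: Array[String]): Unit = {
    println(sizes()) // prints: (10000,3)
  }
}
```

A block is an expression, so returning the one value you need from it (as `inner` does here) keeps that value available afterwards while the large intermediate vals stay out of scope.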
13

Since 0.6.0, Zeppelin provides a boolean flag zeppelin.spark.printREPLOutput in the Spark interpreter configuration (accessible via the GUI), which is set to true by default. If you set it to false, you get the desired behaviour: only explicit print statements produce output.

See also: https://issues.apache.org/jira/browse/ZEPPELIN-688

cubic lettuce
  • Apparently the `zeppelin` developers finally woke up to this. But I am in general v unhappy with usability of `zeppelin` and switched to `jupyter` – WestCoastProjects Feb 06 '17 at 15:31
  • 3
    For me it's exactly the other way round. I'm leaving jupyter because it separates too strongly between different kernels and between back- and frontend. Zeppelin not only allows you to mix python with scala / spark, etc., plus reactive AngularJS-elements, but also to inject via print("%html " + ...) arbitrary html/javascript (e.g., D3, Plotly, etc.) directly into the frontend. Btw: I compile 0.7.0 manually to already get some improvements. However, interactive usability is still one step behind jupyter, and jupyter-lab looks very promising for the future, too... – cubic lettuce Feb 06 '17 at 15:56
  • OK, your comments about not intermixing scala / python are true for jupyter. I really appreciate the high end keyboarding and generally good user experience in jupyter: zeppelin is way behind to the point of being unusable to me. – WestCoastProjects Feb 06 '17 at 20:17
  • How can the same be achieved in Jupyter Notebook with Scala kernel? – calpyte Oct 16 '18 at 09:08
2

What I do to avoid this is define a top-level function, and then call it:

def run(): Unit = {
    val data = sc.parallelize(0 until 10000)
    val local = data.collect
    println(local.size)
}
run()
Pradyumna
2

FWIW, this appears to be new behaviour. Until recently we were using Livy 0.4, which only output the result of the final statement (rather than echoing the output of the whole script).

When we upgraded to Livy 0.5, the behaviour changed to output the entire script.

While splitting the paragraph and hiding the output does work, it adds unnecessary overhead to using Zeppelin. For example, if you need to refresh your output, you have to remember to run two paragraphs (i.e. the one that sets up your output and the one containing the actual println).

There are, IMHO, other usability issues with this approach that make Zeppelin less intuitive to use.

Someone has logged this JIRA ticket to address "the problem", please vote for it: LIVY-507

GMc
1

Zeppelin, as well as spark-shell REPL, always prints the whole interpreter output.

If you really want only the local.size string printed, the best way is to put the println statement in a separate paragraph.

Then you can hide all output of the previous paragraph using the small "book" icon at the top right.

bzz
  • *"as well as spark-shell REPL"*. No - the REPL does not. in the spark-shell the statement *"val local=data.collect"* will not result in any printing statements. – WestCoastProjects Aug 28 '15 at 04:37
  • @javadba May be I misunderstood you somehow, in `./bin/spark-shell` here is the output I get `scala>val local = data.collect local: Array[Int] = Array(0, 1, 2, 3, ...` – bzz Aug 29 '15 at 15:48
0

A simple trick I use is to define

def !() ="_ __ ___ ___________________________________________________"

and use as

$bang

above or close to the code I want to check and it works

res544: String = _ __ ___ ___________________________________________________

Then I just leave it there, commented out ;)

// hope it helps

rio