
I have a Scala program that I want to execute in the Spark shell. When I copy-paste the whole program into the shell it doesn't work; I have to copy it in line by line.

How can I paste the whole program into the shell?

Thanks.

– hawarden_

4 Answers


In spark-shell, you just need to use the ":paste" command:

scala> :paste
// Entering paste mode (ctrl-D to finish)

// Define the case class first so .toDS can derive an encoder for Salary
case class Salary(depName: String, empNo: Long, salary: Long)

val empsalary = Seq(
  Salary("sales", 1, 5000),
  Salary("personnel", 2, 3900),
  Salary("sales", 3, 4800),
  Salary("sales", 4, 4800),
  Salary("personnel", 5, 3500),
  Salary("develop", 7, 4200),
  Salary("develop", 8, 6000),
  Salary("develop", 9, 4500),
  Salary("develop", 10, 5200),
  Salary("develop", 11, 5200))
.toDS.toDF

Then press Ctrl-D to exit paste mode. You will see the output:

// Exiting paste mode, now interpreting.

defined class Salary
empsalary: org.apache.spark.sql.DataFrame = [depName: string, empNo: bigint ... 1 more field]
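
Once empsalary exists you can query it like any other DataFrame; for example (a minimal follow-up sketch, output omitted):

scala> empsalary.groupBy("depName").avg("salary").show()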
– timothyzhang

In the Spark shell you can wrap your multi-line Spark code in parentheses to execute it. Wrapping in parentheses allows you to copy multi-line code into the shell or to write it line by line. See the examples below for usage.

scala> val adult_cat_df = (spark.read.format("csv")
 |   .option("sep", ",")
 |   .option("inferSchema", "true")
 |   .option("header", "false")
 |   .load("hdfs://…/adult/adult_data.csv")
 |   .toDF("age", "workclass", "fnlwgt", "education", "education-num", "marital-status", "occupation", "relationship", "race", "sex", "capital-gain", "capital-loss", "hours-per-week", "native-country", "class")
 |   .drop("fnlwgt", "education-num", "capital-gain", "capital-loss")
 | )
scala> val clean_df = (adult_cat_df.dropDuplicates
 |   .na.replace("*", Map("?" -> null))
 |   .na.drop(minNonNulls = 9)
 | )
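
The same pattern works for any chained expression. Here is a minimal self-contained sketch using the built-in range source (no input file needed), so you can verify the parentheses trick directly:

scala> val df = (spark.range(5)
 |   .toDF("n")
 |   .filter("n > 1")
 | )
df: org.apache.spark.sql.DataFrame = [n: bigint]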
– Craig Covey

I would need more explanation from you, but I guess you are trying to do something like this:

spark.read.parquet(X)
.filter("ll")
.groupBy("iii")
.agg("kkk")

And it does not work. Instead you can do:

spark.read.parquet(X).
    filter("ll").
    groupBy("iii").
    agg("kkk")

Put the dot at the end of each line, so the shell knows the expression continues on the next one.
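
For example, a self-contained chain written with trailing dots (a minimal sketch using the built-in range source) pastes cleanly into the shell:

scala> val counts = spark.range(10).
 |   withColumn("parity", $"id" % 2).
 |   groupBy("parity").
 |   count()
counts: org.apache.spark.sql.DataFrame = [parity: bigint, count: bigint]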

I hope it is what you are looking for.

– Nastasia

Just save your code to a text file and use :load <path_to_your_script> in spark-shell.
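
For example, assuming the program was saved as /tmp/program.scala (a path chosen here just for illustration):

scala> :load /tmp/program.scala
Loading /tmp/program.scala...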

– chlebek
I found that I needed to put my code in parentheses, as Craig said (https://stackoverflow.com/a/59440041/524588), even when it's in a file. – Sanghyun Lee Jan 13 '21 at 10:09