
I have a Scala program that I want to execute in the Spark shell. When I copy-paste the whole program into the shell it doesn't work; I have to copy it in line by line.

How can I paste the whole program into the shell?

Thanks.

– hawarden_

4 Answers


In spark-shell, you just need to use the ":paste" command:

scala> :paste
// Entering paste mode (ctrl-D to finish)

// Define the case class first so .toDS can derive an encoder for Salary
case class Salary(depName: String, empNo: Long, salary: Long)

val empsalary = Seq(
  Salary("sales", 1, 5000),
  Salary("personnel", 2, 3900),
  Salary("sales", 3, 4800),
  Salary("sales", 4, 4800),
  Salary("personnel", 5, 3500),
  Salary("develop", 7, 4200),
  Salary("develop", 8, 6000),
  Salary("develop", 9, 4500),
  Salary("develop", 10, 5200),
  Salary("develop", 11, 5200))
.toDS.toDF

Then press Ctrl-D to exit paste mode. You will see the output:

// Exiting paste mode, now interpreting.

defined class Salary
empsalary: org.apache.spark.sql.DataFrame = [depName: string, empNo: bigint ... 1 more field]
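
Once empsalary exists you can query it like any other DataFrame; for example (a minimal follow-up sketch, output omitted):

scala> empsalary.groupBy("depName").avg("salary").show()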
– timothyzhang

In the Spark shell you can wrap your multi-line Spark code in parentheses to execute it. Wrapping in parentheses allows you to copy multi-line code into the shell or to write it line by line. See the examples below for usage.

scala> val adult_cat_df = (spark.read.format("csv")
 |   .option("sep", ",")
 |   .option("inferSchema", "true")
 |   .option("header", "false")
 |   .load("hdfs://…/adult/adult_data.csv")
 |   .toDF("age", "workclass", "fnlwgt", "education", "education-num", "marital-status", "occupation", "relationship", "race", "sex", "capital-gain", "capital-loss", "hours-per-week", "native-country", "class")
 |   .drop("fnlwgt", "education-num", "capital-gain", "capital-loss")
 | )
scala> val clean_df = (adult_cat_df.dropDuplicates
 |   .na.replace("*", Map("?" -> null))
 |   .na.drop(minNonNulls = 9)
 | )
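
The same pattern works for any chained expression. Here is a minimal self-contained sketch using the built-in range source (no input file needed), so you can verify the parentheses trick directly:

scala> val df = (spark.range(5)
 |   .toDF("n")
 |   .filter("n > 1")
 | )
df: org.apache.spark.sql.DataFrame = [n: bigint]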
– Craig Covey

I would need more explanation from you, but I guess you are trying to do something like this:

spark.read.parquet(X)
.filter("ll")
.groupBy("iii")
.agg("kkk")

And it does not work. Instead you can do:

spark.read.parquet(X).
    filter("ll").
    groupBy("iii").
    agg("kkk")

Put the dot at the end of each line, so the shell knows the expression continues on the next one.
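
For example, a self-contained chain written with trailing dots (a minimal sketch using the built-in range source) pastes cleanly into the shell:

scala> val counts = spark.range(10).
 |   withColumn("parity", $"id" % 2).
 |   groupBy("parity").
 |   count()
counts: org.apache.spark.sql.DataFrame = [parity: bigint, count: bigint]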

I hope it is what you are looking for.

– Nastasia

Just save your code to a text file and use :load <path_to_your_script> in spark-shell.
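
For example, assuming the program was saved as /tmp/program.scala (a path chosen here just for illustration):

scala> :load /tmp/program.scala
Loading /tmp/program.scala...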

– chlebek
I found that I needed to put my code in parentheses, as Craig said (https://stackoverflow.com/a/59440041/524588), even when it's in a file. – Sanghyun Lee Jan 13 '21 at 10:09