Questions tagged [spark-shell]

More information can be found in the official documentation.

135 questions
12 votes, 2 answers

Ignoring non-spark config property: hive.exec.dynamic.partition.mode

How do I run spark-shell with hive.exec.dynamic.partition.mode=nonstrict? I tried (as suggested here): export SPARK_MAJOR_VERSION=2; spark-shell --conf "hive.exec.dynamic.partition.mode=nonstrict" --properties-file…
Peter Krauss
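A common resolution, sketched here rather than taken from the question itself: spark-shell forwards only properties whose keys start with `spark.`, so Hive properties need the `spark.hadoop.` prefix on the command line, or can be set through SQL once the shell is up.

```shell
# Sketch: prefix Hive properties with "spark.hadoop." so spark-shell
# does not discard them as "non-spark" config:
spark-shell \
  --conf spark.hadoop.hive.exec.dynamic.partition=true \
  --conf spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict

# Alternatively, set it from inside the shell:
#   spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
```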
11 votes, 4 answers

Spark shell: How to paste multiline code inside?

I have a Scala program that I want to execute using the Spark shell. When I copy and paste it into spark-shell it doesn't work; I have to copy it in line by line. How can I paste the whole program into the shell at once? Thanks.
hawarden_
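The usual answer is the REPL's paste mode; a minimal transcript (the sample definitions are illustrative):

```scala
// Inside spark-shell, type :paste, paste the whole block, then press Ctrl-D:
scala> :paste
// Entering paste mode (ctrl-D to finish)

case class Point(x: Int, y: Int)
val points = Seq(Point(1, 2), Point(3, 4))
val xs = points.map(_.x)

// Ctrl-D here -> "Exiting paste mode, now interpreting."
```

Paste mode compiles the whole block as one unit, which is what makes multi-line definitions (companion objects, chained calls starting with `.`) work where line-by-line entry fails.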
11 votes, 2 answers

Project_Bank.csv is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [110, 111, 13, 10]

So I was trying to load a CSV file with a custom schema, but every time I end up with the following error: Project_Bank.csv is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [110, 111, 13, 10] This is how my program…
amitk
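This error usually means the reader defaulted to Parquet. A hedged sketch of the fix (customSchema and the header option are assumptions about the asker's setup):

```scala
// load() without an explicit format assumes Parquet, which is why a CSV
// fails the magic-number check. Name the format instead:
val df = spark.read
  .format("csv")
  .option("header", "true")   // assumption: the file has a header row
  .schema(customSchema)       // customSchema: the asker's StructType
  .load("Project_Bank.csv")
```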
7 votes, 2 answers

Is it possible to run a Spark Scala script without going inside spark-shell?

The only two ways I know to run Scala-based Spark code are to either compile a Scala program into a jar file and run it with spark-submit, or to run a Scala script using :load inside spark-shell. My question is: is it possible to run a Scala file…
MetallicPriest
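Two common workarounds, sketched (myscript.scala is a placeholder):

```shell
# 1. Feed the script on stdin; spark-shell exits when the input ends:
spark-shell < myscript.scala

# 2. Use -i, which loads the file but then stays in the REPL;
#    ending the script with sys.exit makes it quit instead:
spark-shell -i myscript.scala
```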
6 votes, 1 answer

spark-shell: How to avoid suppression of elided stack traces (exceptions)?

I am trying to run one of my Scala files from spark-shell. This file calls some other jar files which have already been loaded into the Spark context. The problem is that if something fails, it prints only part of the stack trace. Is there any way I can…
Gaurang Shah
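One trick worth knowing (an assumption that the asker's spark-shell binds it, as the stock Scala REPL does): the most recently thrown exception stays available as lastException, so the full trace can be printed on demand.

```scala
// After the REPL prints something like "... 48 elided",
// recover the full stack trace from the REPL-bound value:
lastException.printStackTrace()
```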
6 votes, 1 answer

GraphX: Is it possible to execute a program on each vertex without receiving a message?

When I was trying to implement an algorithm in GraphX with Scala, I couldn't find a way to activate all the vertices in the next iteration. How can I send a message to all my graph vertices? In my algorithm, there are some super-steps that…
PhiloJunkie
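GraphX's Pregel delivers the initial message to every vertex in superstep 0, which is the usual way to "activate" all vertices at once. A sketch with illustrative max-propagation logic (assumes a Graph[Int, _]):

```scala
// Every vertex runs the vertex program at least once because initialMsg
// reaches all of them in the first superstep; afterwards only vertices
// that received a message are active.
val result = graph.pregel(initialMsg = Int.MinValue)(
  (id, attr, msg) => math.max(attr, msg),                  // vertex program
  triplet => Iterator((triplet.dstId, triplet.srcAttr)),   // send messages
  (a, b) => math.max(a, b)                                 // merge messages
)
```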
5 votes, 1 answer

How to determine the best setting for spark running on a single node?

I have 55 GB of data that needs to be processed. I'm running spark-shell on a single machine with 32 cores and 180 GB RAM (no cluster). Since it's a single node, both the driver and the workers reside in the same JVM process and by default use 514 MB. I set…
Neo
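In local mode the relevant knobs are the master's thread count and the driver's heap, since driver and executors share one JVM. A sketch using the question's hardware (the exact sizes are a judgment call; leave headroom for the OS):

```shell
# One JVM, 32 worker threads, most of the machine's RAM as heap:
spark-shell --master "local[32]" --driver-memory 150g
```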
5 votes, 3 answers

Execute a Scala script through spark-shell in silent mode

I need to execute a Scala script through spark-shell in silent mode. When I use spark-shell -i "file.scala", after execution I am dropped into the Scala interactive mode. I don't want to end up there. I have tried to execute the…
Renganathan
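One workaround seen in practice (a sketch; process substitution assumes bash): prepend the REPL's :silent command to suppress result echoing and append sys.exit so the shell quits instead of dropping to the prompt.

```shell
# Wrap the script: run quietly, then exit instead of entering the REPL.
spark-shell -i <(echo ':silent'; cat file.scala; echo 'sys.exit')
```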
5 votes, 1 answer

Convert scientific notation in string format to numeric in a Spark DataFrame

Day_Date,timeofday_desc,Timeofday_hour,Timeofday_minute,Timeofday_second,value
2017-12-18,12:21:02 AM,0,21,2,"1.779209040E+08"
2017-12-19,12:21:02 AM,0,21,2,"1.779209040E+08"
2017-12-20,12:30:52 AM,0,30,52,"1.779209040E+08"
2017-12-21,12:30:52…
Nihal
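Casting the string to double parses scientific notation; the main obstacle in the sample is the curly quotes around the values. A sketch (column name taken from the sample data; df is the asker's DataFrame):

```scala
import org.apache.spark.sql.functions._

// Strip straight and curly quotes, then cast; a string like
// "1.779209040E+08" parses cleanly as a double.
val cleaned = df.withColumn(
  "value",
  regexp_replace(col("value"), "[\"\u201C\u201D]", "").cast("double")
)
```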
4 votes, 1 answer

Run spark-shell from sbt

The default way of getting a Spark shell seems to be to download the distribution from the website. Yet this Spark issue mentions that it can be installed via sbt. I could not find documentation on this. In an sbt project that uses spark-sql and…
serv-inc
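One workable pattern (a sketch; the version number is illustrative): declare the Spark dependency and have `sbt console` bootstrap a SparkSession, which gives a spark-shell-like REPL from the project itself.

```scala
// build.sbt fragment
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.4"

// Run these on console startup so `spark` is ready, as in spark-shell:
initialCommands in console := """
  import org.apache.spark.sql.SparkSession
  val spark = SparkSession.builder
    .master("local[*]")
    .appName("sbt-console")
    .getOrCreate()
  import spark.implicits._
"""
```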
3 votes, 3 answers

Setting up spark-shell in Git Bash on Windows

I have not faced this problem with any other software on my system; I am able to install and run everything in the Windows terminal/command prompt and in Git Bash. Recently, I started learning Spark. I installed Spark, setting everything: JAVA_HOME, SCALA_HOME,…
BeginnerRP
3 votes, 1 answer

Is there a way to parallelize spark.read.load(string*) when reading many files?

I noticed that in spark-shell (Spark 2.4.4), when I do a simple spark.read.format(xyz).load("a","b","c",...), it looks like Spark uses a single IPC client (or "thread") to load the files a, b, c, ... sequentially (they are paths on HDFS). Is this…
kcode2019
3 votes, 0 answers

Possible reasons that spark waits and does not schedule tasks to run?

This might be a very generic question, but I hope someone can point out some hints. I found that sometimes my Spark job seems to hit a "pause" many times. The nature of the job is: read ORC files (from a Hive table), filter by certain columns, no…
kcode2019
3 votes, 0 answers

How to solve SQLException "Unsupported type JAVA_OBJECT" when connecting to Presto with Apache Spark?

I am very new to Apache Spark and am trying to connect to Presto from Apache Spark. Below is my connection string, which is giving the error. val jdbcDF = spark.read.format("jdbc").options(Map("url" ->…
jkat
3 votes, 1 answer

Apache Spark method not found sun.nio.ch.DirectBuffer.cleaner()Lsun/misc/Cleaner;

I encounter this problem while running an automated data processing script in spark-shell. The first couple of iterations work fine, but sooner or later it always bumps into this error. I googled this issue but haven't found an exact match. Other…