Questions tagged [spark-submit]

spark-submit is a script that can run Apache Spark code written in, e.g., Java, Scala, or Python.

More information about spark-submit can be found here.

611 questions
224
votes
21 answers

How to stop INFO messages displaying on spark console?

I'd like to stop the various messages that appear in the Spark shell. I tried editing the log4j.properties file to stop these messages. Here are the contents of log4j.properties: # Define the root logger with appender…
Vishwas
209
votes
7 answers

Add JAR files to a Spark job - spark-submit

True... it has been discussed quite a lot. However, there is a lot of ambiguity and some of the answers provided ... including duplicating JAR references in the jars/executor/driver configuration or options. The ambiguous and/or omitted details The…
YoYo
16
votes
2 answers

How to execute spark submit on amazon EMR from Lambda function?

I want to run a spark-submit job on an AWS EMR cluster in response to a file upload event on S3. I am using an AWS Lambda function to capture the event, but I have no idea how to submit the spark-submit job to the EMR cluster from the Lambda function. Most of the…
14
votes
1 answer

Set hadoop configuration values on spark-submit command line

We want to set the AWS parameters that, in code, would be set via the SparkContext: sc.hadoopConfiguration.set("fs.s3a.access.key", vault.user) sc.hadoopConfiguration.set("fs.s3a.secret.key", vault.key) However, we have a custom Spark launcher…
WestCoastProjects
13
votes
1 answer

spark-submit --py-files gives warning RuntimeWarning: Failed to add file speficied in 'spark.submit.pyFiles' to Python path:

We have a PySpark-based application and run it with spark-submit as shown below. The application works as expected, but we are seeing a strange warning message. Is there any way to handle this, or an explanation of why it appears? Note: The cluster is Azure HDI…
12
votes
5 answers

ClassNotFoundException scala.runtime.LambdaDeserialize when spark-submit

I am following the Scala tutorial at https://spark.apache.org/docs/2.1.0/quick-start.html My Scala file: /* SimpleApp.scala */ import org.apache.spark.SparkContext import org.apache.spark.SparkContext._ import org.apache.spark.SparkConf object SimpleApp…
Haha TTpro
12
votes
2 answers

Spark Driver Memory and Executor Memory

I am a beginner with Spark, and I am running my application to read 14 KB of data from a text file, do some transformations and actions (collect, collectAsMap), and save the data to a database. I am running it locally on my MacBook with 16 GB of memory and 8 logical…
nnc
10
votes
4 answers

How to save a file on the cluster

I'm connected to the cluster over SSH, and I send the program to the cluster using spark-submit --master yarn myProgram.py. I want to save the result in a text file, and I tried using the following…
lads
10
votes
1 answer

My Spark SQL limit is very slow

I use Spark to read from Elasticsearch, with a query like select col from index limit 10; The problem is that the index is very large: it contains 100 billion rows, and Spark generates thousands of tasks to finish the job. All I need is 10 rows, even 1 task…
10
votes
1 answer

Pass system property to spark-submit and read file from classpath or custom path

I have recently found a way to use logback instead of log4j in Apache Spark (both for local use and spark-submit). However, there is one last piece missing. The issue is that Spark tries very hard not to see logback.xml settings in its classpath. I have…
Atais
9
votes
1 answer

How to run spark-submit remotely?

I have Spark running in a remote cluster. How do I submit an application to the remote cluster using spark-submit in the following scenario: spark-submit is executed as a command via Camel; the application runs in its own container. From the following…
Pkumar
9
votes
1 answer

Pyspark: Error executing Jupyter command while running a file using spark-submit

I am able to run pyspark and run a script in a Jupyter notebook. But when I try to run the file from the terminal using spark-submit, I get this error: Error executing Jupyter command file path [Errno 2] No such file or directory. Can anyone help me…
Q Bit
9
votes
2 answers

How to pass external parameters through Spark submit

In my application, I need to connect to the database, so I need to pass the IP address and database name when the application is submitted. I submit the application as follows: ./spark-submit --class class name --master spark://localhost:7077…
ROOT
7
votes
1 answer

Spark submit to kubernetes: packages not pulled by executors

I'm trying to submit my PySpark application to a Kubernetes cluster (Minikube) using spark-submit: ./bin/spark-submit \ --master k8s://https://192.168.64.4:8443 \ --deploy-mode cluster \ --packages…
7
votes
2 answers

java.lang.IllegalArgumentException: Too large frame: 5211883372140375593

I submitted my code to the cluster to run, but I encountered the following error. ''' java.lang.IllegalArgumentException: Too large frame: 5211883372140375593 at org.sparkproject.guava.base.Preconditions.checkArgument(Preconditions.java:119) at…
zhicheng