Questions tagged [spark2]
11 questions
2
votes
0 answers
Unable to connect to Hive LLAP from PySpark
I am using the PySpark kernel inside JupyterHub and want to connect to Hive LLAP from Spark. I am able to create a Spark session, but when I try to execute
from pyspark_llap import HiveWarehouseSession it shows the error "no module found"…

Ashish Mathur
- 33
- 4
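
A "no module named pyspark_llap" error usually means the Hive Warehouse Connector's Python bindings (typically shipped as a pyspark_hwc zip) were not passed to the kernel via --py-files or spark.submit.pyFiles, alongside the HWC assembly jar. A minimal sketch of the equivalent session setup in Scala, with the jar path being an assumption about a typical HDP layout:

    import org.apache.spark.sql.SparkSession
    import com.hortonworks.hwc.HiveWarehouseSession

    object HwcSmokeTest {
      def main(args: Array[String]): Unit = {
        // Assumes the HWC assembly jar is on the classpath, e.g. under
        // /usr/hdp/current/hive_warehouse_connector/ (path is an assumption)
        val spark = SparkSession.builder().appName("hwc-smoke-test").getOrCreate()
        val hive = HiveWarehouseSession.session(spark).build()
        hive.showDatabases().show()
      }
    }

For PySpark, the matching pyspark_hwc zip has to be on the Python path before the import can succeed.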
2
votes
0 answers
"Not a version: 9" exception with Scala 2.11.12
A Scala application built with Scala 2.11.12 is throwing the following error while executing a certain section of code.
The environment configuration is as follows:
Scala IDE with Eclipse: version 4.7
Eclipse Version: 2019-06 (4.12.0)
Spark Version: 2.4.4
Java…

Sandeep Singh
- 7,790
- 4
- 43
- 68
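
The "Not a version: 9" NumberFormatException is commonly reported when a Scala 2.11 / Spark 2.4 toolchain runs on Java 9+, typically because an older scala-library on the classpath can only parse 1.x-style Java version strings; running on Java 8 usually avoids it. A minimal diagnostic sketch to confirm which JVM the application actually runs on:

    // Spark 2.4 with Scala 2.11 is generally expected to run on Java 8 ("1.8").
    object JvmVersionCheck extends App {
      println(sys.props("java.version"))
      println(sys.props("java.specification.version"))
    }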
1
vote
0 answers
hive-warehouse-connector_2.11 + Required field 'client_protocol' is unset
I am using a Hadoop cluster with the Cloudera 6.3.2 distribution.
I have a requirement to read a Hive ACID table from Spark (Java client). Native Spark does not read Hive ACID tables, so I am planning to use the Hive Warehouse Connector, but I am getting the below…

Prasath Rajan
- 63
- 7
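
"Required field 'client_protocol' is unset" is typically a Thrift version mismatch between the Hive JDBC client bundled with the connector and the cluster's HiveServer2. A hedged sketch of the HWC session setup (Scala; the Java API is the same); the JDBC URL and metastore URI are placeholders for your cluster's endpoints:

    import org.apache.spark.sql.SparkSession
    import com.hortonworks.hwc.HiveWarehouseSession

    val spark = SparkSession.builder()
      .appName("hwc-acid-read")
      // Placeholder endpoints; point these at your HiveServer2 and metastore.
      .config("spark.sql.hive.hiveserver2.jdbc.url", "jdbc:hive2://hs2-host:10000/default")
      .config("spark.datasource.hive.warehouse.metastoreUri", "thrift://metastore-host:9083")
      .getOrCreate()

    // executeQuery goes through HiveServer2, which is what ACID reads require.
    val hive = HiveWarehouseSession.session(spark).build()
    hive.executeQuery("SELECT * FROM acid_db.acid_table LIMIT 10").show()

If the exception persists, the usual suspect is an HWC build compiled against a different Hive version than the one the cluster runs.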
0
votes
0 answers
Segmentation fault error while running pyspark in Apache Spark 2.4.7
I am getting a segmentation fault while running /opt/spark2/bin/pyspark --master yarn --conf spark.ui.port=0 on Kali Linux.
I verified that Python 3.7 is under /usr/bin and Spark can access it. While running /opt/spark2/bin/spark-shell --master yarn --conf…

Sourav Bhaumik
- 3
- 2
0
votes
0 answers
PySpark - Spark 2 SQL submit - implicitly casting all columns gives "Cannot up cast" error
I am trying to use Spark SQL from Spark 2 in a Cloudera environment and am getting the following error:
pyspark.sql.utils.AnalysisException: u'Cannot up cast
other_column_from_table from decimal(32,22) to decimal(30,22) as it
may truncate\n;'
We not…
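
The analyzer raises "Cannot up cast ... as it may truncate" when Spark would have to narrow a decimal implicitly (here decimal(32,22) to decimal(30,22)). Making the cast explicit tells Spark the truncation is intentional. A minimal sketch (Scala; PySpark's cast works the same way), with the column name taken from the error message and the table name hypothetical:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder().appName("decimal-cast").getOrCreate()
    // "some_table" is a placeholder; the column name comes from the error above.
    val df = spark.table("some_table")
      .withColumn("other_column_from_table",
        col("other_column_from_table").cast("decimal(30,22)"))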
0
votes
2 answers
How to split RDD rows by commas when there is no value between them?
I'm trying to split the below RDD row into five columns
val test = Seq("hello,one,,,").toDF()   // the row displays as [hello,one,,,]
val rddTest = test.rdd
val Content = rddTest.map(_.toString().replace("[", "").replace("]", ""))
  .map(_.split(","))
  .map(e => Row(e(0), e(1), e(2), e(3),…

murali krishna
- 21
- 6
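
String.split drops trailing empty strings by default, which is why "hello,one,,," yields only two fields and e(2) throws. Passing a limit of -1 keeps them; a minimal sketch:

    import org.apache.spark.sql.{Row, SparkSession}

    val spark = SparkSession.builder().appName("split-keep-empty").getOrCreate()
    val rddTest = spark.sparkContext.parallelize(Seq("hello,one,,,"))
    val content = rddTest
      .map(_.split(",", -1))                        // -1: keep trailing empty strings
      .map(e => Row(e(0), e(1), e(2), e(3), e(4)))  // five fields, last three empty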
0
votes
1 answer
How to get the number of rows written in Spark 2.3 using Java?
I know we can use count(), but I'm trying to capture the count using a SparkListener, and I'm failing to write proper Java code for it. I've tried following the exact approach given in this How to implement custom job listener/tracker…

Impromptu_Coder
- 425
- 3
- 7
- 27
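
A task-level listener can sum outputMetrics.recordsWritten as write tasks finish. A hedged sketch in Scala (a Java version extends the same SparkListener class and overrides the same callback):

    import java.util.concurrent.atomic.AtomicLong
    import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

    // Accumulates the number of records written across all completed tasks.
    class RecordsWrittenListener extends SparkListener {
      val recordsWritten = new AtomicLong(0L)
      override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
        val metrics = taskEnd.taskMetrics           // may be null for failed tasks
        if (metrics != null) {
          recordsWritten.addAndGet(metrics.outputMetrics.recordsWritten)
        }
      }
    }

    // Register before running the write job:
    // spark.sparkContext.addSparkListener(new RecordsWrittenListener)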
0
votes
0 answers
I am getting an error while using window functions in PySpark
I am trying to run the below code:
employees = (spark.read.format('csv')
.option('sep', '\t')
.schema('''EMP_ID INT,F_NAME STRING,L_NAME STRING,
EMAIL STRING,PHONE_NR STRING,HIRE_DATE STRING,
…
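
The excerpt cuts off before the error, but the usual shape of a working window query is a Window spec passed to over(). A generic sketch in Scala against the schema above (same shape in PySpark via pyspark.sql.window.Window; the file path is hypothetical):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{col, rank}

    val spark = SparkSession.builder().appName("window-example").getOrCreate()
    val employees = spark.read.format("csv")
      .option("sep", "\t")
      .schema("EMP_ID INT, F_NAME STRING, L_NAME STRING, EMAIL STRING, PHONE_NR STRING, HIRE_DATE STRING")
      .load("/path/to/employees.tsv")               // hypothetical path

    val byHireDate = Window.orderBy(col("HIRE_DATE"))
    employees.withColumn("hire_rank", rank().over(byHireDate)).show()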
0
votes
1 answer
PySpark and Python not installed as part of the HDP 2.6.0.3-8 stack
I have an HDP cluster where 2.6.0.3 is installed. On one of the gateway nodes, which is not attached to Ambari, I installed the HDP stack. With the installation I got spark2 installed; that is all fine so far. But when I looked into it, I didn't…

Amar
- 1
0
votes
0 answers
Copy Files from AWS S3 to HDFS (Hadoop Distributed File System)
I'm trying to copy Avro files from an AWS S3 bucket to HDFS using the following Scala code:
val avroDF =…

Swathi
- 151
- 17
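
The excerpt stops at the read, but the usual pattern is to read from s3a and write to an HDFS path. A hedged sketch; the bucket and paths are placeholders, and the Avro format name depends on the Spark 2.x version (built-in "avro" via the external spark-avro package from 2.4, "com.databricks.spark.avro" before that):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("s3-to-hdfs").getOrCreate()

    val avroDF = spark.read
      .format("avro")                               // "com.databricks.spark.avro" pre-2.4
      .load("s3a://my-bucket/path/to/avro/")        // placeholder bucket/path
    avroDF.write.format("avro").save("hdfs:///data/avro/")

If no transformation is needed, hadoop distcp copies the files as-is and avoids a Spark job entirely.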
-1
votes
1 answer
Spark 2 SQL: deeply nested array structure with Parquet
Given a deeply nested Parquet struct like so:
|-- bet: struct (nullable = true)
| |-- sides: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- side: string (nullable = true)
| | | |-- betID:…

sunny
- 824
- 1
- 14
- 36
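
For a schema like the one above, explode flattens the bet.sides array and dot paths pull fields out of each element struct. A minimal sketch, assuming a hypothetical input path:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, explode}

    val spark = SparkSession.builder().appName("nested-parquet").getOrCreate()
    val df = spark.read.parquet("/path/to/bets.parquet")   // hypothetical path

    val sides = df
      .select(explode(col("bet.sides")).alias("s"))        // one row per array element
      .select(col("s.side"), col("s.betID"))
    sides.show()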