Questions tagged [spark2]

11 questions
2 votes, 0 answers

Unable to connect to Hive LLAP from PySpark

I am using a PySpark kernel inside JupyterHub and want to connect to Hive LLAP from Spark. I am able to create a Spark session, but when I try to execute from pyspark_llap import HiveWarehouseSession it shows an error: no module found…
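A likely cause of the "no module found" error above: pyspark_llap ships with the Hive Warehouse Connector assembly (as a pyspark_hwc zip), not with PySpark itself, so a plain Jupyter kernel will not see it. A minimal diagnostic sketch in plain Python (no Spark required; the helper name is hypothetical):

```python
import importlib.util

def module_available(name: str) -> bool:
    """Return True if `name` is importable from the current PYTHONPATH."""
    return importlib.util.find_spec(name) is not None

# pyspark_llap comes from the connector's python zip, so it must be added
# to the Python path first (e.g. via --py-files or spark.submit.pyFiles).
if not module_available("pyspark_llap"):
    print("pyspark_llap is missing from sys.path; "
          "add the HWC python zip before importing HiveWarehouseSession.")
```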
2 votes, 0 answers

Not a version: 9 exception with Scala 2.11.12

A Scala application on Scala 2.11.12 is throwing the following error while executing a certain set of code. The environment configuration is as follows: Scala IDE with Eclipse: version 4.7; Eclipse version: 2019-06 (4.12.0); Spark version: 2.4.4; Java…
Sandeep Singh
1 vote, 0 answers

hive-warehouse-connector_2.11 + Required field 'client_protocol' is unset

I am using a Hadoop cluster with the Cloudera 6.3.2 distribution. I have a requirement to read a Hive ACID table from Spark (Java client). Native Spark does not read Hive ACID tables, hence I am planning to use the Hive Warehouse Connector, but am getting the below…
0 votes, 0 answers

Segmentation fault error while running pyspark in Apache Spark 2.4.7

Getting a segmentation fault error while running /opt/spark2/bin/pyspark --master yarn --conf spark.ui.port=0 on Kali Linux. I verified that Python 3.7 is under /usr/bin and Spark can access it. While running /opt/spark2/bin/spark-shell --master yarn --conf…
0 votes, 0 answers

PySpark - Spark 2 SQL submit - implicitly casting all columns gives 'Cannot up cast' error

I am trying to use Spark SQL from Spark 2 in a Cloudera environment and am getting the following error: 'pyspark.sql.utils.AnalysisException: u'Cannot up cast other_column_from_table from decimal(32,22) to decimal(30,22) as it may truncate\n;'' We not…
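The refusal is arithmetic: DECIMAL(30,22) reserves 22 fractional digits, leaving only 8 digits before the point, while DECIMAL(32,22) allows 10, so the implicit narrowing could drop leading digits. A plain-Python sketch of that rule (the fits helper is hypothetical, just to illustrate the check):

```python
from decimal import Decimal

def fits(value: Decimal, precision: int, scale: int) -> bool:
    """Check whether `value` fits DECIMAL(precision, scale) without truncation."""
    t = value.as_tuple()
    frac_digits = max(0, -t.exponent)                 # fractional digits needed
    int_digits = max(0, len(t.digits) + t.exponent)   # digits before the point
    return frac_digits <= scale and int_digits <= precision - scale

v = Decimal("1234567890.0000000000000000000001")  # needs DECIMAL(32,22)
assert fits(v, 32, 22)
assert not fits(v, 30, 22)  # only 8 integer digits available -> may truncate
```

Spark allows the conversion only when requested explicitly, so the usual fixes are widening the declared target schema to decimal(32,22), or adding an explicit cast (e.g. col("other_column_from_table").cast("decimal(30,22)")) to accept the possible truncation.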
0 votes, 2 answers

How to split RDD rows by commas when there is no value between them?

I'm trying to split the below RDD row into five columns: val test = [hello,one,,,] val rddTest = test.rdd val Content = rddTest.map(_.toString().replace("[", "").replace("]", "")) .map(_.split(",")) .map(e ⇒ Row(e(0), e(1), e(2), e(3),…
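A likely cause, assuming the standard behavior of Java's String.split: in Scala, "hello,one,,,".split(",") silently drops the trailing empty strings, so e(2) through e(4) throw ArrayIndexOutOfBoundsException; passing a limit of -1, as in split(",", -1), keeps them. Python's str.split keeps trailing empties by default, which makes the contrast easy to show:

```python
row = "hello,one,,,"  # the row after stripping the brackets

fields = row.split(",")
print(fields)  # ['hello', 'one', '', '', '']
assert len(fields) == 5  # all five columns survive, three of them empty

# Scala/Java equivalent that behaves the same way:
#   "hello,one,,,".split(",", -1)   // Array(hello, one, "", "", "")
# whereas split(",") alone returns only Array(hello, one).
```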
0 votes, 1 answer

How to get the number of rows written in Spark 2.3 using Java?

I know we can use count(), but I'm trying to capture the count using a SparkListener, and I'm failing to write proper Java code for it. I've tried following the exact approach given in this "How to implement custom job listener/tracker"…
Impromptu_Coder
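As a language-neutral sketch of the listener pattern the question is after (class names here are hypothetical stand-ins; in Spark the real hook is SparkListener's onTaskEnd, summing taskMetrics().outputMetrics().recordsWritten()), the counting itself is just an accumulator fed by task-end callbacks:

```python
class TaskEnd:
    """Stand-in for Spark's SparkListenerTaskEnd event (hypothetical)."""
    def __init__(self, records_written: int):
        self.records_written = records_written

class RowCountListener:
    """Accumulates output-row counts from task-end callbacks, the way a
    registered SparkListener would sum recordsWritten per finished task."""
    def __init__(self):
        self.total = 0

    def on_task_end(self, event: TaskEnd):
        self.total += event.records_written

listener = RowCountListener()
for event in (TaskEnd(100), TaskEnd(250), TaskEnd(0)):
    listener.on_task_end(event)
print(listener.total)  # 350
```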
0 votes, 0 answers

I am getting an error while using window functions in PySpark

I am trying to run the below code: employees = (spark.read.format('csv') .option('sep', '\t') .schema('''EMP_ID INT, F_NAME STRING, L_NAME STRING, EMAIL STRING, PHONE_NR STRING, HIRE_DATE STRING, …
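The excerpt is cut off before the window function itself, so as a hedged illustration only: a typical window over such a table numbers rows per group ordered by HIRE_DATE (in PySpark, Window.partitionBy(...).orderBy(...) with row_number()). A pure-Python analogue over invented data:

```python
from itertools import groupby

# Hypothetical (dept, hire_date) rows; the question's real schema has
# EMP_ID, F_NAME, L_NAME, EMAIL, PHONE_NR, HIRE_DATE and no dept column.
employees = [
    ("sales", "2019-01-04"),
    ("sales", "2018-07-12"),
    ("hr",    "2020-02-01"),
]

# row_number() OVER (PARTITION BY dept ORDER BY hire_date)
employees.sort(key=lambda r: (r[0], r[1]))
numbered = [
    (dept, hire_date, i)
    for dept, rows in groupby(employees, key=lambda r: r[0])
    for i, (_, hire_date) in enumerate(rows, start=1)
]
print(numbered)
# [('hr', '2020-02-01', 1), ('sales', '2018-07-12', 1), ('sales', '2019-01-04', 2)]
```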
0 votes, 1 answer

PySpark and Python not installed as part of the HDP 2.6.0.3-8 stack

I have an HDP cluster where 2.6.0.3 is installed. On one of the gateway nodes, which is not attached to Ambari, I installed the HDP stack. With the installation I got Spark2 installed; that is all fine so far. But when I looked into it, I didn't…
Amar
0 votes, 0 answers

Copy Files from AWS S3 to HDFS (Hadoop Distributed File System)

I'm trying to copy AVRO files from an AWS S3 bucket to HDFS using the following Scala code: val avroDF =…
Swathi
-1 votes, 1 answer

Spark 2 SQL deeply nested array structure with Parquet

Given a deeply nested parquet struct like so:
|-- bet: struct (nullable = true)
|    |-- sides: array (nullable = true)
|    |    |-- element: struct (containsNull = true)
|    |    |    |-- side: string (nullable = true)
|    |    |    |-- betID:…
sunny
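In Spark 2 the usual way into such a structure is explode on the array column followed by dotted field selection, e.g. df.select(explode(col("bet.sides")).alias("s")).select("s.side", "s.betID"). A pure-Python analogue of what that does to one record (data invented to match the schema above):

```python
record = {
    "bet": {
        "sides": [
            {"side": "BACK", "betID": "1"},
            {"side": "LAY",  "betID": "2"},
        ]
    }
}

# explode(bet.sides) -> one output row per array element,
# then project the struct fields side and betID
rows = [(s["side"], s["betID"]) for s in record["bet"]["sides"]]
print(rows)  # [('BACK', '1'), ('LAY', '2')]
```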