Questions tagged [spark-hive]

Use this tag for questions about the spark-hive module or HiveContext.

Apache Spark Hive is a module for "Hive and structured data processing" on Spark, a fast and general-purpose cluster computing system. It is a superset of Spark SQL and is used to create a HiveContext, similar to a SQLContext.

76 questions
35 votes · 2 answers

Querying on multiple Hive stores using Apache Spark

I have a Spark application that successfully connects to Hive and queries Hive tables using the Spark engine. To build this, I just added hive-site.xml to the application's classpath, and Spark reads the hive-site.xml to connect to its…
karthik manchala • 13,492 • 1 • 31 • 55
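
Placing hive-site.xml on the application classpath is the standard way to point Spark at an existing Hive metastore. A minimal sketch of querying Hive from Spark, assuming a Spark 2.x SparkSession and a hypothetical table my_db.my_table:

    import org.apache.spark.sql.SparkSession

    // hive-site.xml on the classpath (or in $SPARK_HOME/conf) tells Spark
    // which metastore to connect to; enableHiveSupport() activates the Hive catalog.
    val spark = SparkSession.builder()
      .appName("hive-query-example")
      .enableHiveSupport()
      .getOrCreate()

    // Query a Hive table through the Hive-backed catalog.
    spark.sql("SELECT * FROM my_db.my_table LIMIT 10").show()

Querying two different metastores from one application is harder, since a session binds to a single metastore; the question above is about exactly that gap.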
11 votes · 3 answers

Apache spark Hive, executable JAR with maven shade

I'm building an Apache Spark application with Apache Spark Hive. So far everything has been fine - I've been running tests and the whole application in IntelliJ IDEA, and all tests together using Maven. Now I want to run the whole application from bash and let it run…
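
A frequent failure mode when shading Spark applications is that the META-INF/services registry files (e.g. org.apache.spark.sql.sources.DataSourceRegister) from different jars overwrite each other in the fat jar. A hedged sketch of the usual fix, merging service files with the Maven shade plugin (the plugin version is illustrative):

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.2.4</version>
      <configuration>
        <transformers>
          <!-- Merge META-INF/services entries instead of overwriting them,
               so Spark can still discover its data sources at runtime. -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </plugin>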
10 votes · 4 answers

Missing hive-site when using spark-submit YARN cluster mode

Using HDP 2.5.3, I've been trying to debug some YARN container classpath issues. Since HDP includes both Spark 1.6 and 2.0.0, there have been some conflicting versions. The users I support are successfully able to use Spark2 with Hive queries in YARN…
OneCricketeer • 179,855 • 19 • 132 • 245
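
In YARN cluster mode the driver runs inside the cluster, so a hive-site.xml that only lives on the submitting machine's classpath never reaches it. One commonly suggested workaround is shipping the file with the job (the path is illustrative):

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --files /etc/spark2/conf/hive-site.xml \
      my-app.jar

Spark copies files passed via --files into each container's working directory, which YARN places on the container classpath.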
10 votes · 4 answers

How to set hive.metastore.warehouse.dir in HiveContext?

I'm trying to write a unit test case that relies on DataFrame.saveAsTable() (since it is backed by a file system). I point the Hive warehouse parameter to a local disk location: sql.sql(s"SET…
tribbloid • 4,026 • 14 • 64 • 103
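
For a test like this, the warehouse location generally has to be set before the first table is written. A minimal sketch against the HiveContext-era API (the local path is illustrative):

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)
    // Must be set before any table is created, or the default location wins.
    hiveContext.setConf("hive.metastore.warehouse.dir", "/tmp/test-warehouse")

In Spark 2.x the equivalent knob is spark.sql.warehouse.dir, set on the SparkSession builder before getOrCreate().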
5 votes · 2 answers

Spark hive udf: no handler for UDAF analysis exception

I created a project 'spark-udf' and wrote a Hive UDF as below: package com.spark.udf import org.apache.hadoop.hive.ql.exec.UDF class UpperCase extends UDF with Serializable { def evaluate(input: String): String = { input.toUpperCase } Built…
Swapnil Chougule • 717 • 9 • 17
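
Classes extending the old org.apache.hadoop.hive.ql.exec.UDF base class go through Hive's function resolver, which is where the "no handler" analysis error surfaces. A hedged alternative that sidesteps the Hive machinery is registering a native Spark UDF (assumes an active SparkSession named spark; the function and table names are illustrative):

    // Register a plain Scala function as a Spark SQL UDF instead of a Hive UDF.
    spark.udf.register("upper_case", (input: String) => input.toUpperCase)

    spark.sql("SELECT upper_case(name) FROM people").show()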
5 votes · 2 answers

How can I update/delete data in Spark-hive?

I don't think my title can explain the problem, so here it is. Details - build.sbt: name := "Hello" scalaVersion := "2.11.8" version := "1.0" libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0" libraryDependencies +=…
yashpal bharadwaj • 323 • 2 • 6 • 14
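
Spark SQL in the 2.x line does not execute Hive ACID UPDATE/DELETE statements, so the usual workaround is read-transform-rewrite. A hedged sketch (table and column names are illustrative):

    import org.apache.spark.sql.functions.col

    // Emulate DELETE: keep only the rows that should survive, then write the
    // result out as a new table (overwriting the table being read fails).
    val kept = spark.table("my_db.users").filter(col("active") === true)
    kept.write.mode("overwrite").saveAsTable("my_db.users_cleaned")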
5 votes · 2 answers

Select all except particular column in spark sql

I want to select all columns in a table except StudentAddress, and hence I wrote the following query: select `(StudentAddress)?+.+` from student; It gives the following error in the SQuirreL SQL client: org.apache.spark.sql.AnalysisException: cannot resolve…
Patel • 129 • 1 • 1 • 11
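
The backtick-regex column syntax is only honored when spark.sql.parser.quotedRegexColumnNames is enabled (Spark 2.3+). A simpler DataFrame-side alternative, sketched under the assumption that dropping the one column is the real goal:

    // Select everything except StudentAddress, without regex column syntax.
    spark.table("student").drop("StudentAddress").show()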
4 votes · 1 answer

sparkpy insists that root scratch dir: /tmp/hive on HDFS should be writable

I am trying to run a PySpark program that accesses the Hive server. The program terminates by throwing the error: pyspark.sql.utils.AnalysisException: 'java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS…
Realdeo • 449 • 6 • 19
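
This error generally means the Hive scratch directory exists but is not world-writable. A commonly cited fix, hedged since the right permissions depend on your environment:

    hdfs dfs -mkdir -p /tmp/hive
    hdfs dfs -chmod -R 777 /tmp/hive

(On Windows the same chmod is done against the local filesystem with winutils.exe.)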
4 votes · 1 answer

Relative path in absolute URI Exception while accessing DynamoDB via Glue Data Catalogue in PySpark running on EMR

I am executing a PySpark application on AWS EMR that is configured to use the AWS Glue Data Catalog as its metastore. I have a table set up in AWS Glue that points to a DynamoDB table. Now, in my PySpark script, I am trying to access the Glue table. I am…
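
For context, reading a Glue-cataloged table normally goes through the Hive-enabled session; a minimal, hedged sketch (the database and table names are hypothetical, and the URI exception itself usually points at a malformed LOCATION on the catalog entry rather than at this code):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("glue-catalog-read")
      .enableHiveSupport() // on EMR, resolves tables through the Glue Data Catalog
      .getOrCreate()

    spark.sql("SELECT * FROM glue_db.dynamo_backed_table").show()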
3 votes · 0 answers

How to read sql files in pyspark?

I've been trying to run this code, expecting it to create a table from a SQL file that contains the table's schema and values, using PySpark. I can't seem to understand the error. Please help me. --------------------SQL…
Vishal Ch • 59 • 2 • 4
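
Spark has no built-in entry point for executing a .sql file, so the common pattern is to read the file and feed each statement to spark.sql. A hedged sketch (the path is illustrative, and the naive split assumes no semicolons inside string literals):

    import scala.io.Source

    val script = Source.fromFile("/path/to/schema.sql").mkString
    // Split on ';' and run each non-empty statement through the SQL engine.
    script.split(";").map(_.trim).filter(_.nonEmpty).foreach(stmt => spark.sql(stmt))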
3 votes · 2 answers

Spark sql saveAsTable create table append mode if new column is added in avro schema

I am using a Spark SQL Dataset to write data into Hive. It works perfectly if the schema is the same, but if I change the Avro schema by adding a new column in between, it shows the error (the schema is provided from a schema registry): Error running job streaming…
Sumit G • 436 • 8 • 21
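
An append with saveAsTable fails when the incoming DataFrame carries columns the Hive table does not yet have. One hedged approach on Spark 2.2+ is to evolve the table first, then append (the table and column names are illustrative):

    // Add the new column to the existing Hive table before appending.
    spark.sql("ALTER TABLE my_db.events ADD COLUMNS (new_field STRING)")
    newData.write.mode("append").saveAsTable("my_db.events")

Note that Hive appends new columns at the end, so a column added in the middle of the Avro schema still has to be written by name rather than by position.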
3 votes · 0 answers

Spark build failed

I downloaded the Spark source from the Apache site, then built it using Maven. Spark version 1.6.3, Hadoop version 2.7.3, Scala version 2.10.4. I used the command below to build the project: ./build/mvn -Pyarn -Phadoop-2.7…
lucy • 4,136 • 5 • 30 • 47
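
For reference, the documented Maven invocation for that Spark/Hadoop combination looks roughly like this (the -Dhadoop.version and -DskipTests flags come from the standard Spark build instructions, not from the question):

    ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package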
3 votes · 1 answer

HiveContext createDataFrame not working on pySpark (jupyter)

I am doing an analysis in PySpark using Jupyter notebooks. My code originally built DataFrames using sqlContext = SQLContext(sc), but now I've switched to HiveContext since I will be using window functions. My problem is that now I'm getting a…
masta-g3 • 1,202 • 4 • 17 • 27
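
A frequent cause is having both a plain SQLContext and a HiveContext alive in one notebook: the embedded Derby metastore allows only a single active connection. A hedged sketch of using one Hive-aware context for everything (shown in Scala; the data is illustrative):

    import org.apache.spark.sql.hive.HiveContext

    // Create exactly one Hive-aware context and reuse it everywhere,
    // including createDataFrame and window-function queries.
    val hiveContext = new HiveContext(sc)
    val df = hiveContext.createDataFrame(Seq((1, "a"), (2, "b"))).toDF("id", "value")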
3 votes · 1 answer

Spark CSV IOException Mkdirs failed to create file

TL;DR: Spark 1.6.1 fails to write a CSV file using Spark CSV 1.4 on a standalone cluster with no HDFS, with an IOException: Mkdirs failed to create file. More details: I'm working on a Spark 1.6.1 application, running it on a standalone cluster using a…
Gideon • 2,211 • 5 • 29 • 47
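
On a standalone cluster without HDFS, each executor writes its partition to its own local filesystem, so the target directory must exist and be writable on every worker (e.g. a shared NFS mount). A hedged sketch using the spark-csv package from the question (the path is illustrative):

    // The file:// path must be visible to every worker node, e.g. an NFS
    // mount; otherwise executors fail with "Mkdirs failed to create".
    df.write
      .format("com.databricks.spark.csv") // spark-csv 1.x, as in the question
      .option("header", "true")
      .save("file:///shared/output/result.csv")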
3 votes · 0 answers

Field delimiter of Hive table not recognized by spark HiveContext

I have created a Hive external table stored as textfile, partitioned by event_date Date. How do we specify a specific CSV format when reading the Hive table in Spark? The environment is Spark 1.5.0 - cdh5.5.1, using Scala version…
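
HiveContext honors only the delimiter declared in the table's DDL, so the usual fix is to make it explicit when creating the external table. A hedged sketch (the table name, columns, delimiter, and location are illustrative):

    hiveContext.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS events (id INT, name STRING)
      PARTITIONED BY (event_date DATE)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      STORED AS TEXTFILE
      LOCATION '/data/events'
    """)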