Questions tagged [hivecontext]

Questions related to the HiveContext class of Apache Spark.

A variant of Spark SQL that integrates with data stored in Hive.

Configuration for Hive is read from hive-site.xml on the classpath. It supports running both SQL and HiveQL commands.

More documentation can be found in the Spark SQL programming guide.

106 questions
24 votes, 6 answers

"INSERT INTO ..." with SparkSQL HiveContext

I'm trying to run an insert statement with my HiveContext, like this: hiveContext.sql('insert into my_table (id, score) values (1, 10)') The 1.5.2 Spark SQL Documentation doesn't explicitly state whether this is supported or not, although it does…
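Spark 1.x's HiveQL parser does not accept INSERT INTO ... VALUES lists (support arrived in Spark 2.0), so the usual workaround is appending through the DataFrame API. A minimal Scala sketch, reusing the table and column names from the question:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("InsertExample"))
    val hiveContext = new HiveContext(sc)
    import hiveContext.implicits._

    // Build a one-row DataFrame and append it instead of INSERT ... VALUES
    sc.parallelize(Seq((1, 10))).toDF("id", "score")
      .write.mode("append").insertInto("my_table")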
15 votes, 2 answers

External Hive Table Refresh table vs MSCK Repair

I have an external Hive table stored as Parquet, partitioned on a column, say as_of_dt, and data gets inserted via Spark Streaming. Every day a new partition gets added. I am running msck repair table so that the Hive metastore gets the newly added…
Ajith Kannan • 812 • 1 • 8 • 30
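For context, a hedged Scala sketch of the operations being compared; the database, table, and partition value are illustrative:

    // Scans the table location and registers any partitions missing from the metastore
    hiveContext.sql("MSCK REPAIR TABLE my_db.my_table")

    // Cheaper alternative when the new partition value is already known
    hiveContext.sql("ALTER TABLE my_db.my_table ADD IF NOT EXISTS PARTITION (as_of_dt='2018-05-09')")

    // Only invalidates Spark's cached metadata; it does not register new partitions
    hiveContext.refreshTable("my_db.my_table")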
6 votes, 1 answer

Using TestHiveContext/HiveContext in unit tests

I'm trying to do this in unit tests: val sConf = new SparkConf() .setAppName("RandomAppName") .setMaster("local") val sc = new SparkContext(sConf) val sqlContext = new TestHiveContext(sc) // tried new HiveContext(sc) as well But I get…
Sahil Sareen • 1,813 • 3 • 25 • 40
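TestHiveContext ships in the spark-hive test artifact, so failures here are often a missing test dependency rather than the code itself. A minimal sketch under that assumption:

    // sbt: "org.apache.spark" %% "spark-hive" % sparkVersion % Test classifier "tests"
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.test.TestHiveContext

    val sConf = new SparkConf().setAppName("RandomAppName").setMaster("local")
    val sc = new SparkContext(sConf)
    val sqlContext = new TestHiveContext(sc)   // extends HiveContext with test-friendly defaults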
5 votes, 2 answers

How to update an ORC Hive table from Spark using Scala

I would like to update a Hive table which is in ORC format. I'm able to update it from my Ambari Hive view, but I'm unable to run the same update statement from Scala (spark-shell). objHiveContext.sql("select * from table_name ") is able to see the data, but when I…
sudhir • 1,387 • 3 • 25 • 43
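HiveContext cannot execute Hive's ACID UPDATE statements, so the common pattern is read-modify-rewrite. A hedged sketch; objHiveContext is the context from the question, while the column being changed and the target table name are illustrative:

    import org.apache.spark.sql.functions.lit

    val df = objHiveContext.sql("select * from table_name")
    val updated = df.withColumn("status", lit("done"))   // apply the change in Spark
    updated.write.mode("overwrite").format("orc").saveAsTable("table_name_updated")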
4 votes, 1 answer

Will query from Spark hivecontext lock the hive table?

I know that if I submit a query from Hive, a shared lock will be acquired and the Hive table will then be locked by the query: https://cwiki.apache.org/confluence/display/Hive/Locking So I wonder: if the query is executed by Spark HiveContext, will…
JerryLi • 151 • 2 • 10
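One way to check the behaviour empirically is to run a long read through HiveContext and inspect Hive's lock table at the same time; a minimal sketch, table name illustrative:

    // Kick off a read through HiveContext...
    hiveContext.sql("select count(*) from my_table").show()
    // ...and from a concurrent Hive CLI/beeline session see whether a shared lock appears:
    //   SHOW LOCKS my_table;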
3 votes, 0 answers

PySpark restart SparkContext on failure

I need to compute some aggregations for each table in a Hive database. My code is something like: sc = SparkContext() sqlContext = HiveContext(sc) showtables_df = sqlContext.sql('show tables in my_db') for onlinetable in…
sergionsk8 • 135 • 2 • 11
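A Scala sketch of the pattern the question describes (the original is PySpark): catch the per-table failure and rebuild both contexts before moving on. The database name and the aggregation query are illustrative:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    var sc = new SparkContext(new SparkConf().setAppName("PerTableAggregates"))
    var sqlContext = new HiveContext(sc)
    val tables = sqlContext.sql("show tables in my_db").collect()

    tables.foreach { row =>
      try {
        sqlContext.sql(s"select count(*) from my_db.${row.getString(0)}").show()
      } catch {
        case e: Exception =>
          sc.stop()   // tear down the failed context and start fresh for the remaining tables
          sc = new SparkContext(new SparkConf().setAppName("PerTableAggregates"))
          sqlContext = new HiveContext(sc)
      }
    }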
3 votes, 1 answer

Repairing a Hive table using hiveContext in Java

I want to repair the Hive table for any newly added/deleted partitions. Instead of manually running the msck repair command in Hive, is there any way to achieve this in Java? I am trying to get all partitions from HDFS and from the Hive metastore and then…
mahan07 • 887 • 4 • 14 • 32
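HiveContext accepts the repair statement directly, and the same API is callable from Java; a minimal sketch with illustrative names:

    // Registers partitions that exist on HDFS but are missing from the metastore
    hiveContext.sql("MSCK REPAIR TABLE my_db.my_table")
    // Verify what the metastore now knows about
    hiveContext.sql("SHOW PARTITIONS my_db.my_table").show()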
3 votes, 2 answers

How to pass hiveContext as an argument to functions in Spark Scala

I have created a hiveContext in the main() function in Scala and I need to pass this hiveContext as a parameter to other functions. This is the structure: object Project { def main(name: String): Int = { val hiveContext = new…
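A minimal sketch of that structure, passing the context as an ordinary parameter; the function and table names are illustrative:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.hive.HiveContext

    object Project {
      def loadUsers(hiveContext: HiveContext): DataFrame =
        hiveContext.sql("select * from my_db.users")   // the context is just a normal argument

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("Project"))
        val hiveContext = new HiveContext(sc)
        loadUsers(hiveContext).show()
      }
    }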
3 votes, 1 answer

Start HiveThriftServer programmatically in Python

In the spark-shell (Scala), we import org.apache.spark.sql.hive.thriftserver._ to start the Hive Thrift server programmatically for a particular hive context, as HiveThriftServer2.startWithContext(hiveContext), to expose a registered temp table for…
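For reference, the Scala call the question quotes; the open part of the question is reaching this same class from Python, typically through PySpark's py4j gateway:

    import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

    // Exposes the given context's registered (temp) tables over the Thrift/JDBC protocol
    HiveThriftServer2.startWithContext(hiveContext)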
3 votes, 0 answers

Field delimiter of Hive table not recognized by Spark HiveContext

I have created a Hive external table stored as textfile, partitioned by event_date Date. How do we have to specify a specific CSV format when reading the Hive table in Spark? The environment is: Spark 1.5.0 - cdh5.5.1, using Scala version…
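HiveContext reads a text table through its Hive SerDe, so the delimiter has to be declared in the table DDL rather than as a Spark-side CSV option. A hedged sketch; the schema, delimiter, and location are illustrative:

    hiveContext.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS events (id INT, name STRING)
      PARTITIONED BY (event_date DATE)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ';'
      STORED AS TEXTFILE
      LOCATION '/data/events'
    """)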
3 votes, 1 answer

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/analysis/OverrideFunctionRegistry

I have tried the below code in Spark and Scala; attaching the code and pom.xml. package com.Spark.ConnectToHadoop import org.apache.spark.SparkConf import org.apache.spark._ import org.apache.spark.sql._ import…
sudhir • 1,387 • 3 • 25 • 43
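This NoClassDefFoundError usually indicates mixed Spark versions on the classpath: the class moved between releases, so a spark-hive jar from one version cannot find it in the catalyst jar from another. A hedged pom.xml sketch of the usual fix, pinning every Spark artifact to a single version property (the version shown is illustrative):

    <properties>
      <spark.version>1.5.2</spark.version>
    </properties>
    <dependencies>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>${spark.version}</version>
      </dependency>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.10</artifactId>
        <version>${spark.version}</version>
      </dependency>
    </dependencies>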
3 votes, 3 answers

Hive tables are created from Spark but are not visible in Hive

From Spark, using DataFrame.write().mode(SaveMode.Ignore).format("orc").saveAsTable("myTableName"), the table is getting saved. I can see it using the command hadoop fs -ls /apps/hive/warehouse/test.db, where test is my database name: drwxr-xr-x -…
sudhir • 1,387 • 3 • 25 • 43
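A frequent cause is that Spark never found hive-site.xml and wrote to its own local Derby metastore, so Hive's metastore never hears about the table. A hedged sketch of the usual checks; df stands for the DataFrame in the question:

    // 1. Put hive-site.xml on Spark's classpath (e.g. $SPARK_HOME/conf/hive-site.xml)
    //    so HiveContext talks to the same metastore as Hive itself.
    import org.apache.spark.sql.SaveMode

    // 2. Qualify the database explicitly instead of relying on the current one
    df.write.mode(SaveMode.Ignore).format("orc").saveAsTable("test.myTableName")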
2 votes, 2 answers

Spark SQL sql("").first().getDouble(0) gives me inconsistent results

I have the below query, which is supposed to find an average of the column values and return the result, which is a single number. val avgVal = hiveContext.sql("select round(avg(amount), 4) from users.payment where dt between '2018-05-09' and…
pushpavanthar • 819 • 6 • 20
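One hedged explanation: if the date range matches no rows, avg() yields NULL, and what getDouble(0) does with a NULL is easy to misread. Making the NULL case explicit removes that ambiguity; the query is taken from the question, the 0.0 default is illustrative:

    val row = hiveContext.sql(
      "select round(avg(amount), 4) from users.payment " +
      "where dt between '2018-05-09' and '2018-05-10'").first()

    // Decide the NULL case explicitly instead of letting getDouble interpret it
    val avgVal = if (row.isNullAt(0)) 0.0 else row.getDouble(0)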
2 votes, 0 answers

Query fails in HiveContext of PySpark while writing into Avro format

I'm trying to load an external table in Avro format using the HiveContext of PySpark. The external-table creation query runs in Hive. However, the same query fails in HiveContext with the error org.apache.hadoop.hive.serde2.SerDeException:…
Gdek • 81 • 4 • 11
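With the Avro SerDe, this exception commonly means that neither avro.schema.literal nor avro.schema.url was visible to Spark's bundled Hive client. A hedged Scala sketch (the question itself uses PySpark) declaring the schema inline; all names and the schema are illustrative:

    hiveContext.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS my_db.events_avro
      STORED AS AVRO
      LOCATION '/data/events_avro'
      TBLPROPERTIES ('avro.schema.literal'='{
        "type": "record", "name": "Event",
        "fields": [{"name": "id", "type": "long"}, {"name": "name", "type": "string"}]
      }')
    """)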
2 votes, 2 answers

HiveContext.sql("insert into")

I'm trying to insert data with HiveContext like this: /* table filedata CREATE TABLE `filedata`( `host_id` string, `reportbatch` string, `url` string, `datatype` string, `data` string, `created_at` string, `if_del`…
hyjal • 63 • 8
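In Spark 1.x the reliably supported HiveQL form is INSERT INTO ... SELECT, so one common pattern is staging the rows in a temp table first. A minimal sketch following the DDL quoted in the question; the row values are illustrative:

    import hiveContext.implicits._

    val staged = sc.parallelize(Seq(
      ("h1", "b1", "http://example.com/f", "csv", "payload", "2018-01-01", "0")
    )).toDF("host_id", "reportbatch", "url", "datatype", "data", "created_at", "if_del")

    staged.registerTempTable("filedata_staging")
    hiveContext.sql("insert into table filedata select * from filedata_staging")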