24

I'm trying to run an insert statement with my HiveContext, like this:

hiveContext.sql('insert into my_table (id, score) values (1, 10)')

The Spark 1.5.2 SQL documentation doesn't explicitly state whether this is supported, although it does mention support for "dynamic partition insertion".

This produces a stack trace like:

AnalysisException: 
Unsupported language features in query: insert into my_table (id, score) values (1, 10)
TOK_QUERY 0, 0,20, 0
  TOK_FROM 0, -1,20, 0
    TOK_VIRTUAL_TABLE 0, -1,20, 0
      TOK_VIRTUAL_TABREF 0, -1,-1, 0
        TOK_ANONYMOUS 0, -1,-1, 0
      TOK_VALUES_TABLE 1, 13,20, 41
        TOK_VALUE_ROW 1, 15,20, 41
          1 1, 16,16, 41
          10 1, 19,19, 44
  TOK_INSERT 1, 0,-1, 12
    TOK_INSERT_INTO 1, 0,11, 12
      TOK_TAB 1, 4,4, 12
        TOK_TABNAME 1, 4,4, 12
          my_table 1, 4,4, 12
      TOK_TABCOLNAME 1, 7,10, 22
        id 1, 7,7, 22
        score 1, 10,10, 26
    TOK_SELECT 0, -1,-1, 0
      TOK_SELEXPR 0, -1,-1, 0
        TOK_ALLCOLREF 0, -1,-1, 0

scala.NotImplementedError: No parse rules for:
 TOK_VIRTUAL_TABLE 0, -1,20, 0
  TOK_VIRTUAL_TABREF 0, -1,-1, 0
    TOK_ANONYMOUS 0, -1,-1, 0
  TOK_VALUES_TABLE 1, 13,20, 41
    TOK_VALUE_ROW 1, 15,20, 41
      1 1, 16,16, 41
      10 1, 19,19, 44

Is there any other way to insert to a Hive table that is supported?

Kirk Broadhurst

6 Answers

28

Data can be appended to a Hive table using the append mode on the DataFrameWriter.

data = hc.sql("select 1 as id, 10 as score")
data.write.mode("append").saveAsTable("my_table")

This gives the same result as an insert.

Havnar
  • I want to write it to an existing table. How can I do that? I am using Spark 1.1.0, which doesn't have the write method. What can I do in that case? – Aviral Kumar Feb 18 '16 at 14:40
  • Would you care to accept the answer, please, so we can close this question? :) – eliasah Jun 08 '16 at 06:33
  • @eliasah I have done an append in the method you have specified, but when I do a select * on the Hive table I am getting the appended rows at the top, instead of them being at the bottom – User12345 Mar 01 '17 at 19:57
  • @Virureddy I'm not sure that your question was intended to me. – eliasah Mar 01 '17 at 20:02
  • 3
    This answer does not work in my case. I get an error: `AnalysisException: 'Saving data in the Hive serde table \`mytable\` is not supported yet. Please use the insertInto() API as an alternative..;'` – abeboparebop Apr 26 '17 at 11:51
  • Just adding a point. I was able to insert into a Hive table using Spark version 2.3; you just have to exclude the column list from the insert statement. See https://stackoverflow.com/questions/53812634/spark-sql-issue-with-columns-specified/53845640#53845640 – vikrant rana Dec 19 '18 at 06:34
15

I've had the same problem (Spark 1.5.1), and tried different versions.

Given

sqlContext.sql("create table my_table(id int, score int)")

The only versions that worked looked like this:

sqlContext.sql("insert into table my_table select t.* from (select 1, 10) t")
sqlContext.sql("insert into       my_table select t.* from (select 2, 20) t")
Beryllium
  • 2
    How do we add variables in the query? – yAsH May 26 '16 at 18:56
  • Also a viable solution in case you have a source table with the same key as the destination table. Works in Spark 1.5. – Anup Ash Aug 17 '16 at 15:45
  • 1
    @yAsH use string formatting in PySpark to pass a variable, like in the above example: `sqlContext.sql("insert into my_table select t.* from (select 2, {variable}) t".format(variable=20))` — in place of 20 you can use any variable. – anand Dec 18 '18 at 11:22
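Expanding on the string-formatting comment above (a minimal sketch; the table name and values are placeholders), the statement is just a plain string built before it is handed to `sqlContext.sql`, which takes no bound parameters:

```python
# Build the INSERT ... SELECT workaround with variable values.
# sqlContext.sql() accepts a plain string, so any parameterisation
# has to happen in Python before the call.
row_id, score = 2, 20
query = "insert into my_table select t.* from (select {0}, {1}) t".format(row_id, score)
print(query)
# insert into my_table select t.* from (select 2, 20) t
```

The resulting string can then be passed to `sqlContext.sql(query)`. Note that values are interpolated directly into the SQL text, so validate or escape them yourself.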
11

The accepted answer's saveAsTable fails for me with an AnalysisException (I don't understand why). What works for me instead is:

data = hc.sql("select 1 as id, 10 as score")
data.write.mode("append").insertInto("my_table")

I'm using Spark v2.1.0.

abeboparebop
1

You tried to perform something that the data file format cannot do, hence the Unsupported language features in query exception.

Many data file formats are write-once and do not support ACID operations.

Apache ORC supports ACID operations if you need them.

Instead, you can use partitions to split your data into folders (/data/year=2017/month=10...); then you can append/insert data into your data lake.
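As an illustration of that layout (a sketch only — real code would go through `DataFrameWriter`, e.g. `data.write.mode("append").partitionBy("year", "month").saveAsTable(...)`, which needs a running cluster; the paths below are hypothetical), Hive-style partitioning is just a `key=value` directory convention:

```python
import tempfile
from pathlib import Path

# Hive-style partitioning is a key=value directory convention;
# DataFrameWriter.partitionBy("year", "month") produces exactly this layout.
base = Path(tempfile.mkdtemp())
for year, month in [(2017, 10), (2017, 11)]:
    part_dir = base / "year={0}".format(year) / "month={0}".format(month)
    part_dir.mkdir(parents=True)
    # A data file appended into one partition:
    (part_dir / "part-00000").write_text("1,10\n")

print(sorted(p.relative_to(base).as_posix() for p in base.rglob("part-*")))
# ['year=2017/month=10/part-00000', 'year=2017/month=11/part-00000']
```

Appending new data then only touches the partition directories it lands in, which is why this works even on write-once file formats.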

Thomas Decaux
-1

Try this: `hiveContext.sql("insert into table my_table select 1, 10")`. If you haven't changed your dynamic partition mode to nonstrict, you have to do this: `hiveCtx.setConf("hive.exec.dynamic.partition.mode", "nonstrict")`

user3752500
  • 5
    Why should the OP "try this code"? A **good answer** will always have an explanation of what was done and why it was done in such a manner, not only for the OP but for future visitors to SO. – B001ᛦ Aug 23 '16 at 13:11
  • 1
    Additionally, it does not work ... see Beryllium's answer below, it works –  Mar 13 '17 at 10:46
-2

When you do this for the first time

data.write.mode("append").saveAsTable("my_table")

you should replace "append" with "overwrite". Then, you can use "append".

mahatmawx