14

Can I use an UPDATE query in Spark SQL, like this?

sqlContext.sql("update users set name = '*' where name is null")

I got the error:

org.apache.spark.sql.AnalysisException: 
Unsupported language features in query:update users set name = '*' where name is null

Does Spark SQL not support UPDATE queries, or am I writing the code incorrectly?

Anantha Raju C
ZMath_lin

3 Answers

18

Spark SQL doesn't support UPDATE statements yet.

Hive has supported UPDATE since version 0.14. Even so, Hive supports updates and deletes only on tables that support transactions, as noted in the Hive documentation.

See the answers in the Databricks forums confirming that UPDATE and DELETE are not supported in Spark SQL because it doesn't support transactions. If you think about it, supporting random updates is very complex with most big-data storage formats: it requires scanning huge files, updating specific records, and rewriting potentially terabytes of data. In this respect it is not normal SQL.

Pranav Shukla
  • 2
    There is an open ticket in the Spark project to improve support for Hive transaction tables (i.e. Hive tables which support updates) https://issues.apache.org/jira/browse/SPARK-15348 – sversch Apr 11 '18 at 05:53
  • Are there any updates on whether this is supported now, or is there an alternative way to do the same? –  Sep 06 '18 at 09:07
4

This is now possible with Databricks Delta Lake.

Farvardin
Mehdi LAMRANI
2

Spark SQL now supports UPDATE, DELETE, and similar data-modification operations if the underlying table is in Delta format.

Check this out: https://docs.delta.io/0.4.0/delta-update.html#update-a-table

Anjana K V