
I want to add a column to a Spark DataFrame that has been registered as a table. This column needs to hold an auto-incrementing long.

df = spark.sql(query)
df.createOrReplaceTempView("user_stories")
df = spark.sql("ALTER TABLE user_stories ADD COLUMN rank int AUTO_INCREMENT")
df.show(5)

This throws the following error:

Py4JJavaError: An error occurred while calling o72.sql.
: org.apache.spark.sql.catalyst.parser.ParseException: 
no viable alternative at input 'ALTER TABLE user_stories ADD COLUMN'(line 1, pos 29)

== SQL ==
ALTER TABLE user_stories ADD COLUMN rank int AUTO_INCREMENT
-----------------------------^^^

What am I missing here?



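Spark SQL does not support an AUTO_INCREMENT column clause, and a temporary view cannot be altered like a table. If you want to add a new incrementing column to a DataFrame, derive a new DataFrame with monotonically_increasing_id():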

df.show()
+-------+
|   name|
+-------+
|gaurnag|
+-------+

from pyspark.sql.functions import monotonically_increasing_id
# Unique, monotonically increasing 64-bit IDs; not guaranteed to be consecutive
new_df = df.withColumn("id", monotonically_increasing_id())
new_df.show()
+-------+---+
|   name| id|
+-------+---+
|gaurnag|  0|
+-------+---+
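Note that monotonically_increasing_id() guarantees unique, increasing IDs but not consecutive ones. According to the PySpark documentation, the partition ID is stored in the upper 31 bits and the record number within each partition in the lower 33 bits, so the IDs jump whenever a new partition starts. The plain-Python sketch below only simulates that layout (it does not call Spark; the helper names and sample data are hypothetical) to show the gap, and how ranking the IDs turns them into a gap-free 1..N sequence.

```python
# Hypothetical helper that mimics the documented ID layout of
# pyspark.sql.functions.monotonically_increasing_id:
# partition ID in the upper 31 bits, record number within the
# partition in the lower 33 bits. It does not use Spark.
def simulated_monotonic_ids(partitions):
    """Return one ID per row, given a list of partitions (each a list of rows)."""
    ids = []
    for partition_id, rows in enumerate(partitions):
        for record_number, _row in enumerate(rows):
            ids.append((partition_id << 33) + record_number)
    return ids


def consecutive_ranks(ids):
    """Map each ID to a 1-based consecutive rank, like row_number() ordered by the ID."""
    rank_of = {value: rank for rank, value in enumerate(sorted(ids), start=1)}
    return [rank_of[value] for value in ids]


# Three rows spread over two partitions (hypothetical data).
partitions = [["gaurnag", "alice"], ["bob"]]
ids = simulated_monotonic_ids(partitions)
print(ids)                     # [0, 1, 8589934592]
print(consecutive_ranks(ids))  # [1, 2, 3]
```

In Spark itself, if you need gap-free values, the usual tool is the row_number() window function, e.g. against the temp view from the question, ordering by some column (here a hypothetical name column): SELECT row_number() OVER (ORDER BY name) AS rank, * FROM user_stories. Be aware that a window with ORDER BY but no PARTITION BY moves all rows into a single partition, which can be slow on large data.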