I have a table named "mytable" in Postgres with two columns, id (bigint) and value (varchar(255)).
id gets its value from a sequence via nextval('my_sequence').
A PySpark application takes a DataFrame and inserts it into "mytable" using the Postgres JDBC driver (postgresql-42.1.4.jar). I'm creating the id column with:
df.withColumn('id', lit("nextval('my_sequence')"))
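For completeness, here is a minimal, runnable version of that step (the DataFrame contents are invented; my real DataFrame comes from elsewhere):

from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.getOrCreate()

# Hypothetical rows just to reproduce the problem.
df = spark.createDataFrame([("foo",), ("bar",)], ["value"])

# lit() produces a *string* column, so the text "nextval('my_sequence')"
# is what gets shipped to Postgres, not a function call.
df = df.withColumn("id", lit("nextval('my_sequence')"))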
Postgres interprets the column as character varying, i.e. the literal string "nextval('my_sequence')" is sent instead of the function being evaluated.
I can see that there are ways to call Postgres functions when reading data (How to remotely execute a Postgres SQL function on Postgres using PySpark JDBC connector?), but I'm not sure how to call a Postgres function like nextval() when writing data to Postgres.
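(As far as I can tell, the read-side pattern from that question wraps the function call in a subquery passed as dbtable, roughly like this; jdbc_url and credentials are assumed:)

# Reading: a Postgres function can be evaluated by hiding it in a subquery.
df_seq = spark.read.format("jdbc") \
    .option("url", jdbc_url) \
    .option("dbtable", "(SELECT nextval('my_sequence') AS id) AS tmp") \
    .load()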
Here's how I'm currently writing the data from PySpark to Postgres:
df.write.format("jdbc") \
    .option("url", jdbc_url) \
    .option("dbtable", 'mytable') \
    .mode('append') \
    .save()
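The only workaround I can think of is to leave id out of the DataFrame and put a DEFAULT on the column so Postgres assigns it on insert (a sketch, assuming I'm allowed to alter the table and that the JDBC writer names its columns in the INSERT statement):

# Run once in Postgres:
#   ALTER TABLE mytable ALTER COLUMN id SET DEFAULT nextval('my_sequence');
#
# Then write only the remaining columns and let the default fire:
df.select('value').write.format("jdbc") \
    .option("url", jdbc_url) \
    .option("dbtable", 'mytable') \
    .mode('append') \
    .save()

But I'd still like to know whether the sequence can be invoked from the Spark side instead.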
How can one write to a Postgres table using PySpark when one column needs a sequence number from nextval()?