I have a use case where I want to update specific rows, identified by an arbitrary identifier/WHERE-clause condition, and push those updates back to Oracle or SQL Server from Databricks.
When I read with spark.read.format("jdbc") against either database, I could not find an easy way to update specific rows back to these DBs.
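(For reference, the read side works fine and looks roughly like this; the table name and the filter predicate below are placeholders:)

df = (spark.read.format("jdbc")
      .option("url", dbServerJdbcUrl)
      .option("user", username)
      .option("password", password)
      .option("driver", "oracle.jdbc.driver.OracleDriver")  # or the SQL Server driver
      .option("dbtable", "my_schema.target_table")          # placeholder table
      .load()
      .filter("status = 'PENDING'"))                        # placeholder predicate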
If I use

(df.write.format("jdbc")
    .option("url", dbServerJdbcUrl)
    .option("user", username)
    .option("password", password)  # note: the option is "password", not "secret"
    .option("driver", <either com.microsoft.sqlserver.jdbc.SQLServerDriver or oracle.jdbc.driver.OracleDriver>)
    .option("dbtable", <table on the database platform>)
    .mode("overwrite")  # or other modes
    .save())
it only overwrites the whole "dbtable" on the database. So far I have not found a way to make it run UPDATE statements via .option("query", ...).
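The closest workaround I can see is to bypass df.write entirely and issue the UPDATE through the JDBC driver itself, via Spark's py4j gateway. A minimal sketch, reusing dbServerJdbcUrl/username/password from above; the table and column names (dbo.target_table, status, id) are made up for illustration:

jvm = spark._sc._jvm
# Usually unnecessary on Databricks, where the bundled drivers self-register:
jvm.java.lang.Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver")
conn = jvm.java.sql.DriverManager.getConnection(dbServerJdbcUrl, username, password)
try:
    # Parameterized UPDATE against a hypothetical table and columns
    stmt = conn.prepareStatement("UPDATE dbo.target_table SET status = ? WHERE id = ?")
    stmt.setString(1, "PROCESSED")
    stmt.setLong(2, 42)
    stmt.executeUpdate()  # autocommit is on by default in JDBC
finally:
    conn.close()

But this runs on the driver node, one statement at a time, which does not feel like the right Spark-native answer.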
If I instead write to a temporary/parking table, it becomes two stages of work: after the write I still have to go back to the database platform and update the actual tables from the parking table.
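That two-stage route would look roughly like the sketch below: overwrite a parking table with just the changed rows, then run one MERGE on the database side. The parking table dbo.target_table_stage and the id/status columns are hypothetical, and the MERGE shown is SQL Server syntax (Oracle's MERGE differs slightly):

# Stage 1: land the handful of changed rows in a parking table
(df.write.format("jdbc")
    .option("url", dbServerJdbcUrl)
    .option("user", username)
    .option("password", password)
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .option("dbtable", "dbo.target_table_stage")
    .mode("overwrite")
    .save())

# Stage 2: merge the parking table into the real table in one statement
merge_sql = """
MERGE INTO dbo.target_table AS t
USING dbo.target_table_stage AS s
    ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.status = s.status;
"""
conn = spark._sc._jvm.java.sql.DriverManager.getConnection(dbServerJdbcUrl, username, password)
try:
    conn.createStatement().executeUpdate(merge_sql)
finally:
    conn.close()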
Another note - when I do the above write on a table that has millions of rows and I only want to update a handful of them, every available mode just causes more trouble:
- overwrite - the millions of existing rows are simply lost, replaced by the handful of rows in df.
- append - either creates duplicates or eventually fails on key/constraint violations.
Is there a better way to have Databricks update specific rows in a database?