2

I am writing a dataframe from Azure Databricks onto a sql dataware house with

res.write \
    .format("jdbc") \
    .option("url", url) \
    .option("dbtable", table) \
    .option("user", user) \
    .option("password", password) \
    .save()

with this I am getting an error Column 'username' has a data type that cannot participate in a columnstore index.

How can I either eliminate the column store completely, or change the data type of the columns to fit in the column store?

I have a few columns that will be considered as integer and some that will be varchar.

Dee
  • 199
  • 4
  • 17

1 Answers1

4

Add this option clause to your write statement. It takes the place of the with() clause of the CREATE TABLE (AS) statement:

.option ("tableOptions","heap,distribution=MY_DISTRIBUTION")

Reference is here:

https://docs.databricks.com/spark/latest/data-sources/azure/sql-data-warehouse.html

Choose a value for MY_DISTRIBUTION based on the following guidance:

https://learn.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-distribute

Ron Dunn
  • 2,971
  • 20
  • 27
  • Hi thanks a lot!. I tried adding `.option ("tableOptions","heap,distribution=HASH(rowno)") \` where rowno is int. I still get the same error as earlier referencing the same column. Is this syntax incorrect? – Dee Oct 30 '18 at 03:55
  • Did you drop the table first? I think you're inserting into the previously created table. You should also be able to use the .mode (SaveMode.Overwrite) clause. – Ron Dunn Oct 30 '18 at 04:23
  • I actualy tried creating a new table with a different name. At the moment there is no table created in the database with the name. The code looks like this now _res.write \ .format("jdbc") \ .option("url", url) \ .option("dbtable", table1) \ .option ("tableOptions","heap, distribution=HASH(rowno)") \ .option("user", user) \ .option("password", password) \ .save()_ – Dee Oct 30 '18 at 04:30
  • 1
    Is there a reason you're using .format("jdbc")? It should be .format("com.databricks.spark.sqldw"). Take a close look at the examples in the Reference link that I posted above :) – Ron Dunn Oct 30 '18 at 05:08
  • hey yes that was the issue there. When i changed it, its loading data now. Thank you very much! the code finally is res.write \ .format("com.databricks.spark.sqldw") \ .option("forwardSparkAzureStorageCredentials", "True") \ .option("url", url) \ .option("dbtable", table1) \ .option("tableOptions","heap,distribution=HASH(rownum)") \ .option("tempdir", tempDir) \ .save() – Dee Oct 30 '18 at 05:44