3

I am trying to read a Parquet file from Azure Data Lake using the following Pyspark code.

df= sqlContext.read.format("parquet")
   .option("header", "true")
   .option("inferSchema", "true")
   .load("adl://xyz/abc.parquet")
df = df['Id','IsDeleted']

Now I would like to load this dataframe df as a table in sql dataware house using the following code:

df.write \
  .format("com.databricks.spark.sqldw") \
  .mode('overwrite') \
  .option("url", sqlDwUrlSmall) \
  .option("forward_spark_azure_storage_credentials", "true") \
  .option("dbtable", "test111") \
  .option("tempdir", tempDir) \
  .save()

This creates a table dbo.test111 in the SQL Datawarehouse with datatypes:

  • Id(nvarchar(256),null)
  • IsDeleted(bit,null)

But I need these columns with different datatypes say char(255), varchar(128) in SQL Datawarehouse. How do I do this while loading the dataframe into SQL Dataware house?

sri sivani charan
  • 399
  • 1
  • 6
  • 21

2 Answers2

0

I found a way can help you modify the column data type, but maybe could not achieve your want.

df.select(col("colname").cast(DataType))

Here is a blob about How to change column types in Spark SQL's DataFrame.

Maybe this can helps you.

Leon Yue
  • 15,693
  • 1
  • 11
  • 23
0

The only supported data types on Spark SQL are given [https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/sql/types/package-summary.html][1]

String types will in fact be turned into VARCHAR with unspecified length. Spark SQL does not have VARCHAR(n) data type.

You should be able to do something like below

import org.apache.spark.sql.types._

val df = 
          df.withColumn("Id_mod", df.Id.cast(StringType)) 
            .withColumn("IsDeleted_mod", df.IsDeleted.cast(StringType))
            .drop("Id")
            .drop("IsDeleted")
            .withColumnRenamed("Id_mod", "Id")
            .withColumnRenamed("IsDeleted_mod", "IsDeleted")
            //Replace StringType with Any supported desired type
Manas
  • 519
  • 4
  • 14
  • 1
    I was able do it with .format("jdbc")\. option("createTableColumnTypes", "Id varchar(64)"). But the issue is that I need a column with nvarchar data type in SQL DW not varchar() which cannot be done. Looks like it supports only few data types. – sri sivani charan Jan 16 '19 at 19:34