
I am reading a CSV file with the Spark SQLContext.

Code:

m.put("path", CSV_DIRECTORY+file.getOriginalFilename());
m.put("inferSchema", "true"); // Automatically infer data types else string by default
m.put("header", "true");      // Use first line of all files as header
m.put("delimiter", ";");

DataFrame df = sqlContext.load("com.databricks.spark.csv", m);
df.printSchema();

I am fetching the column names and data types with df.printSchema().

O/P:

 |-- id: integer (nullable = true)
 |-- ApplicationNo: string (nullable = true)
 |-- Applidate: timestamp (nullable = true)

What is the return type of printSchema()? How can I convert the schema output to JSON, and how can I convert a DataFrame to JSON?

Desired O/P:

{"column":"id","datatype":"integer"}
Devz

2 Answers


DataType has a json method and a fromJson() method which you can use to serialize/deserialize schemas.

val df = sqlContext.read().....load()
val jsonString: String = df.schema.json  // json is a parameterless def, so no ()
val schema: StructType = DataType.fromJson(jsonString).asInstanceOf[StructType]
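To get the per-column output the question asks for ({"column":"id","datatype":"integer"}), you can instead map over df.schema.fields, where each StructField exposes name and dataType. A minimal sketch of the formatting step, with the (name, type) pairs hard-coded here so it runs without a Spark session:

```scala
// In Spark, the (column, datatype) pairs would come from the schema:
//   val fields = df.schema.fields.map(f => (f.name, f.dataType.typeName))
// They are hard-coded here so the sketch runs without Spark.
val fields = Seq(
  ("id", "integer"),
  ("ApplicationNo", "string"),
  ("Applidate", "timestamp")
)

// Format each (column, datatype) pair as a one-line JSON object.
val jsonLines = fields.map { case (name, dt) =>
  s"""{"column":"$name","datatype":"$dt"}"""
}

jsonLines.foreach(println)
// First line: {"column":"id","datatype":"integer"}
```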
Hamel Kothari

The Spark SQL way:

df.createOrReplaceTempView("<table_name>")
spark.sql("SELECT COLLECT_SET(STRUCT(<field_name>)) AS `` FROM <table_name> LIMIT 1").coalesce(1).write.format("org.apache.spark.sql.json").mode("overwrite").save(<Blob Path1/ ADLS Path1>)

The output will look like:

{"":[{<field_name>:<field_value1>},{<field_name>:<field_value2>}]}

The wrapping header can be removed with the following three lines (assuming there is no tilde in the data):

val jsonToCsvDF=spark.read.format("com.databricks.spark.csv").option("delimiter", "~").load(<Blob Path1/ ADLS Path1>)

jsonToCsvDF.createOrReplaceTempView("json_to_csv")

spark.sql("SELECT SUBSTR(`_c0`,5,length(`_c0`)-5) FROM json_to_csv").coalesce(1).write.option("header",false).mode("overwrite").text(<Blob Path2/ ADLS Path2>)
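The SUBSTR(`_c0`, 5, length(`_c0`)-5) step strips the {"": wrapper that the empty column alias produces, leaving just the JSON array. A plain-Scala illustration of the same slice, using a hypothetical wrapped string in place of the real query output:

```scala
// Hypothetical wrapped output, shaped like the {"":[...]} result above.
val wrapped = """{"":[{"col":"v1"},{"col":"v2"}]}"""

// SQL's SUBSTR is 1-based: start at character 5 and keep length-5
// characters, i.e. drop the 4-character prefix {"": and the trailing }.
// In 0-based Scala that is substring(4, length - 1).
val inner = wrapped.substring(4, wrapped.length - 1)

println(inner) // [{"col":"v1"},{"col":"v2"}]
```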
Nimantha