I have a nested JSON DataFrame in Spark whose schema looks like this:
root
|-- data: struct (nullable = true)
| |-- average: long (nullable = true)
| |-- sum: long (nullable = true)
| |-- time: string (nullable = true)
|-- password: string (nullable = true)
|-- url: string (nullable = true)
|-- username: string (nullable = true)
I need to convert the time field inside the data struct to the timestamp data type. Below is the code I tried, but it did not give me the result I wanted.
import play.api.libs.json._
import org.apache.spark.sql.functions._
import sqlContext.implicits._

val jsonStr = """{
  "url": "imap.yahoo.com",
  "username": "myusername",
  "password": "mypassword",
  "data": {
    "time": "2017-1-29 0-54-32",
    "average": 234,
    "sum": 123}}"""
val json: JsValue = Json.parse(jsonStr)

// The UDF wraps the cast timestamp in a case class, which Spark turns into a struct
case class Convert(time: java.sql.Timestamp)
val makeTimeStamp = udf((time: java.sql.Timestamp) => Convert(time))

val rdd = sc.parallelize(jsonStr :: Nil)
val df = sqlContext.read.json(rdd)
df.printSchema()
val dfRes = df.withColumn("data", makeTimeStamp(unix_timestamp(df("data.time"), "yyyy-MM-dd hh-mm-ss").cast("timestamp")))
dfRes.printSchema()
Result of my code:
root
|-- data: struct (nullable = true)
| |-- time: timestamp (nullable = true)
|-- password: string (nullable = true)
|-- url: string (nullable = true)
|-- username: string (nullable = true)
My code is actually removing the other elements inside the data struct (average and sum) instead of just casting the time string to the timestamp data type. For basic data-management operations on JSON DataFrames, do we need to write a UDF for each piece of functionality we need, or is there a library available for JSON data management? I am currently using the Play framework for working with JSON objects in Spark. Thanks in advance.
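For reference, here is a minimal sketch of the kind of result I am after, assuming the struct has to be rebuilt field by field with org.apache.spark.sql.functions.struct (an untested guess at a workaround on my side, not something I have verified):

// Untested guess: rebuild the struct so average and sum are preserved,
// and only the time field is cast to timestamp.
val dfGuess = df.withColumn("data", struct(
  df("data.average").as("average"),
  df("data.sum").as("sum"),
  unix_timestamp(df("data.time"), "yyyy-MM-dd hh-mm-ss").cast("timestamp").as("time")
))
dfGuess.printSchema()
// Hoped-for schema:
// root
//  |-- data: struct
//  |    |-- average: long (nullable = true)
//  |    |-- sum: long (nullable = true)
//  |    |-- time: timestamp (nullable = true)
//  |-- password: string (nullable = true)
//  |-- url: string (nullable = true)
//  |-- username: string (nullable = true)

Is listing every struct field by hand like this really necessary, or is there a cleaner way?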